rhoa.targets module

Target Generation for Financial Time Series Analysis

This module provides tools for generating optimized binary classification targets based on future price movements in financial time series data. It supports two optimization strategies: Pareto multi-objective optimization and elbow method threshold selection.

Core Functionality

  • Generate 8 target combinations using different entry/exit price definitions

  • Automatic parameter optimization for lookback periods and thresholds

  • Support for both end-of-period and maximum-during-period gain calculations

  • Flexible class balance targeting or elbow-based threshold selection

Target Methods

The module generates eight distinct target definitions, each representing a different combination of entry price (Close[t] or High[t]) and exit price (end-of-period or maximum-during-period):

  1. Close[t+N] / Close[t] - Conservative entry, end-point exit

  2. Close[t+N] / High[t] - Aggressive entry, end-point exit

  3. High[t+N] / Close[t] - Conservative entry, high exit

  4. High[t+N] / High[t] - Aggressive entry, high exit

  5. max(Close[t+1:t+N]) / Close[t] - Conservative entry, optimal close exit

  6. max(Close[t+1:t+N]) / High[t] - Aggressive entry, optimal close exit

  7. max(High[t+1:t+N]) / Close[t] - Conservative entry, optimal high exit

  8. max(High[t+1:t+N]) / High[t] - Aggressive entry, optimal high exit
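The eight ratios above can be sketched with pandas shifts and rolling windows. This is a minimal illustration, not the module's implementation: the helper name `future_gains`, the `gain_i` column names, and the sample prices are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def future_gains(df, n, close_col="Close", high_col="High"):
    """Return the eight percentage gains listed above (hypothetical helper)."""
    close, high = df[close_col], df[high_col]
    # Value at t+n aligned to row t; the last n rows become NaN.
    close_end = close.shift(-n)
    high_end = high.shift(-n)
    # Max over [t+1, t+n]: trailing n-row max, then shifted back n rows.
    close_max = close.rolling(n).max().shift(-n)
    high_max = high.rolling(n).max().shift(-n)
    exits = [close_end, close_end, high_end, high_end,
             close_max, close_max, high_max, high_max]
    entries = [close, high] * 4  # methods alternate Close[t], High[t]
    gains = {
        f"gain_{i}": (exit_ / entry - 1.0) * 100.0
        for i, (exit_, entry) in enumerate(zip(exits, entries), start=1)
    }
    return pd.DataFrame(gains, index=df.index)


prices = pd.DataFrame({
    "Close": [100.0, 102.0, 101.0, 105.0, 104.0],
    "High": [101.0, 103.0, 102.0, 106.0, 105.0],
})
gains = future_gains(prices, n=2)
```

Comparing any `gain_i` column against a percentage threshold then gives the corresponding binary target.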

Optimization Modes

Auto Mode (Pareto Optimization)

Searches the space of (period, threshold) combinations to find Pareto-optimal solutions that balance three objectives:

  • Maximize threshold (higher precision requirements)

  • Minimize period (shorter holding time)

  • Minimize deviation from target class balance

Manual Mode (Elbow Method)

Uses a fixed lookback period and selects the threshold by elbow (knee) point detection on the curve of instance counts vs. thresholds.
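The curve of instance counts vs. thresholds can be built as follows. This is a sketch under assumptions: the helper name `instance_counts` is hypothetical, gains are expressed in percent, and `max_pct` is treated as inclusive, which the documentation does not state.

```python
import numpy as np


def instance_counts(gains_pct, min_pct=0, max_pct=100, step=1):
    """Count rows whose gain meets each candidate threshold (in percent)."""
    gains_pct = np.asarray(gains_pct, dtype=float)
    valid = gains_pct[~np.isnan(gains_pct)]  # drop rows without future data
    thresholds = np.arange(min_pct, max_pct + 1, step)  # assumes inclusive max_pct
    counts = np.array([(valid >= t).sum() for t in thresholds])
    return thresholds, counts


thresholds, counts = instance_counts([0.5, 1.2, 3.4, -2.0, 7.9, np.nan], max_pct=5)
```

The counts never increase as the threshold rises, which is what makes a knee search on this curve meaningful.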

Mathematical Background

Pareto Optimization

A solution is Pareto-optimal if no other solution improves at least one objective without worsening any other. Solution A dominates solution B when:

  • A is better than B in at least one objective

  • A is no worse than B in every other objective
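The dominance rule can be written directly in NumPy. The sketch below is a plain quadratic-time filter for illustration; it does not use the paretoset package, and the function names and candidate values are assumptions. All objectives are expressed as costs to minimize, so the threshold is negated.

```python
import numpy as np


def dominates(a, b):
    """True if cost vector a dominates b (all objectives minimized)."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))


def pareto_mask(costs):
    """Boolean mask of rows that no other row dominates."""
    costs = np.asarray(costs, dtype=float)
    return np.array([
        not any(dominates(other, row) for other in costs)
        for row in costs
    ])


# Columns: (-threshold, period, |balance - target|); values are invented.
candidates = np.array([
    [-4.0, 5, 0.02],
    [-4.0, 6, 0.02],   # dominated by the first row (longer period, otherwise equal)
    [-2.0, 3, 0.10],
    [-6.0, 9, 0.01],
])
mask = pareto_mask(candidates)
```

Here only the second candidate is filtered out; the other three each win on at least one objective and therefore stay on the frontier.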

Elbow Method

Identifies the point of maximum curvature on a convex decreasing curve by finding the point with maximum perpendicular distance from the line connecting curve endpoints.
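The distance-to-chord idea can be sketched in a few lines. This is a simplified variant for illustration, not the algorithm that kneed's KneeLocator runs; the function name `elbow_index`, the axis normalization, and the sample counts are assumptions.

```python
import numpy as np


def elbow_index(x, y):
    """Index of the point farthest (perpendicularly) from the endpoint chord."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # Scale both axes to [0, 1] so the distance does not depend on units.
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    # Perpendicular distance from each point to the line through the endpoints.
    dist = np.abs(dy * (xn - xn[0]) - dx * (yn - yn[0])) / np.hypot(dx, dy)
    return int(np.argmax(dist))


thresholds = np.arange(0, 11)
counts = np.array([100, 60, 35, 20, 12, 8, 6, 5, 4, 3, 3])
knee = elbow_index(thresholds, counts)
```

For this invented convex decreasing curve the farthest point is at index 3, where the steep decline turns into a flat tail.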

Dependencies

  • pandas : DataFrame operations and time series handling

  • numpy : Numerical computations

  • kneed : Elbow/knee point detection (KneeLocator)

  • paretoset : Pareto frontier computation

Notes

Look-ahead Bias Warning:

These targets are computed from future prices, so use them only as labels in supervised learning. Never build input features from future data; future information may appear only in the definition of what the model should predict.

NaN Handling:

The last N rows (where N is the lookback period) will contain NaN values due to insufficient future data. These should be excluded from analysis.
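One way to respect both notes is to build features from past data only and drop every row whose future window is incomplete before training. The sketch below uses synthetic prices; the variable names, the 2% threshold, and the period of 3 are assumptions made for the example.

```python
import numpy as np
import pandas as pd

n = 3  # lookback period used when the targets were built (assumed)
index = pd.date_range("2024-01-01", periods=10, freq="D")
close = pd.Series(np.linspace(100.0, 109.0, 10), index=index)

# Feature: uses only data up to and including t (past one-step return).
features = pd.DataFrame({"ret_1": close.pct_change()}, index=index)
# Label source: uses data after t, so it is a target and never a feature.
future_gain = close.shift(-n) / close - 1.0

# Keep rows where both the feature and the future gain are defined.
usable = features["ret_1"].notna() & future_gain.notna()
X = features[usable]
y = future_gain[usable] >= 0.02
```

The first row is dropped because the past return is undefined, and the last `n` rows are dropped because their future window is incomplete.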

Class Imbalance:

Very low target_class_balance (< 0.1) may result in insufficient positive examples. Very high values (> 0.9) may result in overly easy targets with poor discrimination.

Examples

Basic usage with auto mode:

>>> import pandas as pd
>>> import numpy as np
>>> from rhoa.targets import generate_target_combinations
>>>
>>> # Load your OHLC data
>>> df = pd.read_csv('prices.csv', index_col='Date', parse_dates=True)
>>>
>>> # Generate targets with 50% class balance
>>> targets, metadata = generate_target_combinations(
...     df, mode='auto', target_class_balance=0.5
... )
>>>
>>> print(f"Generated {len(targets.columns)} targets")
>>> print(f"Target_1 has {targets['Target_1'].sum()} positive instances")

Manual mode with fixed period:

>>> targets, metadata = generate_target_combinations(
...     df, mode='manual', lookback_periods=10
... )
>>> print(f"Method 1 threshold: {metadata['method_1']['threshold']}%")

See also

generate_target_combinations

Main function for target generation.


generate_target_combinations(df, mode='auto', lookback_periods=5, target_class_balance=0.5, min_period=1, max_period=20, period_step=1, min_pct=0, max_pct=100, step=1, close_col='Close', high_col='High')

Generate eight target combinations with optimized thresholds and lookback periods.

This function creates binary classification targets based on future price movements, supporting two optimization modes: automatic Pareto-based optimization and manual elbow-based threshold selection. Each target represents a different way of measuring future price gains relative to current entry prices.

Parameters:
  • df (pd.DataFrame) – Input DataFrame containing OHLC (Open, High, Low, Close) price data. Must have at least the columns specified by close_col and high_col. Index should be a time series (e.g., DatetimeIndex).

  • mode ({'auto', 'manual'}, default='auto') –

    Optimization mode to use:

    • 'auto' : Uses Pareto optimization to find optimal lookback period and threshold that balance multiple objectives (maximize threshold, minimize period, achieve target class balance).

    • 'manual' : Uses fixed lookback period with elbow method to find optimal threshold based on the curve of instance counts vs. thresholds.

  • lookback_periods (int, default=5) – Number of periods to look forward for future price calculations. Only used when mode='manual'. Must be >= 1 and < len(df).

  • target_class_balance (float, default=0.5) – Target proportion of positive class instances (range: 0.0 to 1.0). Only used when mode='auto'. For example, 0.5 means aim for 50% positive instances, 0.3 means 30%.

  • min_period (int, default=1) – Minimum lookback period to consider in optimization search space. Only used when mode='auto'. Must be >= 1.

  • max_period (int, default=20) – Maximum lookback period to consider in optimization search space. Only used when mode='auto'. Must be > min_period and < len(df).

  • period_step (int, default=1) – Increment step for lookback period search. Only used when mode='auto'. Must be >= 1.

  • min_pct (int, default=0) – Minimum threshold percentage to consider (e.g., 0 for 0%). Must be >= 0 and < max_pct.

  • max_pct (int, default=100) – Maximum threshold percentage to consider (e.g., 100 for 100%). Must be > min_pct and <= 100.

  • step (int, default=1) – Increment step for threshold search in percentage points. Must be >= 1.

  • close_col (str, default='Close') – Name of the close price column in the DataFrame.

  • high_col (str, default='High') – Name of the high price column in the DataFrame.

Returns:

  • targets_df (pd.DataFrame) – DataFrame with same index as input df, containing 8 boolean columns:

    • Target_1 : (Close[t+N] / Close[t]) - 1 >= threshold

    • Target_2 : (Close[t+N] / High[t]) - 1 >= threshold

    • Target_3 : (High[t+N] / Close[t]) - 1 >= threshold

    • Target_4 : (High[t+N] / High[t]) - 1 >= threshold

    • Target_5 : (max(Close[t+1:t+N]) / Close[t]) - 1 >= threshold

    • Target_6 : (max(Close[t+1:t+N]) / High[t]) - 1 >= threshold

    • Target_7 : (max(High[t+1:t+N]) / Close[t]) - 1 >= threshold

    • Target_8 : (max(High[t+1:t+N]) / High[t]) - 1 >= threshold

    Where N is the optimized lookback period, and threshold is the optimized percentage gain threshold.

  • metadata (dict) – Dictionary containing optimization results and configuration with keys:

    • 'mode' : str

      The mode used ('auto' or 'manual').

    • 'method_1' through 'method_8' : dict

      Each method dictionary contains:

      • 'period' : int

        Optimal lookback period in number of time steps.

      • 'threshold' : float

        Optimal threshold as percentage (e.g., 5.0 for 5%).

      • 'instances' : int

        Number of positive instances at the optimal parameters.

      • 'pct_of_max' : float

        Percentage of maximum possible instances (at 0% threshold).
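Because every per-method entry shares the same keys, the metadata can be flattened into a table for comparison. The dictionary below is hand-written with the documented shape; its numbers are invented and only two methods are shown.

```python
import pandas as pd

# Hypothetical metadata with the documented shape; the numbers are invented.
metadata = {
    "mode": "manual",
    "method_1": {"period": 10, "threshold": 3.0, "instances": 40, "pct_of_max": 50.0},
    "method_2": {"period": 10, "threshold": 2.0, "instances": 30, "pct_of_max": 60.0},
}

# Collect the per-method dictionaries into one table, one row per method.
summary = pd.DataFrame.from_dict(
    {key: value for key, value in metadata.items() if key.startswith("method_")},
    orient="index",
)
summary.index.name = "method"
```

Filtering on the `method_` prefix skips the top-level 'mode' entry, which is a string rather than a dictionary.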

Raises:
  • ValueError – If the DataFrame is empty.

  • ValueError – If close_col or high_col not found in DataFrame columns.

  • ValueError – If mode is not 'auto' or 'manual'.
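The three documented error conditions can also be checked before calling the function, for example in a data pipeline that wants to fail early. The helper below is hypothetical: its name and error messages are assumptions and do not reproduce the library's own checks or wording.

```python
import pandas as pd


def check_inputs(df, mode="auto", close_col="Close", high_col="High"):
    """Hypothetical pre-flight check mirroring the documented ValueError cases."""
    if df.empty:
        raise ValueError("DataFrame is empty")
    missing = [col for col in (close_col, high_col) if col not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in DataFrame: {missing}")
    if mode not in ("auto", "manual"):
        raise ValueError(f"mode must be 'auto' or 'manual', got {mode!r}")


frame = pd.DataFrame({"Close": [1.0, 2.0], "High": [1.5, 2.5]})
check_inputs(frame)  # valid input: returns None without raising

try:
    check_inputs(frame, mode="grid")
    error = None
except ValueError as exc:
    error = str(exc)
```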

Return type:

Tuple[DataFrame, Dict]

See also

_find_optimal_params_pareto

Pareto optimization for auto mode.

_find_optimal_params_elbow

Elbow method for manual mode.

_generate_targets

Generates target columns from optimal parameters.

Notes

Target Interpretation:

  • Targets 1-4 measure end-of-period gains (single point in time at t+N).

  • Targets 5-8 measure maximum gains during the period (any time in [t+1, t+N]).

  • Using High[t] as the denominator (Targets 2, 4, 6, 8) gives a stricter target than Close[t]: since High[t] >= Close[t], the future price must clear the intraday peak to register the same gain.

  • Maximum-based targets (5-8) capture exit opportunities that might occur before the end of the lookback period.
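The orderings implied by these points can be checked on a small synthetic series. In the sketch below the prices are invented and High is constructed to sit above Close, which real data should satisfy but the module is not documented to enforce.

```python
import numpy as np
import pandas as pd

n = 3
close = pd.Series([100.0, 104.0, 101.0, 103.0, 99.0, 102.0, 105.0])
high = close + 1.5  # synthetic: High is always at least Close

end_gain = close.shift(-n) / close - 1.0                    # Target 1 style
max_gain = close.rolling(n).max().shift(-n) / close - 1.0   # Target 5 style
end_gain_high_entry = close.shift(-n) / high - 1.0          # Target 2 style

valid = end_gain.notna()
# The window maximum includes the end point, so it can never be lower.
max_never_lower = bool((max_gain[valid] >= end_gain[valid]).all())
# A higher entry price can only shrink the measured gain.
high_entry_never_higher = bool(
    (end_gain_high_entry[valid] <= end_gain[valid]).all()
)
```

At a fixed period and threshold, this means a maximum-based target fires at least as often as its end-of-period counterpart, and a High[t]-entry target fires at most as often as its Close[t]-entry counterpart.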

Pareto Optimization (Auto Mode):

The Pareto optimization finds solutions that are not dominated by any other solution in the objective space. A solution A dominates solution B if:

  • A is better than B in at least one objective

  • A is no worse than B in all other objectives

For this problem, we optimize three objectives:

  1. Maximize threshold (prefer higher gain requirements)

  2. Minimize period (prefer shorter holding periods)

  3. Minimize deviation from target class balance

From the Pareto-optimal set, we select the solution closest to the target class balance.
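The search and selection steps described above can be sketched end to end for a single target definition. This is an illustration under assumptions, not the module's `_find_optimal_params_pareto`: the helper name, the use of the Close[t+N] / Close[t] gain only, and the simulated prices are all invented for the example.

```python
import numpy as np
import pandas as pd


def pick_period_threshold(close, periods, thresholds, target_balance=0.5):
    """Grid search, Pareto filter, then pick the point nearest the target balance."""
    rows = []
    for n in periods:
        gain = (close.shift(-n) / close - 1.0).dropna() * 100.0
        for thr in thresholds:
            balance = float((gain >= thr).mean())  # share of positive rows
            rows.append((n, thr, abs(balance - target_balance)))
    cand = np.array(rows, dtype=float)
    # Costs to minimize: period, negated threshold, deviation from target balance.
    costs = np.column_stack([cand[:, 0], -cand[:, 1], cand[:, 2]])
    keep = np.array([
        not np.any(np.all(costs <= c, axis=1) & np.any(costs < c, axis=1))
        for c in costs
    ])
    front = cand[keep]
    best = front[np.argmin(front[:, 2])]
    return int(best[0]), float(best[1]), float(best[2])


rng = np.random.default_rng(0)
close = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 300))))
period, threshold, deviation = pick_period_threshold(
    close, periods=range(1, 6), thresholds=range(0, 4)
)
```

Under this selection rule the chosen point always attains the smallest deviation found anywhere in the grid, because such a point can only be dominated by another point with the same deviation. The Pareto filter therefore mainly decides among ties, preferring higher thresholds and shorter periods.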

Elbow Method (Manual Mode):

The elbow method finds the “knee” or “elbow” point on the curve of instance counts vs. thresholds, where the curve changes from a steep decline to a flat tail:

  • Below the elbow, each increase in threshold removes many instances.

  • Above the elbow, further increases remove few additional instances, so raising the threshold adds little selectivity.

Mathematically, the elbow is found by maximizing the distance from the curve to the line connecting the endpoints.
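Written out, with the curve sampled at points (x_i, y_i) for i = 1, ..., n (thresholds on the x-axis, instance counts on the y-axis), the distance criterion reads as follows. The notation is introduced here for illustration, and any axis normalization applied beforehand is left out.

```latex
d_i = \frac{\left| (y_n - y_1)(x_i - x_1) - (x_n - x_1)(y_i - y_1) \right|}
           {\sqrt{(x_n - x_1)^2 + (y_n - y_1)^2}},
\qquad
i^{*} = \arg\max_{1 \le i \le n} d_i
```

The selected threshold is then x at index i*.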

Common Pitfalls:

  • Insufficient data: Ensure df has enough rows for the lookback period.

  • Look-ahead bias: Use future data only to create targets, never as model input features.

  • Class imbalance: Very low or very high target_class_balance may yield poor results.

  • NaN handling: Last N rows will have NaN targets due to insufficient future data.

Examples

Example 1: Auto mode with default parameters

>>> import pandas as pd
>>> import numpy as np
>>> from rhoa.targets import generate_target_combinations
>>>
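>>> # Create synthetic sample data (High is an independent random walk here,
>>> # so High >= Close is not guaranteed as it would be for real OHLC data)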
>>> np.random.seed(42)
>>> dates = pd.date_range('2020-01-01', periods=100, freq='D')
>>> df = pd.DataFrame({
...     'Close': 100 + np.cumsum(np.random.randn(100)),
...     'High': 100 + np.cumsum(np.random.randn(100)) + 1
... }, index=dates)
>>>
>>> # Generate targets with auto mode
>>> targets, meta = generate_target_combinations(df, mode='auto')
>>>
>>> # Check results
>>> print(f"Mode: {meta['mode']}")
Mode: auto
>>> print(f"Target_7 period: {meta['method_7']['period']}")
Target_7 period: 6
>>> print(f"Target_7 threshold: {meta['method_7']['threshold']}%")
Target_7 threshold: 4.0%
>>> # The count depends on the data and is at most len(df) == 100.
>>> print(f"Positive instances: {targets['Target_7'].sum()}")

Example 2: Manual mode with custom lookback period

>>> # Generate targets with manual mode
>>> targets, meta = generate_target_combinations(
...     df,
...     mode='manual',
...     lookback_periods=10,
...     min_pct=0,
...     max_pct=20,
...     step=1
... )
>>>
>>> # Check results for method 1
>>> method_1 = meta['method_1']
>>> print(f"Period: {method_1['period']}, Threshold: {method_1['threshold']}%")
Period: 10, Threshold: 6.0%
>>> # Counts depend on the data; pct_of_max is relative to the count at a 0% threshold.
>>> print(f"Instances: {method_1['instances']} ({method_1['pct_of_max']:.1f}% of max)")

Example 3: Target specific class balance

>>> # Aim for 30% positive instances
>>> targets, meta = generate_target_combinations(
...     df,
...     mode='auto',
...     target_class_balance=0.3,
...     min_period=1,
...     max_period=15
... )
>>>
>>> # Print the realized class balance for each target (values depend on the data)
>>> for i in range(1, 9):
...     positive_pct = targets[f'Target_{i}'].sum() / len(targets) * 100
...     print(f"Target_{i}: {positive_pct:.1f}% positive")

Example 4: Custom column names

>>> # DataFrame with different column names
>>> df_custom = df.rename(columns={'Close': 'close_price', 'High': 'high_price'})
>>> targets, meta = generate_target_combinations(
...     df_custom,
...     mode='auto',
...     close_col='close_price',
...     high_col='high_price'
... )
>>> print(targets.columns.tolist())
['Target_1', 'Target_2', 'Target_3', 'Target_4', 'Target_5', 'Target_6', 'Target_7', 'Target_8']
