Target Generation for Financial Time Series Analysis
This module provides tools for generating optimized binary classification targets
based on future price movements in financial time series data. It supports two
optimization strategies: Pareto multi-objective optimization and elbow method
threshold selection.
The module generates eight distinct target definitions, each representing a
different combination of entry price (Close[t] or High[t]) and exit price
(end-of-period or maximum-during-period):
Auto Mode (Pareto Optimization)
Searches the space of (period, threshold) combinations to find Pareto-optimal
solutions that balance three objectives:
- Maximize threshold (higher precision requirements)
- Minimize period (shorter holding time)
- Minimize deviation from target class balance
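As a rough illustration of the search space described above, the sketch below scores each (period, threshold) pair for a single target definition (Close[t] to Close[t + period]). The helper name `score_grid`, the synthetic price series, and the grid bounds are assumptions made for this example and are not part of the module.

```python
import numpy as np

def score_grid(close, periods, thresholds_pct, target_balance=0.5):
    """Return rows of (period, threshold_pct, balance_deviation).

    Hypothetical helper: scores one target definition,
    Close[t + period] / Close[t] - 1 >= threshold.
    """
    close = np.asarray(close, dtype=float)
    rows = []
    for period in periods:
        # Forward return over `period` bars; the last `period` bars
        # have no future price and are dropped by the slicing.
        fwd_return = close[period:] / close[:-period] - 1.0
        for pct in thresholds_pct:
            positive_share = np.mean(fwd_return >= pct / 100.0)
            rows.append((period, pct, abs(positive_share - target_balance)))
    return np.array(rows)

# Synthetic random-walk prices, for illustration only.
rng = np.random.default_rng(0)
prices = 100.0 * np.cumprod(1.0 + rng.normal(0.0, 0.01, size=500))
grid = score_grid(prices, periods=range(1, 6), thresholds_pct=range(0, 4))
```

Each row of `grid` holds the three quantities that the objectives act on: the period (to minimize), the threshold (to maximize), and the deviation from the target class balance (to minimize).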
Manual Mode (Elbow Method)
Uses a fixed lookback period and finds the optimal threshold using the
elbow/knee point detection on the curve of instance counts vs. thresholds.
Pareto Optimality
A solution is Pareto-optimal if no other solution improves at least one
objective without worsening any other. Solution A dominates solution B when:
- A is strictly better than B in at least one objective
- A is no worse than B in every other objective
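The dominance rule above can be sketched in plain NumPy. This is a minimal illustration, not the module's implementation (which lists `paretoset` as its dependency); the function name `pareto_mask` and the candidate values are assumptions. All columns are treated as costs to minimize, so the threshold is negated.

```python
import numpy as np

def pareto_mask(costs):
    """Boolean mask of non-dominated rows, with every column minimized."""
    costs = np.asarray(costs, dtype=float)
    mask = np.ones(len(costs), dtype=bool)
    for i, row in enumerate(costs):
        # Another row dominates `row` if it is no worse in every column
        # and strictly better in at least one.
        no_worse = np.all(costs <= row, axis=1)
        strictly_better = np.any(costs < row, axis=1)
        mask[i] = not np.any(no_worse & strictly_better)
    return mask

# Columns: (-threshold, period, balance_deviation). The threshold is negated
# so that "maximize threshold" becomes a minimization.
candidates = np.array([
    [-3.0, 5.0, 0.10],
    [-3.0, 5.0, 0.20],  # dominated by row 0 (same threshold and period, worse deviation)
    [-1.0, 2.0, 0.05],
    [-2.0, 9.0, 0.30],  # dominated by row 0 in all three columns
])
front = pareto_mask(candidates)
```

Rows 0 and 2 remain on the front because each beats the other in at least one objective: row 0 has the higher threshold, while row 2 has the shorter period and the smaller deviation.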
Elbow Method
Identifies the point of maximum curvature on a convex decreasing curve by
finding the point with maximum perpendicular distance from the line
connecting curve endpoints.
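The perpendicular-distance idea can be sketched as follows. This is a simplified stand-in written in NumPy, not the `KneeLocator` routine from `kneed`; the function name `elbow_index` and the synthetic curve are assumptions for illustration.

```python
import numpy as np

def elbow_index(x, y):
    """Index of the point farthest from the chord joining the first and last points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Normalize both axes to [0, 1] so neither scale dominates the distance.
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    # Perpendicular distance from each point to the chord.
    dist = np.abs(dx * (yn - yn[0]) - dy * (xn - xn[0])) / np.hypot(dx, dy)
    return int(np.argmax(dist))

thresholds = np.arange(0, 11)                # threshold in percent
counts = 1000.0 * np.exp(-0.5 * thresholds)  # synthetic convex decreasing curve
knee = elbow_index(thresholds, counts)
```

Normalizing the axes first is a design choice of this sketch: without it, the axis with the larger numeric range (here the instance counts) would dominate the distance and shift the selected point.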
pandas : DataFrame operations and time series handling
numpy : Numerical computations
kneed : Elbow/knee point detection (KneeLocator)
paretoset : Pareto frontier computation
Notes
Look-ahead Bias Warning:
These targets are computed from future prices and must be used only as
labels in supervised learning. Never use future data to build input
features; use it only to define what the model should predict.
NaN Handling:
The last N rows (where N is the lookback period) will contain NaN values
due to insufficient future data. These should be excluded from analysis.
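The NaN tail can be seen with a small pandas sketch. The series values and the `lookback` variable are hypothetical, and the forward return shown here is only one plausible way to express "end-of-period exit relative to Close[t]".

```python
import pandas as pd

lookback = 3  # hypothetical lookback period
close = pd.Series([10.0, 11.0, 12.0, 11.5, 13.0, 12.5], name="Close")

# shift(-lookback) aligns Close[t + lookback] with row t.
future_return = close.shift(-lookback) / close - 1.0

# The last `lookback` rows have no future price, so they are NaN
# and should be dropped before any analysis.
valid = future_return.dropna()
```

Dropping these rows before training avoids treating "no future data" as a negative example.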
Class Imbalance:
Very low target_class_balance (< 0.1) may result in insufficient positive
examples. Very high values (> 0.9) may result in overly easy targets with
poor discrimination.
Examples
Basic usage with auto mode:
>>> import pandas as pd
>>> import numpy as np
>>> from rhoa.targets import generate_target_combinations
>>>
>>> # Load your OHLC data
>>> df = pd.read_csv('prices.csv', index_col='Date', parse_dates=True)
>>>
>>> # Generate targets with 50% class balance
>>> targets, metadata = generate_target_combinations(
...     df, mode='auto', target_class_balance=0.5
... )
>>>
>>> print(f"Generated {len(targets.columns)} targets")
>>> print(f"Target_1 has {targets['Target_1'].sum()} positive instances")
Generate eight target combinations with optimized thresholds and lookback periods.
This function creates binary classification targets based on future price movements,
supporting two optimization modes: automatic Pareto-based optimization and manual
elbow-based threshold selection. Each target represents a different way of measuring
future price gains relative to current entry prices.
Parameters:
df (pd.DataFrame) – Input DataFrame containing OHLC (Open, High, Low, Close) price data.
Must have at least the columns specified by close_col and high_col.
Index should be a time series (e.g., DatetimeIndex).
mode ({'auto', 'manual'}, default='auto') – Optimization mode to use:
- 'auto' : Uses Pareto optimization to find the lookback period and
threshold that balance multiple objectives (maximize threshold, minimize
period, achieve target class balance).
- 'manual' : Uses a fixed lookback period with the elbow method to find the
threshold from the curve of instance counts vs. thresholds.
lookback_periods (int, default=5) – Number of periods to look forward for future price calculations.
Only used when mode='manual'.
Must be >= 1 and < len(df).
target_class_balance (float, default=0.5) – Target proportion of positive class instances (range: 0.0 to 1.0).
Only used when mode='auto'.
For example, 0.5 means aim for 50% positive instances, 0.3 means 30%.
min_period (int, default=1) – Minimum lookback period to consider in optimization search space.
Only used when mode='auto'.
Must be >= 1.
max_period (int, default=20) – Maximum lookback period to consider in optimization search space.
Only used when mode='auto'.
Must be > min_period and < len(df).
period_step (int, default=1) – Increment step for lookback period search.
Only used when mode='auto'.
Must be >= 1.
min_pct (int, default=0) – Minimum threshold percentage to consider (e.g., 0 for 0%).
Must be >= 0 and < max_pct.
max_pct (int, default=100) – Maximum threshold percentage to consider (e.g., 100 for 100%).
Must be > min_pct and <= 100.
step (int, default=1) – Increment step for threshold search in percentage points.
Must be >= 1.
close_col (str, default='Close') – Name of the close price column in the DataFrame.
high_col (str, default='High') – Name of the high price column in the DataFrame.
Returns:
targets_df (pd.DataFrame) – DataFrame with same index as input df, containing 8 boolean columns: