Target Generation

Learn how to generate optimized binary classification targets for machine learning models.

Overview

Rhoa’s target generation creates 8 different binary target types based on future price movements:

Target_1: Close[N]/Close[0] - Future close vs current close
Target_2: Close[N]/High[0] - Future close vs current high
Target_3: High[N]/Close[0] - Future high vs current close
Target_4: High[N]/High[0] - Future high vs current high
Target_5: MaxClose/Close[0] - Max future close vs current close
Target_6: MaxClose/High[0] - Max future close vs current high
Target_7: MaxHigh/Close[0] - Max future high vs current close
Target_8: MaxHigh/High[0] - Max future high vs current high
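To make the eight definitions concrete, here is a minimal stdlib sketch of the ratios for a single bar. It is not Rhoa's code. It assumes that `N` counts bars, that MaxClose and MaxHigh are taken over the next `N` bars (bar t+1 through bar t+N), and that a target is positive when its ratio reaches 1 + threshold/100. The names `target_ratios` and `is_positive` are hypothetical.

```python
def target_ratios(close, high, t, n):
    """Return the eight Target_k ratios for bar t with an n-bar lookahead.

    Assumption (not confirmed by Rhoa's docs): MaxClose and MaxHigh are
    taken over bars t+1 .. t+n, so bar t+n is always inside the window.
    """
    if t + n >= len(close):
        raise ValueError("not enough future bars")
    window = range(t + 1, t + n + 1)
    max_close = max(close[j] for j in window)
    max_high = max(high[j] for j in window)
    c0, h0 = close[t], high[t]          # current close and high
    cn, hn = close[t + n], high[t + n]  # close and high N bars ahead
    return {
        "Target_1": cn / c0,
        "Target_2": cn / h0,
        "Target_3": hn / c0,
        "Target_4": hn / h0,
        "Target_5": max_close / c0,
        "Target_6": max_close / h0,
        "Target_7": max_high / c0,
        "Target_8": max_high / h0,
    }


def is_positive(ratio, threshold_pct):
    """Assumed labeling rule: positive if the move reaches threshold_pct percent."""
    return ratio >= 1 + threshold_pct / 100


# Toy bars: a spike on bar 1 that fades by bar 3
close = [100.0, 106.0, 99.0, 102.0]
high = [101.0, 107.0, 100.0, 103.0]
ratios = target_ratios(close, high, t=0, n=3)
labels = {k: is_positive(v, 3.5) for k, v in ratios.items()}
print(labels["Target_1"], labels["Target_7"])  # False True
```

The toy bars show why the "Max" targets fire more often: the 7% spike inside the window counts for Target_7, while Target_1 only sees the 2% gain left at bar N.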

Two modes are available:
- Auto mode: Pareto optimization finds the optimal lookback period and threshold together
- Manual mode: a fixed lookback period, with the elbow method choosing the thresholds

Auto Mode (Pareto Optimization)

Auto mode searches for the best combination of lookback period and threshold.

Basic Auto Mode

import pandas as pd
from rhoa.targets import generate_target_combinations

# Load OHLC data
df = pd.read_csv('prices.csv')

# Generate targets with auto mode
targets, metadata = generate_target_combinations(
    df,
    mode='auto',
    target_class_balance=0.5  # Aim for 50% positive instances
)

# Inspect the targets
print(targets.head())
print(f"\nShape: {targets.shape}")
print(f"\nColumns: {targets.columns.tolist()}")

Understanding Metadata

The metadata dictionary contains parameters for each target:

# Check what parameters were found optimal
for method, params in metadata.items():
    print(f"{method}:")
    print(f"  Period: {params['period']} days")
    print(f"  Threshold: {params['threshold']}%")
    print(f"  Positive instances: {params['instances']}")
    print(f"  % of maximum: {params['pct_of_max']:.1f}%")
    print()

Example output:

method_1:
  Period: 5 days
  Threshold: 3.5%
  Positive instances: 142
  % of maximum: 12.3%

method_7:
  Period: 6 days
  Threshold: 4.0%
  Positive instances: 249
  % of maximum: 21.5%

Custom Class Balance

target_class_balance sets the share of positive instances the optimizer aims for. Lower values generally mean higher thresholds, so only larger moves are labeled positive; higher values do the opposite:

# Conservative: ~30% positive instances (only larger moves count)
targets_conservative, meta_conservative = generate_target_combinations(
    df,
    mode='auto',
    target_class_balance=0.3
)

# Aggressive: ~70% positive instances (smaller moves count too)
targets_aggressive, meta_aggressive = generate_target_combinations(
    df,
    mode='auto',
    target_class_balance=0.7
)

# Balanced: ~50% positive instances
targets_balanced, meta_balanced = generate_target_combinations(
    df,
    mode='auto',
    target_class_balance=0.5
)
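Why the conservative setting ends up with higher thresholds than the aggressive one: for one target at a fixed period, the positive rate is the share of forward returns at or above the threshold, so the threshold that yields a given balance sits near the (1 - balance) quantile of those returns. The sketch below shows this on toy returns. `threshold_for_balance` is a hypothetical helper that only tries observed returns as thresholds, a simplification of whatever search Rhoa actually runs.

```python
def threshold_for_balance(forward_returns_pct, balance):
    """Return the threshold (%) whose positive rate is closest to `balance`.

    A bar counts as positive when its forward return >= threshold
    (assumed rule). Only observed returns are tried as thresholds.
    """
    n = len(forward_returns_pct)
    best, best_err = None, float("inf")
    for thr in sorted(set(forward_returns_pct)):
        rate = sum(r >= thr for r in forward_returns_pct) / n
        err = abs(rate - balance)
        if err < best_err:
            best, best_err = thr, err
    return best


# Toy forward returns in percent
returns = [-3.0, -1.0, 0.5, 1.0, 2.0, 2.5, 3.0, 4.0, 6.0, 8.0]
for balance in (0.3, 0.5, 0.7):
    print(balance, threshold_for_balance(returns, balance))
# 0.3 4.0
# 0.5 2.5
# 0.7 1.0
```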

Custom Search Ranges

Control the optimization search space:

# Search only short-term periods (1-5 days)
targets, meta = generate_target_combinations(
    df,
    mode='auto',
    period_range=(1, 5),
    threshold_range=(1.0, 5.0),
    target_class_balance=0.5
)

# Search longer-term periods (10-30 days)
targets, meta = generate_target_combinations(
    df,
    mode='auto',
    period_range=(10, 30),
    threshold_range=(5.0, 20.0),
    target_class_balance=0.4
)
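This page does not say which objectives Rhoa's Pareto search trades off. As an illustration only, the sketch below assumes two: a larger threshold (bigger moves) and a positive rate close to `target_class_balance`. It keeps the (period, threshold) candidates that no other candidate beats on both objectives, which is what a Pareto front is. The function name and the candidate values are hypothetical toy inputs.

```python
def pareto_front(candidates, target_balance):
    """Return the candidates that no other candidate dominates.

    candidates: list of (period, threshold_pct, positive_rate) tuples.
    Assumed objectives (illustration only, not Rhoa's documented ones):
      - maximise threshold_pct (larger moves count as positive)
      - minimise |positive_rate - target_balance|
    """
    def scores(c):
        _, threshold, rate = c
        return threshold, abs(rate - target_balance)

    front = []
    for i, c in enumerate(candidates):
        thr_c, err_c = scores(c)
        dominated = False
        for j, other in enumerate(candidates):
            if i == j:
                continue
            thr_o, err_o = scores(other)
            # `other` dominates `c` if it is no worse on both objectives
            # and strictly better on at least one
            if thr_o >= thr_c and err_o <= err_c and (thr_o > thr_c or err_o < err_c):
                dominated = True
                break
        if not dominated:
            front.append(c)
    return front


# Toy (period, threshold %, positive rate) candidates
candidates = [
    (5, 2.0, 0.49),
    (10, 3.0, 0.47),
    (6, 4.0, 0.35),
    (5, 3.5, 0.30),
    (3, 1.0, 0.60),
]
front = pareto_front(candidates, target_balance=0.5)
print(front)  # [(5, 2.0, 0.49), (10, 3.0, 0.47), (6, 4.0, 0.35)]
```

A front usually holds several candidates, so a final pick still needs a tie-break rule (for example, the one closest to the target balance); how Rhoa chooses among them is not described here.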

Manual Mode (Elbow Method)

Manual mode uses a fixed lookback period and finds optimal thresholds using the elbow method.
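The page does not specify which elbow criterion Rhoa applies. One common heuristic, sketched here as an assumption, counts positive instances at each candidate threshold and picks the point farthest from the straight line joining the first and last points of that curve. `elbow_index` is a hypothetical name and the counts are toy values.

```python
import math


def elbow_index(xs, ys):
    """Index of the point farthest from the chord between the curve's endpoints.

    Both axes are rescaled to [0, 1] first so that instance counts do not
    dwarf threshold percentages in the distance calculation.
    """
    def rescale(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]

    sx, sy = rescale(xs), rescale(ys)
    dx, dy = sx[-1] - sx[0], sy[-1] - sy[0]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return 0
    best_i, best_d = 0, -1.0
    for i, (x, y) in enumerate(zip(sx, sy)):
        # Perpendicular distance from (x, y) to the line through the endpoints
        d = abs(dy * (x - sx[0]) - dx * (y - sy[0])) / norm
        if d > best_d:
            best_i, best_d = i, d
    return best_i


# Toy curve: positive-instance count at each candidate threshold (%)
thresholds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
counts = [900, 520, 300, 220, 180, 160, 150, 145]
print(thresholds[elbow_index(thresholds, counts)])  # 3.0
```

Past the elbow, raising the threshold removes few additional positives, so the elbow marks where the target stops getting much more selective.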

Basic Manual Mode

# Generate targets with fixed 5-day lookback
targets, metadata = generate_target_combinations(
    df,
    mode='manual',
    lookback_periods=5
)

# Check the detected thresholds
for method, params in metadata.items():
    print(f"{method}: threshold={params['threshold']}%, "
          f"instances={params['instances']}")

Different Lookback Periods

# Short-term: 3 days
targets_short, meta_short = generate_target_combinations(
    df,
    mode='manual',
    lookback_periods=3
)

# Medium-term: 10 days
targets_medium, meta_medium = generate_target_combinations(
    df,
    mode='manual',
    lookback_periods=10
)

# Long-term: 20 days
targets_long, meta_long = generate_target_combinations(
    df,
    mode='manual',
    lookback_periods=20
)

# Compare signal counts
print(f"Short-term (3d): {targets_short['Target_7'].sum()} signals")
print(f"Medium-term (10d): {targets_medium['Target_7'].sum()} signals")
print(f"Long-term (20d): {targets_long['Target_7'].sum()} signals")

Choosing the Right Target

Different targets serve different trading strategies.

Target Characteristics

targets, meta = generate_target_combinations(df, mode='auto')

# Analyze each target
for i in range(1, 9):
    target_col = f'Target_{i}'
    positive_pct = targets[target_col].mean() * 100

    print(f"{target_col}:")
    print(f"  Params: {meta[f'method_{i}']}")
    print(f"  Positive: {positive_pct:.1f}%")
    print(f"  Negative: {100-positive_pct:.1f}%")
    print()

Compare Target Types

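# Target 1 (Close[N]/Close[0])
# Only the close exactly N bars ahead is compared with the current close

# Target 7 (MaxHigh/Close[0])
# Uses the highest high anywhere in the next N bars, so with the same
# period and threshold it is at least as easy to satisfy as Target 1

# Target 8 (MaxHigh/High[0])
# Same future high as Target 7 but measured from the current high, so it
# is stricter than Target 7 (the strictest of all eight is Target 2)

# Count positives for each (in auto mode every target has its own
# period and threshold, so the counts need not follow this order)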
for i in [1, 7, 8]:
    count = targets[f'Target_{i}'].sum()
    pct = targets[f'Target_{i}'].mean() * 100
    print(f"Target {i}: {count} positives ({pct:.1f}%)")
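Why Target 7 is never harder to satisfy than Target 1 for the same period and threshold: every bar has Close ≤ High, and, assuming MaxClose and MaxHigh are taken over a window that includes bar N, Close[N] ≤ MaxClose ≤ MaxHigh and Close[N] ≤ High[N] ≤ MaxHigh. Every numerator therefore lies between Close[N] and MaxHigh, and every denominator between Close[0] and High[0] (prices positive), so each target ratio r_k is bounded by:

```latex
\[
\frac{\mathrm{Close}[N]}{\mathrm{High}[0]} \;\le\; r_k \;\le\; \frac{\mathrm{MaxHigh}}{\mathrm{Close}[0]},
\qquad k = 1, \dots, 8
\]
```

With a shared threshold, Target_2 is the hardest target to trigger and Target_7 the easiest, while Target_1 and Target_8 are not ordered against each other in general. In auto mode each target gets its own period and threshold, so the printed counts can break this ordering.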

Recommended Targets by Strategy

"""
Swing Trading (hold 3-10 days):
  - Target_7 (MaxHigh/Close[0])
  - Captures upward swings
  - Moderate signal frequency

Day Trading (hold intraday):
  - Target_1 (Close[N]/Close[0]) on intraday bars, where N counts bars
  - Quick entries and exits
  - Higher frequency

Position Trading (hold weeks/months):
  - Target_5 (MaxClose/Close[0])
  - Sustained movements
  - Lower frequency, higher conviction

Mean Reversion:
  - Target_1 with lower thresholds (1-3%)
  - Quick profit taking

Momentum/Breakout:
  - Target_7/8 with higher thresholds (>5%)
  - Captures strong moves
"""

Validating Targets

Always validate target quality before using the targets in production.

Class Balance Check

targets, meta = generate_target_combinations(df, mode='auto')

# Check class balance for all targets
print("Class Distribution:")
print("-" * 40)
for col in targets.columns:
    pos = targets[col].sum()
    neg = len(targets) - pos
    ratio = pos / len(targets) * 100
    print(f"{col}: {pos} positive ({ratio:.1f}%), {neg} negative")
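If a target comes out heavily imbalanced, a common remedy is to weight each class inversely to its frequency during training. The sketch below uses the same formula as scikit-learn's `class_weight='balanced'`, n_samples / (n_classes × count of the class); `balanced_class_weights` is a hypothetical stdlib helper, and it accepts any sized iterable of labels such as `targets['Target_7'].tolist()`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency.

    Same formula as scikit-learn's class_weight='balanced':
    n_samples / (n_classes * count_of_class).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Toy target with 20% positives
labels = [1] * 20 + [0] * 80
weights = balanced_class_weights(labels)
print(weights[0], weights[1])  # 0.625 2.5
```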

Temporal Validation

Check whether positive labels are spread across time or clustered in a few periods:

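import pandas as pd
import matplotlib.pyplot as plt

targets, meta = generate_target_combinations(df, mode='auto')

# Check Target_7 distribution over time
# Assumes df has a Date column and targets shares df's index
df['Target_7'] = targets['Target_7']

# Group by month
df['Month'] = pd.to_datetime(df['Date']).dt.to_period('M')
monthly = df.groupby('Month')['Target_7'].mean()

print("Monthly positive rate:")
print(monthly)

# Plot the monthly rate; months stuck at 0% or 100% mean the
# positives are clustered in time rather than spread out
monthly.plot(kind='bar', title='Target_7 monthly positive rate')
plt.ylabel('Positive rate')
plt.tight_layout()
plt.show()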
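The same check can run without a plot, for example in an automated test: group labels by calendar month and report the months whose positive rate is exactly 0 or 1. A stdlib sketch, assuming ISO `YYYY-MM-DD` date strings; `flag_degenerate_months` is a hypothetical helper and the dates and labels are toy values.

```python
from collections import defaultdict


def flag_degenerate_months(dates, labels):
    """Return sorted (month, rate) pairs whose positive rate is 0.0 or 1.0."""
    by_month = defaultdict(list)
    for d, y in zip(dates, labels):
        by_month[d[:7]].append(y)  # 'YYYY-MM-DD' -> 'YYYY-MM'
    flagged = []
    for month, ys in sorted(by_month.items()):
        rate = sum(ys) / len(ys)
        if rate in (0.0, 1.0):
            flagged.append((month, rate))
    return flagged


dates = ["2024-01-02", "2024-01-03", "2024-02-01",
         "2024-02-02", "2024-03-01", "2024-03-04"]
labels = [1, 0, 0, 0, 1, 1]
print(flag_degenerate_months(dates, labels))
# [('2024-02', 0.0), ('2024-03', 1.0)]
```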

Forward-Looking Validation

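Choose target parameters on training data only, and split by time rather than at random:

from sklearn.model_selection import train_test_split

# WRONG: parameters tuned on the full history, then a shuffled split
targets, meta = generate_target_combinations(df, mode='auto')
# X: your feature matrix, row-aligned with df
X_train, X_test, y_train, y_test = train_test_split(X, targets['Target_7'])
# train_test_split shuffles by default, so future rows land in the
# training set, and the target parameters already saw the test period.
# Test scores will be optimistic.

# CORRECT: time-based split
df['Date'] = pd.to_datetime(df['Date'])
split_date = pd.Timestamp('2024-01-01')

# Generate targets on training data only
train_df = df[df['Date'] < split_date]
targets_train, meta = generate_target_combinations(train_df, mode='auto')

# Apply the same parameters to the test data
test_df = df[df['Date'] >= split_date]
# Use meta['method_7']['period'] and meta['method_7']['threshold']
# to generate test targets with the same parameters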
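The final labeling step is left to the reader. A minimal sketch of it for Target_7 follows, assuming the rule is the highest high over the next `period` bars divided by the current close, positive when the gain reaches `threshold` percent; compare its output with Rhoa's labels on the training split before relying on it. `label_target_7` is hypothetical, and the `Close`/`High` column names in the commented usage are assumptions about your data.

```python
def label_target_7(close, high, period, threshold_pct):
    """Label bars 1/0 for Target_7 (MaxHigh / Close[0]) with fixed parameters.

    Assumed rule (check it against Rhoa's labels on the training split):
    positive when max(high[t+1 .. t+period]) / close[t] >= 1 + threshold_pct / 100.
    Bars without a complete future window get None instead of a label.
    """
    labels = []
    for t in range(len(close)):
        if t + period >= len(close):
            labels.append(None)  # not enough future bars to decide
            continue
        max_high = max(high[t + 1 : t + period + 1])
        labels.append(int(max_high / close[t] >= 1 + threshold_pct / 100))
    return labels


# Hypothetical usage with the training-split parameters:
# period = meta['method_7']['period']
# threshold = meta['method_7']['threshold']
# y_test = label_target_7(test_df['Close'].tolist(), test_df['High'].tolist(),
#                         period, threshold)

close = [100.0, 101.0, 102.0, 99.0, 100.0]
high = [101.0, 104.0, 103.0, 100.0, 101.0]
print(label_target_7(close, high, period=2, threshold_pct=3.0))
# [1, 0, 0, None, None]
```

Returning `None` rather than 0 for the last `period` bars keeps those rows out of the class-balance counts; drop them before training or scoring.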

Practical Examples

Example 1: Multi-Timeframe Targets

# Generate targets for different timeframes
df = pd.read_csv('prices.csv')

# Short-term (3 days)
targets_3d, meta_3d = generate_target_combinations(
    df, mode='manual', lookback_periods=3
)
df['Target_3d'] = targets_3d['Target_7']

# Medium-term (10 days)
targets_10d, meta_10d = generate_target_combinations(
    df, mode='manual', lookback_periods=10
)
df['Target_10d'] = targets_10d['Target_7']

# Use both targets for hierarchical prediction
# First predict 3d, then 10d

Example 2: Target Ensemble

# Combine several targets into a single label
targets, meta = generate_target_combinations(df, mode='auto')

# Create ensemble target: positive if ANY of multiple targets true
ensemble = (
    targets['Target_5'] |
    targets['Target_6'] |
    targets['Target_7']
).astype(int)

print(f"Ensemble positive rate: {ensemble.mean():.1%}")
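The `|` union above is one way to combine targets. A stricter variant requires every target to be positive, and a majority vote sits in between. A stdlib sketch over lists of 0/1 labels; `combine_targets` is a hypothetical helper and the label lists are toy values.

```python
def combine_targets(label_lists, rule="any"):
    """Combine equal-length 0/1 label lists row by row.

    rule: 'any' (logical OR), 'all' (logical AND) or 'majority'
    (strictly more than half of the targets positive).
    """
    combined = []
    for row in zip(*label_lists):
        votes = sum(row)
        if rule == "any":
            combined.append(int(votes > 0))
        elif rule == "all":
            combined.append(int(votes == len(row)))
        elif rule == "majority":
            combined.append(int(votes * 2 > len(row)))
        else:
            raise ValueError(f"unknown rule: {rule}")
    return combined


t5 = [1, 0, 0, 1]
t6 = [0, 0, 1, 1]
t7 = [0, 0, 1, 1]
for rule in ("any", "all", "majority"):
    print(rule, combine_targets([t5, t6, t7], rule))
# any [1, 0, 1, 1]
# all [0, 0, 0, 1]
# majority [0, 0, 1, 1]
```

With integer 0/1 pandas Series the same three rules read `a | b | c`, `a & b & c` and `(a + b + c) >= 2`.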

Example 3: Conservative Target

# Create very conservative target for high-precision trading
targets, meta = generate_target_combinations(
    df,
    mode='auto',
    target_class_balance=0.1,  # Only 10% positive
    threshold_range=(10.0, 30.0)  # High thresholds
)

# Expect fewer positives, each standing for a larger move
print(f"Positive signals: {targets['Target_7'].sum()}")
print(f"Target_7 threshold: {meta['method_7']['threshold']}%")

Saving and Loading Metadata

Always save target generation parameters for reproducibility.

Save Metadata

import json

targets, meta = generate_target_combinations(df, mode='auto')

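# Save metadata (default= converts NumPy scalars, should meta contain any)
with open('target_metadata.json', 'w') as f:
    json.dump(meta, f, indent=2, default=lambda o: o.item())

# Save targets, keeping the index so rows can be realigned with df
targets.to_csv('targets.csv')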
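The metadata alone does not record how it was produced. A sketch that stores the call settings next to it follows; the `settings`/`metadata` layout is a hypothetical convention and the values are placeholders. JSON has no tuple type, so a range such as `period_range=(1, 5)` reloads as a list and needs converting back before it is passed to `generate_target_combinations` again.

```python
import json

# Placeholder values; in practice use the arguments you actually passed
settings = {"mode": "auto", "target_class_balance": 0.5, "period_range": (1, 5)}
meta = {"method_7": {"period": 6, "threshold": 4.0}}

record = {"settings": settings, "metadata": meta}
text = json.dumps(record, indent=2)  # write this string to a file in practice
loaded = json.loads(text)

print(loaded["settings"]["period_range"])  # [1, 5]
period_range = tuple(loaded["settings"]["period_range"])  # restore the tuple
```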

Load and Apply

import json

# Load metadata
with open('target_metadata.json', 'r') as f:
    meta = json.load(f)

# Apply to new data using same parameters
# (You'll need to implement the logic using meta parameters)
period = meta['method_7']['period']
threshold = meta['method_7']['threshold']

print(f"Using period={period}, threshold={threshold}%")

Best Practices

Always validate targets on out-of-sample data
Save metadata for reproducibility
Check class balance - with extreme imbalance, a model can score well by always predicting the majority class
Use appropriate timeframes - match your trading style
Start with auto mode - it finds good defaults
Test multiple targets - different targets for different strategies
Consider transaction costs - adjust thresholds accordingly
Avoid data leakage - targets look ahead by design, so features must use only data available at the current bar
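On the transaction-cost point: a threshold below the round-trip cost labels trades that lose money after fees as positive. A rough floor for `threshold_range`, assuming fee and slippage are charged as percentages on both entry and exit and ignoring compounding; `min_threshold_pct` and its inputs are hypothetical.

```python
def min_threshold_pct(fee_pct, slippage_pct, margin_pct=0.0):
    """Rough lower bound (%) for a threshold that still pays after costs.

    Linear approximation that ignores compounding: fee and slippage are
    paid once on entry and once on exit; margin_pct is an extra cushion.
    """
    return 2 * (fee_pct + slippage_pct) + margin_pct


floor = min_threshold_pct(fee_pct=0.1, slippage_pct=0.05, margin_pct=0.5)
print(f"Minimum useful threshold: {floor:.2f}%")  # Minimum useful threshold: 0.80%
```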

Next Steps

Continue to Complete ML Pipeline to see how to build end-to-end ML pipelines using generated targets.