Target Generation

Learn how to generate optimized binary classification targets for machine learning models.

Overview

Rhoa’s target generation creates 8 different binary target types based on future price movements:

  1. Target_1: Close[N]/Close[0] - Future close vs current close

  2. Target_2: Close[N]/High[0] - Future close vs current high

  3. Target_3: High[N]/Close[0] - Future high vs current close

  4. Target_4: High[N]/High[0] - Future high vs current high

  5. Target_5: MaxClose/Close[0] - Max future close vs current close

  6. Target_6: MaxClose/High[0] - Max future close vs current high

  7. Target_7: MaxHigh/Close[0] - Max future high vs current close

  8. Target_8: MaxHigh/High[0] - Max future high vs current high
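
To make the eight ratios concrete, here is a minimal pandas sketch of how they could be computed for a single horizon N. It is not Rhoa's implementation: the helper name `forward_ratios`, the column names `Close` and `High`, the reading of the "max" variants as covering bars 1 through N ahead, and the percent-based threshold are all illustrative assumptions.

```python
import pandas as pd

def forward_ratios(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: the eight forward-looking ratios for horizon n."""
    close, high = df["Close"], df["High"]
    # Values exactly n bars ahead; the last n rows have no future and become NaN.
    close_n, high_n = close.shift(-n), high.shift(-n)
    # Maximum over bars t+1 .. t+n: a trailing n-bar rolling max, shifted back n bars.
    max_close = close.rolling(n).max().shift(-n)
    max_high = high.rolling(n).max().shift(-n)
    return pd.DataFrame({
        "ratio_1": close_n / close,
        "ratio_2": close_n / high,
        "ratio_3": high_n / close,
        "ratio_4": high_n / high,
        "ratio_5": max_close / close,
        "ratio_6": max_close / high,
        "ratio_7": max_high / close,
        "ratio_8": max_high / high,
    })

# Tiny synthetic example (made-up prices)
prices = pd.DataFrame({"Close": [10.0, 12.0, 11.0, 13.0],
                       "High":  [11.0, 13.0, 12.0, 14.0]})
ratios = forward_ratios(prices, n=2)

# A ratio becomes a binary label by comparing its percentage gain with a threshold.
threshold_pct = 15.0  # assumption: thresholds are expressed in percent
labels = ((ratios - 1.0) * 100.0 >= threshold_pct).astype(float).where(ratios.notna())
print(labels)
```

Rows whose future window runs past the end of the data stay NaN in this sketch rather than being labeled 0, so they can be dropped before training.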

Two modes are available:

  - Auto mode: Pareto optimization finds the optimal lookback period and threshold

  - Manual mode: Fixed lookback period, with the elbow method selecting thresholds

Auto Mode (Pareto Optimization)

Auto mode searches for the best combination of lookback period (the horizon N in the ratios above) and threshold for each target.
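
The exact objectives Rhoa optimizes are not spelled out on this page. As a generic illustration of Pareto selection only, suppose each (period, threshold) candidate is scored on two quantities to maximize: the threshold and the number of positive instances. A candidate is on the Pareto front when no other candidate is at least as good on both and strictly better on one. The `Candidate` type, the `pareto_front` helper, and the numbers below are hypothetical.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    period: int        # horizon in bars
    threshold: float   # percent
    instances: int     # positive labels produced by this combination

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates not dominated on (threshold, instances), both maximized."""
    front = []
    for c in cands:
        dominated = any(
            o.threshold >= c.threshold and o.instances >= c.instances
            and (o.threshold > c.threshold or o.instances > c.instances)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return front

# Made-up candidates for illustration
candidates = [
    Candidate(5, 3.0, 200),
    Candidate(5, 4.0, 150),
    Candidate(6, 4.0, 180),
    Candidate(3, 2.0, 190),
]
front = pareto_front(candidates)
print(front)
```

A final pick from the front could then favor the candidate whose positive rate is closest to `target_class_balance`, though how Rhoa actually makes that choice is not documented here.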

Basic Auto Mode

import pandas as pd
from rhoa.targets import generate_target_combinations

# Load OHLC data
df = pd.read_csv('prices.csv')

# Generate targets with auto mode
targets, metadata = generate_target_combinations(
    df,
    mode='auto',
    target_class_balance=0.5  # Aim for 50% positive instances
)

# Inspect the targets
print(targets.head())
print(f"\nShape: {targets.shape}")
print(f"\nColumns: {targets.columns.tolist()}")

Understanding Metadata

The metadata dictionary contains parameters for each target:

# Check what parameters were found optimal
for method, params in metadata.items():
    print(f"{method}:")
    print(f"  Period: {params['period']} days")
    print(f"  Threshold: {params['threshold']}%")
    print(f"  Positive instances: {params['instances']}")
    print(f"  % of maximum: {params['pct_of_max']:.1f}%")
    print()

Example output:

method_1:
  Period: 5 days
  Threshold: 3.5%
  Positive instances: 142
  % of maximum: 12.3%

method_7:
  Period: 6 days
  Threshold: 4.0%
  Positive instances: 249
  % of maximum: 21.5%

Custom Class Balance

Adjust the target class balance based on your needs:

# Conservative: 30% positive instances (fewer, more selective signals)
targets_conservative, meta = generate_target_combinations(
    df,
    mode='auto',
    target_class_balance=0.3
)

# Aggressive: 70% positive instances (more frequent signals)
targets_aggressive, meta = generate_target_combinations(
    df,
    mode='auto',
    target_class_balance=0.7
)

# Balanced: 50% positive instances
targets_balanced, meta = generate_target_combinations(
    df,
    mode='auto',
    target_class_balance=0.5
)

Custom Search Ranges

Control the optimization search space:

# Search only short-term periods (1-5 days)
targets, meta = generate_target_combinations(
    df,
    mode='auto',
    period_range=(1, 5),
    threshold_range=(1.0, 5.0),
    target_class_balance=0.5
)

# Search longer-term periods (10-30 days)
targets, meta = generate_target_combinations(
    df,
    mode='auto',
    period_range=(10, 30),
    threshold_range=(5.0, 20.0),
    target_class_balance=0.4
)

Manual Mode (Elbow Method)

Manual mode uses a fixed lookback period and finds optimal thresholds using the elbow method.
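
For intuition, one common way to locate an elbow is to take the point on the curve that lies farthest from the straight line joining its two endpoints. The sketch below applies that idea to a made-up curve of positive-instance counts versus threshold. The `elbow_index` helper is hypothetical, and Rhoa's own elbow detection may differ.

```python
def elbow_index(xs, ys):
    """Index of the point farthest from the chord joining the curve's endpoints."""
    x_span, y_span = max(xs) - min(xs), max(ys) - min(ys)
    if len(xs) < 3 or x_span == 0 or y_span == 0:
        raise ValueError("need at least 3 points with varying x and y")
    # Rescale both axes to [0, 1] so neither dominates the distance
    nx = [(x - min(xs)) / x_span for x in xs]
    ny = [(y - min(ys)) / y_span for y in ys]
    dx, dy = nx[-1] - nx[0], ny[-1] - ny[0]
    # Distance to the chord, up to a constant factor (the chord length)
    dist = [abs(dy * (x - nx[0]) - dx * (y - ny[0])) for x, y in zip(nx, ny)]
    return max(range(len(dist)), key=dist.__getitem__)

thresholds = [1.0, 2.0, 3.0, 4.0, 5.0]   # candidate thresholds in percent
instances = [1000, 300, 120, 100, 90]    # made-up positive counts per threshold
elbow_threshold = thresholds[elbow_index(thresholds, instances)]
print(elbow_threshold)
```

Past the elbow, raising the threshold further removes few additional positives, which is why it is a natural cut-off.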

Basic Manual Mode

# Generate targets with fixed 5-day lookback
targets, metadata = generate_target_combinations(
    df,
    mode='manual',
    lookback_periods=5
)

# Check the detected thresholds
for method, params in metadata.items():
    print(f"{method}: threshold={params['threshold']}%, "
          f"instances={params['instances']}")

Different Lookback Periods

# Short-term: 3 days
targets_short, meta_short = generate_target_combinations(
    df,
    mode='manual',
    lookback_periods=3
)

# Medium-term: 10 days
targets_medium, meta_medium = generate_target_combinations(
    df,
    mode='manual',
    lookback_periods=10
)

# Long-term: 20 days
targets_long, meta_long = generate_target_combinations(
    df,
    mode='manual',
    lookback_periods=20
)

# Compare signal counts
print(f"Short-term (3d): {targets_short['Target_7'].sum()} signals")
print(f"Medium-term (10d): {targets_medium['Target_7'].sum()} signals")
print(f"Long-term (20d): {targets_long['Target_7'].sum()} signals")

Choosing the Right Target

Different targets serve different trading strategies.

Target Characteristics

targets, meta = generate_target_combinations(df, mode='auto')

# Analyze each target
for i in range(1, 9):
    target_col = f'Target_{i}'
    positive_pct = targets[target_col].mean() * 100

    print(f"{target_col}:")
    print(f"  Parameters: {meta[f'method_{i}']}")
    print(f"  Positive: {positive_pct:.1f}%")
    print(f"  Negative: {100-positive_pct:.1f}%")
    print()

Compare Target Types

# Target 1: Conservative (Close[N]/Close[0])
# Requires the close exactly N periods ahead to exceed the current close by the threshold

# Target 7: Moderate (MaxHigh/Close[0])
# Requires the highest high within the period to exceed the current close by the threshold
# More achievable than Target 1 at the same period and threshold

# Target 8: Aggressive (MaxHigh/High[0])
# Stricter than Target 7 - the highest high must exceed the current high, not just the close

# Count positives for each
for i in [1, 7, 8]:
    count = targets[f'Target_{i}'].sum()
    pct = targets[f'Target_{i}'].mean() * 100
    print(f"Target {i}: {count} positives ({pct:.1f}%)")

Validating Targets

Always validate target quality before using targets in production.

Class Balance Check

targets, meta = generate_target_combinations(df, mode='auto')

# Check class balance for all targets
print("Class Distribution:")
print("-" * 40)
for col in targets.columns:
    pos = targets[col].sum()
    neg = len(targets) - pos
    ratio = pos / len(targets) * 100
    print(f"{col}: {pos} positive ({ratio:.1f}%), {neg} negative")

Temporal Validation

Check whether positive targets are spread across time rather than clustered in a few periods:

import matplotlib.pyplot as plt

targets, meta = generate_target_combinations(df, mode='auto')

# Check Target_7 distribution over time
# Assuming df has Date column
df['Target_7'] = targets['Target_7']

# Group by month
df['Month'] = pd.to_datetime(df['Date']).dt.to_period('M')
monthly = df.groupby('Month')['Target_7'].mean()

print("Monthly positive rate:")
print(monthly)

# Should not have months with 0% or 100% positive rate
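
The check in the last comment can be automated. The following sketch flags months whose positive rate is exactly 0% or 100%. The helper name `degenerate_months` and the sample data are made up for illustration.

```python
import pandas as pd

def degenerate_months(dates: pd.Series, target: pd.Series) -> pd.Series:
    """Return the monthly positive rates that are exactly 0.0 or 1.0."""
    months = pd.to_datetime(dates).dt.to_period("M")
    monthly = target.groupby(months).mean()
    return monthly[(monthly == 0.0) | (monthly == 1.0)]

# Made-up example: January is mixed, February is all 0, March is all 1
sample = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "2024-02-01",
             "2024-02-02", "2024-03-01", "2024-03-04"],
    "Target_7": [0, 1, 0, 0, 1, 1],
})
bad = degenerate_months(sample["Date"], sample["Target_7"])
print(bad)
```

An empty result means every month contains both classes.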

Forward-Looking Validation

Never validate on data used to generate targets:

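# WRONG: Using same data for generation and validation
from sklearn.model_selection import train_test_split

targets, meta = generate_target_combinations(df, mode='auto')
# X is your feature matrix, aligned row-for-row with targets
X_train, X_test, y_train, y_test = train_test_split(X, targets['Target_7'])
# Leaks information: parameters were tuned on rows that end up in the test
# set, and the default random shuffle mixes past and future rows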

# CORRECT: Time-based split
split_date = '2024-01-01'

# Generate targets on training data only
train_df = df[df['Date'] < split_date]
targets_train, meta = generate_target_combinations(train_df, mode='auto')

# Apply same parameters to test data
test_df = df[df['Date'] >= split_date]
# Use meta['method_7']['period'] and meta['method_7']['threshold']
# to generate test targets with same parameters
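
One way to apply the stored parameters to the test split is to recompute the target directly. The sketch below does this for Target_7 under stated assumptions: `apply_target_7` is a hypothetical helper rather than part of Rhoa, the threshold is taken to be a percentage gain, and the maximum is taken over the next `period` bars.

```python
import pandas as pd

def apply_target_7(df: pd.DataFrame, period: int, threshold_pct: float) -> pd.Series:
    """Hypothetical re-implementation of Target_7 (MaxHigh/Close[0]) with fixed parameters."""
    # Highest high over the next `period` bars; NaN where the future is not yet known
    max_high = df["High"].rolling(period).max().shift(-period)
    gain_pct = (max_high / df["Close"] - 1.0) * 100.0
    return (gain_pct >= threshold_pct).astype(float).where(max_high.notna())

# Made-up test-period prices and parameters standing in for meta['method_7']
test_df = pd.DataFrame({"Close": [10.0, 10.0, 10.0, 10.0, 10.0],
                        "High":  [10.0, 10.5, 11.0, 10.2, 10.1]})
period, threshold = 2, 5.0

y_test = apply_target_7(test_df, period, threshold)
# The final `period` rows have no complete future window, so drop them
y_test = y_test.dropna().astype(int)
print(y_test.tolist())
```

The same trimming matters at the end of the training split: its last `period` rows would otherwise need prices from the test period to be labeled.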

Practical Examples

Example 1: Multi-Timeframe Targets

# Generate targets for different timeframes
df = pd.read_csv('prices.csv')

# Short-term (3 days)
targets_3d, meta_3d = generate_target_combinations(
    df, mode='manual', lookback_periods=3
)
df['Target_3d'] = targets_3d['Target_7']

# Medium-term (10 days)
targets_10d, meta_10d = generate_target_combinations(
    df, mode='manual', lookback_periods=10
)
df['Target_10d'] = targets_10d['Target_7']

# Use both targets for hierarchical prediction
# First predict 3d, then 10d

Example 2: Target Ensemble

# Use multiple targets for ensemble approach
targets, meta = generate_target_combinations(df, mode='auto')

# Create ensemble target: positive if ANY of multiple targets true
ensemble = (
    targets['Target_5'] |
    targets['Target_6'] |
    targets['Target_7']
).astype(int)

print(f"Ensemble positive rate: {ensemble.mean():.1%}")

Example 3: Conservative Target

# Create very conservative target for high-precision trading
targets, meta = generate_target_combinations(
    df,
    mode='auto',
    target_class_balance=0.1,  # Only 10% positive
    threshold_range=(10.0, 30.0)  # High thresholds
)

# This should give fewer signals, each requiring a larger price move
print(f"Positive signals: {targets['Target_7'].sum()}")
print(f"Selected threshold: {meta['method_7']['threshold']}%")

Saving and Loading Metadata

Always save target generation parameters for reproducibility.

Save Metadata

import json

targets, meta = generate_target_combinations(df, mode='auto')

# Save metadata
with open('target_metadata.json', 'w') as f:
    json.dump(meta, f, indent=2)

# Save targets
targets.to_csv('targets.csv', index=False)
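
If the metadata happens to contain NumPy scalars such as `numpy.int64`, `json.dump` raises a `TypeError`. Whether Rhoa returns such values is an assumption here. A small `default` hook, sketched below with a hypothetical `numpy_default` function and made-up metadata, converts them to built-in Python types.

```python
import json
import numpy as np

def numpy_default(obj):
    """Fallback for json.dump: convert NumPy scalars and arrays to built-ins."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

# Made-up metadata with NumPy values, mirroring the structure shown earlier
meta_example = {
    "method_7": {
        "period": np.int64(6),
        "threshold": np.float64(4.0),
        "instances": np.int64(249),
    }
}
encoded = json.dumps(meta_example, default=numpy_default, indent=2)
print(encoded)
```

The same hook can be passed to `json.dump(meta, f, default=numpy_default)` when writing to a file.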

Load and Apply

import json

# Load metadata
with open('target_metadata.json', 'r') as f:
    meta = json.load(f)

# Apply to new data using same parameters
# (You'll need to implement the logic using meta parameters)
period = meta['method_7']['period']
threshold = meta['method_7']['threshold']

print(f"Using period={period}, threshold={threshold}%")

Best Practices

  1. Always validate targets on out-of-sample data

  2. Save metadata for reproducibility

  3. Check class balance - extreme imbalance causes issues

  4. Use appropriate timeframes - match your trading style

  5. Start with auto mode - it finds good defaults

  6. Test multiple targets - different targets for different strategies

  7. Consider transaction costs - adjust thresholds accordingly

  8. Avoid data leakage - never peek at future data during training

Next Steps

Continue to Complete ML Pipeline to see how to build end-to-end ML pipelines using generated targets.