Target Generation
=================

Learn how to generate optimized binary classification targets for machine learning models.

Overview
--------

Rhoa's target generation creates 8 different binary target types based on future price movements:

1. **Target_1**: Close[N]/Close[0] - Future close vs current close
2. **Target_2**: Close[N]/High[0] - Future close vs current high
3. **Target_3**: High[N]/Close[0] - Future high vs current close
4. **Target_4**: High[N]/High[0] - Future high vs current high
5. **Target_5**: MaxClose/Close[0] - Max future close vs current close
6. **Target_6**: MaxClose/High[0] - Max future close vs current high
7. **Target_7**: MaxHigh/Close[0] - Max future high vs current close
8. **Target_8**: MaxHigh/High[0] - Max future high vs current high

Two modes are available:
- **Auto mode**: Pareto optimization finds optimal lookback period AND threshold
- **Manual mode**: Fixed lookback period with elbow method for thresholds

.. _target-auto:

Auto Mode (Pareto Optimization)
--------------------------------

Auto mode automatically finds the best combination of lookback period and threshold.

Basic Auto Mode
~~~~~~~~~~~~~~~

.. code-block:: python

   import pandas as pd
   from rhoa.targets import generate_target_combinations

   # Load OHLC data
   df = pd.read_csv('prices.csv')

   # Generate targets with auto mode
   targets, metadata = generate_target_combinations(
       df,
       mode='auto',
       target_class_balance=0.5  # Aim for 50% positive instances
   )

   # Inspect the targets
   print(targets.head())
   print(f"\nShape: {targets.shape}")
   print(f"\nColumns: {targets.columns.tolist()}")

Understanding Metadata
~~~~~~~~~~~~~~~~~~~~~~

The metadata dictionary contains parameters for each target:

.. code-block:: python

   # Check what parameters were found optimal
   for method, params in metadata.items():
       print(f"{method}:")
       print(f"  Period: {params['period']} days")
       print(f"  Threshold: {params['threshold']}%")
       print(f"  Positive instances: {params['instances']}")
       print(f"  % of maximum: {params['pct_of_max']:.1f}%")
       print()

Example output:

.. code-block:: text

   method_1:
     Period: 5 days
     Threshold: 3.5%
     Positive instances: 142
     % of maximum: 12.3%

   method_7:
     Period: 6 days
     Threshold: 4.0%
     Positive instances: 249
     % of maximum: 21.5%

Custom Class Balance
~~~~~~~~~~~~~~~~~~~~

Adjust the target class balance based on your needs:

.. code-block:: python

   # Conservative: 30% positive instances (higher precision)
   targets_conservative, meta = generate_target_combinations(
       df,
       mode='auto',
       target_class_balance=0.3
   )

   # Aggressive: 70% positive instances (higher recall)
   targets_aggressive, meta = generate_target_combinations(
       df,
       mode='auto',
       target_class_balance=0.7
   )

   # Balanced: 50% positive instances
   targets_balanced, meta = generate_target_combinations(
       df,
       mode='auto',
       target_class_balance=0.5
   )

Custom Search Ranges
~~~~~~~~~~~~~~~~~~~~

Control the optimization search space:

.. code-block:: python

   # Search only short-term periods (1-5 days)
   targets, meta = generate_target_combinations(
       df,
       mode='auto',
       period_range=(1, 5),
       threshold_range=(1.0, 5.0),
       target_class_balance=0.5
   )

   # Search longer-term periods (10-30 days)
   targets, meta = generate_target_combinations(
       df,
       mode='auto',
       period_range=(10, 30),
       threshold_range=(5.0, 20.0),
       target_class_balance=0.4
   )

.. _target-manual:

Manual Mode (Elbow Method)
---------------------------

Manual mode uses a fixed lookback period and finds optimal thresholds using the elbow method.

Basic Manual Mode
~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Generate targets with fixed 5-day lookback
   targets, metadata = generate_target_combinations(
       df,
       mode='manual',
       lookback_periods=5
   )

   # Check the detected thresholds
   for method, params in metadata.items():
       print(f"{method}: threshold={params['threshold']}%, "
             f"instances={params['instances']}")

Different Lookback Periods
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Short-term: 3 days
   targets_short, meta_short = generate_target_combinations(
       df,
       mode='manual',
       lookback_periods=3
   )

   # Medium-term: 10 days
   targets_medium, meta_medium = generate_target_combinations(
       df,
       mode='manual',
       lookback_periods=10
   )

   # Long-term: 20 days
   targets_long, meta_long = generate_target_combinations(
       df,
       mode='manual',
       lookback_periods=20
   )

   # Compare signal counts
   print(f"Short-term (3d): {targets_short['Target_7'].sum()} signals")
   print(f"Medium-term (10d): {targets_medium['Target_7'].sum()} signals")
   print(f"Long-term (20d): {targets_long['Target_7'].sum()} signals")

Choosing the Right Target
--------------------------

Different targets serve different trading strategies.

Target Characteristics
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   targets, meta = generate_target_combinations(df, mode='auto')

   # Analyze each target
   for i in range(1, 9):
       target_col = f'Target_{i}'
       positive_pct = targets[target_col].mean() * 100

       print(f"{target_col}:")
       print(f"  Method: {meta[f'method_{i}']}")
       print(f"  Positive: {positive_pct:.1f}%")
       print(f"  Negative: {100-positive_pct:.1f}%")
       print()

Compare Target Types
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Target 1: Conservative (Close[N]/Close[0])
   # Requires price to be higher at specific future date

   # Target 7: Moderate (MaxHigh/Close[0])
   # Requires max high within period to exceed threshold
   # More achievable than Target 1

   # Target 8: Aggressive (MaxHigh/High[0])
   # Most stringent - requires exceeding current high

   # Count positives for each
   for i in [1, 7, 8]:
       count = targets[f'Target_{i}'].sum()
       pct = targets[f'Target_{i}'].mean() * 100
       print(f"Target {i}: {count} positives ({pct:.1f}%)")

Recommended Targets by Strategy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   """
   Swing Trading (hold 3-10 days):
     - Target_7 (MaxHigh/Close[0])
     - Captures upward swings
     - Moderate signal frequency

   Day Trading (hold intraday):
     - Target_1 (Close[N]/Close[0])
     - Quick entries and exits
     - Higher frequency

   Position Trading (hold weeks/months):
     - Target_5 (MaxClose/Close[0])
     - Sustained movements
     - Lower frequency, higher conviction

   Mean Reversion:
     - Target_1 with lower thresholds (1-3%)
     - Quick profit taking

   Momentum/Breakout:
     - Target_7/8 with higher thresholds (>5%)
     - Captures strong moves
   """

Validating Targets
------------------

Always validate target quality before using in production.

Class Balance Check
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   targets, meta = generate_target_combinations(df, mode='auto')

   # Check class balance for all targets
   print("Class Distribution:")
   print("-" * 40)
   for col in targets.columns:
       pos = targets[col].sum()
       neg = len(targets) - pos
       ratio = pos / len(targets) * 100
       print(f"{col}: {pos} positive ({ratio:.1f}%), {neg} negative")

Temporal Validation
~~~~~~~~~~~~~~~~~~~

Check if targets are distributed across time:

.. code-block:: python

   import matplotlib.pyplot as plt

   targets, meta = generate_target_combinations(df, mode='auto')

   # Check Target_7 distribution over time
   # Assuming df has Date column
   df['Target_7'] = targets['Target_7']

   # Group by month
   df['Month'] = pd.to_datetime(df['Date']).dt.to_period('M')
   monthly = df.groupby('Month')['Target_7'].mean()

   print("Monthly positive rate:")
   print(monthly)

   # Should not have months with 0% or 100% positive rate

Forward-Looking Validation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Never validate on data used to generate targets:

.. code-block:: python

   # WRONG: Using same data for generation and validation
   targets, meta = generate_target_combinations(df, mode='auto')
   X_train, X_test, y_train, y_test = train_test_split(X, targets['Target_7'])
   # This will overfit!

   # CORRECT: Time-based split
   split_date = '2024-01-01'

   # Generate targets on training data only
   train_df = df[df['Date'] < split_date]
   targets_train, meta = generate_target_combinations(train_df, mode='auto')

   # Apply same parameters to test data
   test_df = df[df['Date'] >= split_date]
   # Use meta['method_7']['period'] and meta['method_7']['threshold']
   # to generate test targets with same parameters

Practical Examples
------------------

Example 1: Multi-Timeframe Targets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Generate targets for different timeframes
   df = pd.read_csv('prices.csv')

   # Short-term (3 days)
   targets_3d, meta_3d = generate_target_combinations(
       df, mode='manual', lookback_periods=3
   )
   df['Target_3d'] = targets_3d['Target_7']

   # Medium-term (10 days)
   targets_10d, meta_10d = generate_target_combinations(
       df, mode='manual', lookback_periods=10
   )
   df['Target_10d'] = targets_10d['Target_7']

   # Use both targets for hierarchical prediction
   # First predict 3d, then 10d

Example 2: Target Ensemble
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Use multiple targets for ensemble approach
   targets, meta = generate_target_combinations(df, mode='auto')

   # Create ensemble target: positive if ANY of multiple targets true
   ensemble = (
       targets['Target_5'] |
       targets['Target_6'] |
       targets['Target_7']
   ).astype(int)

   print(f"Ensemble positive rate: {ensemble.mean():.1%}")

Example 3: Conservative Target
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Create very conservative target for high-precision trading
   targets, meta = generate_target_combinations(
       df,
       mode='auto',
       target_class_balance=0.1,  # Only 10% positive
       threshold_range=(10.0, 30.0)  # High thresholds
   )

   # This should give fewer but higher-quality signals
   print(f"Positive signals: {targets['Target_7'].sum()}")
   print(f"Average threshold: {meta['method_7']['threshold']}%")

Saving and Loading Metadata
----------------------------

Always save target generation parameters for reproducibility.

Save Metadata
~~~~~~~~~~~~~

.. code-block:: python

   import json

   targets, meta = generate_target_combinations(df, mode='auto')

   # Save metadata
   with open('target_metadata.json', 'w') as f:
       json.dump(meta, f, indent=2)

   # Save targets
   targets.to_csv('targets.csv', index=False)

Load and Apply
~~~~~~~~~~~~~~

.. code-block:: python

   import json

   # Load metadata
   with open('target_metadata.json', 'r') as f:
       meta = json.load(f)

   # Apply to new data using same parameters
   # (You'll need to implement the logic using meta parameters)
   period = meta['method_7']['period']
   threshold = meta['method_7']['threshold']

   print(f"Using period={period}, threshold={threshold}%")

Best Practices
--------------

1. **Always validate targets** on out-of-sample data
2. **Save metadata** for reproducibility
3. **Check class balance** - extreme imbalance causes issues
4. **Use appropriate timeframes** - match your trading style
5. **Start with auto mode** - it finds good defaults
6. **Test multiple targets** - different targets for different strategies
7. **Consider transaction costs** - adjust thresholds accordingly
8. **Avoid data leakage** - never peek at future data during training

Next Steps
----------

Continue to :doc:`complete_pipeline` to see how to build end-to-end ML pipelines
using generated targets.