Visualization Guide
===================

Learn how to create professional visualizations for evaluating machine learning predictions using Rhoa's plots accessor.

Overview
--------

Rhoa provides a powerful visualization system through the ``.plots`` accessor on pandas DataFrames. Currently focused on the ``signal()`` method, which creates comprehensive visualizations showing:

- Stock price charts with predicted buy signals
- Confusion matrices with detailed metrics
- False positive and false negative identification
- Professional styling suitable for presentations and reports

Quick Start
-----------

Basic Signal Plot
~~~~~~~~~~~~~~~~~

.. code-block:: python

   import pandas as pd
   import rhoa

   # Load your data with predictions
   df = pd.read_csv('stock_data.csv')
   df['Date'] = pd.to_datetime(df['Date'])

   # Assume you have predictions and ground truth
   predictions = model.predict(X_test)
   ground_truth = y_test

   # Create visualization
   fig = df.rhoa.plots.signal(
       y_pred=predictions,
       y_true=ground_truth,
       date_col='Date',
       price_col='Close'
   )

This creates a two-panel visualization:

- **Top panel**: Confusion matrix with precision/recall metrics
- **Bottom panel**: Price chart with signals overlaid

The signal() Method
-------------------

Complete API
~~~~~~~~~~~~

.. code-block:: python

   df.rhoa.plots.signal(
       y_pred,                    # Required: predictions
       y_true=None,              # Optional: ground truth
       date_col='Date',          # Date column name
       price_col='Close',        # Price column to plot
       threshold=None,           # Prediction threshold used
       title=None,               # Custom title
       figsize=(18, 10),         # Figure size
       cmap='Blues',             # Confusion matrix colormap
       save_path=None,           # Path to save figure
       dpi=300,                  # Resolution for saved figure
       show=True                 # Whether to display
   )

Parameters Explained
~~~~~~~~~~~~~~~~~~~~

**y_pred** (required)
   Binary predictions array (0 or 1) from your model.

   .. code-block:: python

      # From scikit-learn model
      y_pred = model.predict(X_test)

      # From probability predictions
      y_pred_proba = model.predict_proba(X_test)[:, 1]
      y_pred = (y_pred_proba > 0.6).astype(int)  # Custom threshold

**y_true** (optional)
   Ground truth labels. When provided:

   - Adds confusion matrix panel
   - Shows precision and recall metrics
   - Highlights false positives (red X)
   - Highlights false negatives (orange circles)

   .. code-block:: python

      # With ground truth (full visualization)
      fig = df.rhoa.plots.signal(y_pred=predictions, y_true=targets)

      # Without ground truth (predictions only)
      fig = df.rhoa.plots.signal(y_pred=predictions)

**date_col** (default='Date')
   Name of the date/timestamp column.

   .. code-block:: python

      # If your date column is named differently
      fig = df.rhoa.plots.signal(
          y_pred=predictions,
          y_true=targets,
          date_col='Timestamp'  # Custom column name
      )

**price_col** (default='Close')
   Which price to plot (usually 'Close', but can be 'Open', 'High', 'Low').

   .. code-block:: python

      # Plot with High prices instead of Close
      fig = df.rhoa.plots.signal(
          y_pred=predictions,
          y_true=targets,
          price_col='High'
      )

**threshold** (optional)
   The prediction threshold that was used. Displayed in title for reference.

   .. code-block:: python

      threshold = 0.67
      y_pred = (model.predict_proba(X)[:, 1] > threshold).astype(int)

      fig = df.rhoa.plots.signal(
          y_pred=y_pred,
          y_true=y_true,
          threshold=threshold  # Shows "Threshold: 0.67" in title
      )

**title** (optional)
   Custom title for the plot.

   .. code-block:: python

      fig = df.rhoa.plots.signal(
          y_pred=predictions,
          y_true=targets,
          title='AAPL Random Forest Model'
      )

**figsize** (default=(18, 10))
   Figure size as (width, height) in inches.

   .. code-block:: python

      # Larger figure for presentations
      fig = df.rhoa.plots.signal(
          y_pred=predictions,
          y_true=targets,
          figsize=(24, 14)
      )

      # Smaller figure for reports
      fig = df.rhoa.plots.signal(
          y_pred=predictions,
          y_true=targets,
          figsize=(12, 8)
      )

**cmap** (default='Blues')
   Colormap for confusion matrix. Options: 'Blues', 'Greens', 'Reds', 'Purples', etc.

   .. code-block:: python

      # Green theme for positive emphasis
      fig = df.rhoa.plots.signal(
          y_pred=predictions,
          y_true=targets,
          cmap='Greens'
      )

**save_path** (optional)
   Path to save the figure. If None, figure is not saved.

   .. code-block:: python

      fig = df.rhoa.plots.signal(
          y_pred=predictions,
          y_true=targets,
          save_path='results/aapl_predictions.png'
      )

**dpi** (default=300)
   Resolution for saved figure (dots per inch). 300 is publication quality.

   .. code-block:: python

      # High resolution for printing
      fig = df.rhoa.plots.signal(
          y_pred=predictions,
          y_true=targets,
          save_path='report.png',
          dpi=600  # Very high quality
      )

**show** (default=True)
   Whether to display the plot. Set to False when saving only.

   .. code-block:: python

      # Save without displaying
      fig = df.rhoa.plots.signal(
          y_pred=predictions,
          y_true=targets,
          save_path='output.png',
          show=False
      )

Understanding the Visualization
--------------------------------

Confusion Matrix Panel
~~~~~~~~~~~~~~~~~~~~~~

The confusion matrix shows how well your model performed:

.. code-block:: text

   ┌─────────────────────────────────────┐
   │        Confusion Matrix             │
   │  Threshold: 0.67 | Precision: 85.3% │
   │                                      │
   │              Predicted               │
   │           No Buy(0)  Buy(1)          │
   │  True  ┌─────────────────────┐      │
   │  No(0) │  TN: 420  │ FP: 23  │      │
   │        │  (95%)    │  (5%)   │      │
   │        ├───────────┼─────────┤      │
   │  Buy(1)│  FN: 18   │ TP: 105 │      │
   │        │  (15%)    │ (85%)   │      │
   │        └───────────┴─────────┘      │
   │                                      │
   │  Metrics Summary:                    │
   │  TP: 105  FP: 23                    │
   │  TN: 420  FN: 18                    │
   │  Total Signals: 128                  │
   │  Correct: 105 (85.3%)               │
   └─────────────────────────────────────┘

**Reading the Matrix**:

- **True Negatives (TN)**: Correctly predicted no-buy (420 cases, 95%)
- **False Positives (FP)**: Predicted buy but shouldn't have (23 cases, 5%)
- **False Negatives (FN)**: Missed opportunities (18 cases, 15%)
- **True Positives (TP)**: Correctly predicted buy signals (105 cases, 85%)

**Key Metrics**:

- **Precision = TP / (TP + FP) = 105 / 128 = 85.3%**

  *"Of all buy signals, 85.3% were correct"*

  High precision means fewer false alarms, lower transaction costs.

- **Recall = TP / (TP + FN) = 105 / 123 = 85.4%**

  *"Of all true opportunities, we caught 85.4%"*

  High recall means fewer missed opportunities, more profit potential.

Price Chart Panel
~~~~~~~~~~~~~~~~~

The price chart shows predictions overlaid on price:

.. code-block:: text

   ┌────────────────────────────────────────┐
   │     Close Price with Buy Signals       │
   │                                        │
   │  $150 ─────────────────────────────   │
   │                   ●                    │
   │       ○                 ○              │ ○ = True opportunities (light green)
   │  $140 ────●────────────────────────   │ ● = Model predictions (bright green)
   │         ✗                              │ ✗ = False positives (red)
   │  $130 ─────────────────────────────   │ ◯ = Missed opportunities (orange)
   │       ○                                │
   │  $120 ─────────────────────────────   │
   │                                        │
   │    Jan    Feb    Mar    Apr    May    │
   └────────────────────────────────────────┘

**Visual Elements**:

1. **Blue Line**: Stock price over time
2. **Light Green Background Dots**: All true buy opportunities (when y_true provided)
3. **Bright Green Dots**: Model's buy signal predictions
4. **Red X Markers**: False positives (predicted buy, but wasn't a real opportunity)
5. **Orange Circles**: False negatives (missed opportunities)

**What to Look For**:

- **Clustered green dots**: Model finding real patterns
- **Red X markers**: Where model made mistakes (investigate these)
- **Orange circles**: Opportunities the model missed (could improve recall)
- **Position on price chart**: Are signals coming at good entry points?

Interpreting Results
--------------------

Good Model Characteristics
~~~~~~~~~~~~~~~~~~~~~~~~~~

**High Precision (> 70%)**

.. code-block:: python

   # Precision: 85.3%
   # Of 128 signals, 105 were correct

- Few false positives (red X marks)
- Most green dots align with light green background
- Signals are reliable
- Lower transaction costs

**High Recall (> 60%)**

.. code-block:: python

   # Recall: 85.4%
   # Caught 105 of 123 opportunities

- Few orange circles (missed opportunities)
- Capturing most profitable trades
- Higher profit potential

**Signals at Good Entry Points**

Look at the price chart:

- Are signals coming near local lows? (Good!)
- Are signals coming near local highs? (Bad - buying at top)
- Do signals cluster before price increases? (Good!)

Poor Model Characteristics
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Low Precision (< 50%)**

.. code-block:: python

   # Precision: 45.2%
   # Of 200 signals, only 90 correct

- Many red X marks (false positives)
- Model is too aggressive
- High transaction costs will eat profits
- **Solution**: Increase prediction threshold or retrain

**Low Recall (< 40%)**

.. code-block:: python

   # Recall: 38.5%
   # Only caught 70 of 182 opportunities

- Many orange circles (missed opportunities)
- Model is too conservative
- Missing too many profitable trades
- **Solution**: Decrease prediction threshold or add features

**Random-Looking Signals**

If signals appear random on the price chart:

- No clear pattern relative to price movements
- Equal distribution of correct/incorrect
- Model hasn't learned meaningful patterns
- **Solution**: Feature engineering, more data, or different algorithm

Example Interpretations
-----------------------

Scenario 1: High Precision, Low Recall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Confusion Matrix:
   Precision: 92.3% | Recall: 42.1%

   TP: 48   FP: 4
   TN: 480  FN: 66

**Interpretation**:

- Model is very conservative
- When it predicts buy, it's usually right (92.3%)
- But it misses many opportunities (66 missed)
- Only trading 52 times instead of possible 114

**Appropriate For**:

- High transaction costs
- Risk-averse strategies
- Need high win rate

**How to Improve**:

.. code-block:: python

   # Lower prediction threshold
   y_pred_conservative = (y_pred_proba > 0.8).astype(int)  # Current
   y_pred_balanced = (y_pred_proba > 0.6).astype(int)      # Better balance

Scenario 2: Low Precision, High Recall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Confusion Matrix:
   Precision: 58.7% | Recall: 88.2%

   TP: 105  FP: 74
   TN: 360  FN: 14

**Interpretation**:

- Model is very aggressive
- Catches almost all opportunities (88.2%)
- But generates many false signals (74)
- Trading 179 times (many unnecessary)

**Appropriate For**:

- Low transaction costs
- Market-making strategies
- Exploration phase

**How to Improve**:

.. code-block:: python

   # Raise prediction threshold
   y_pred_aggressive = (y_pred_proba > 0.4).astype(int)  # Current
   y_pred_balanced = (y_pred_proba > 0.6).astype(int)    # Better balance

Scenario 3: Balanced Performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Confusion Matrix:
   Precision: 76.5% | Recall: 72.3%

   TP: 89   FP: 27
   TN: 420  FN: 34

**Interpretation**:

- Good balance between precision and recall
- 116 total signals, 89 correct (76.5%)
- Caught 89 of 123 opportunities (72.3%)
- F1 score ≈ 74.3% (harmonic mean)

**This Is Ideal**: Good balance suitable for most trading strategies.

Scenario 4: Poor Performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Confusion Matrix:
   Precision: 52.1% | Recall: 49.5%

   TP: 45   FP: 41
   TN: 402  FN: 46

**Interpretation**:

- Barely better than random (50%)
- As many false as true positives
- Missing half the opportunities
- Model hasn't learned useful patterns

**Actions**:

1. Check for data leakage
2. Improve feature engineering
3. Try different algorithms
4. Get more/better quality data

Practical Examples
------------------

Complete Workflow
~~~~~~~~~~~~~~~~~

.. code-block:: python

   import pandas as pd
   import numpy as np
   import rhoa
   from sklearn.ensemble import RandomForestClassifier
   from sklearn.model_selection import train_test_split

   # 1. Load data
   df = pd.read_csv('AAPL.csv')
   df['Date'] = pd.to_datetime(df['Date'])

   # 2. Create features
   df['SMA_20'] = df['Close'].rolling(20).mean()
   df['SMA_50'] = df['Close'].rolling(50).mean()
   df['RSI'] = df.rhoa.indicators.rsi(14)
   df['Returns'] = df['Close'].pct_change()

   # 3. Create target
   from rhoa.targets import generate_target_combinations
   targets, meta = generate_target_combinations(df, mode='auto')
   df['Target'] = targets['Target_7']

   # 4. Prepare data
   df_clean = df.dropna()
   features = ['SMA_20', 'SMA_50', 'RSI', 'Returns']
   X = df_clean[features]
   y = df_clean['Target']

   # 5. Split
   split_idx = int(len(X) * 0.8)
   X_train, X_test = X[:split_idx], X[split_idx:]
   y_train, y_test = y[:split_idx], y[split_idx:]
   df_test = df_clean[split_idx:]

   # 6. Train
   model = RandomForestClassifier(n_estimators=100)
   model.fit(X_train, y_train)

   # 7. Predict
   y_pred = model.predict(X_test)

   # 8. Visualize
   fig = df_test.rhoa.plots.signal(
       y_pred=y_pred,
       y_true=y_test,
       date_col='Date',
       price_col='Close',
       title='AAPL Random Forest Predictions',
       save_path='aapl_results.png'
   )

Comparing Models
~~~~~~~~~~~~~~~~

.. code-block:: python

   from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
   from sklearn.linear_model import LogisticRegression

   # Train multiple models
   models = {
       'Random Forest': RandomForestClassifier(n_estimators=100),
       'Gradient Boosting': GradientBoostingClassifier(n_estimators=100),
       'Logistic Regression': LogisticRegression()
   }

   # Visualize each
   for name, model in models.items():
       model.fit(X_train, y_train)
       y_pred = model.predict(X_test)

       fig = df_test.rhoa.plots.signal(
           y_pred=y_pred,
           y_true=y_test,
           title=f'{name} - AAPL Predictions',
           save_path=f'results/{name.lower().replace(" ", "_")}.png',
           show=False  # Don't show, just save
       )

   # Now compare the saved images side by side

Threshold Optimization
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Get probabilities
   y_pred_proba = model.predict_proba(X_test)[:, 1]

   # Try different thresholds
   thresholds = [0.4, 0.5, 0.6, 0.7, 0.8]

   for threshold in thresholds:
       y_pred = (y_pred_proba > threshold).astype(int)

       fig = df_test.rhoa.plots.signal(
           y_pred=y_pred,
           y_true=y_test,
           threshold=threshold,
           title=f'AAPL - Threshold {threshold}',
           save_path=f'threshold_analysis/threshold_{threshold}.png',
           show=False
       )

   # Review images to find optimal threshold

Predictions Only (No Ground Truth)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # When you don't have ground truth (production/future predictions)
   future_data = pd.read_csv('new_data.csv')
   future_data['Date'] = pd.to_datetime(future_data['Date'])

   # Create same features
   future_data['SMA_20'] = future_data['Close'].rolling(20).mean()
   future_data['SMA_50'] = future_data['Close'].rolling(50).mean()
   future_data['RSI'] = future_data['Close'].rhoa.indicators.rsi(14)
   future_data['Returns'] = future_data['Close'].pct_change()

   future_clean = future_data.dropna()
   X_future = future_clean[features]

   # Predict
   y_pred_future = model.predict(X_future)

   # Visualize (no ground truth, no confusion matrix)
   fig = future_clean.rhoa.plots.signal(
       y_pred=y_pred_future,
       # No y_true parameter
       date_col='Date',
       price_col='Close',
       title='AAPL Future Predictions',
       save_path='future_signals.png'
   )

Customization Options
---------------------

Color Schemes
~~~~~~~~~~~~~

.. code-block:: python

   # Professional blue theme (default)
   fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, cmap='Blues')

   # Success/green theme
   fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, cmap='Greens')

   # Warning/red theme
   fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, cmap='Reds')

   # Purple theme
   fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, cmap='Purples')

Figure Sizes for Different Uses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # For presentations (large, high DPI)
   fig = df.rhoa.plots.signal(
       y_pred=pred, y_true=true,
       figsize=(24, 14),
       dpi=150,
       save_path='presentation.png'
   )

   # For papers/publications (standard, very high DPI)
   fig = df.rhoa.plots.signal(
       y_pred=pred, y_true=true,
       figsize=(12, 8),
       dpi=600,
       save_path='publication.png'
   )

   # For reports (medium)
   fig = df.rhoa.plots.signal(
       y_pred=pred, y_true=true,
       figsize=(15, 9),
       dpi=300,
       save_path='report.png'
   )

   # For web/dashboard (smaller, lower DPI)
   fig = df.rhoa.plots.signal(
       y_pred=pred, y_true=true,
       figsize=(10, 6),
       dpi=100,
       save_path='dashboard.png'
   )

Further Customization
~~~~~~~~~~~~~~~~~~~~~

The method returns a matplotlib Figure object, allowing further customization:

.. code-block:: python

   import matplotlib.pyplot as plt

   # Get figure
   fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, show=False)

   # Access axes
   axes = fig.get_axes()
   confusion_ax = axes[0]  # First panel
   price_ax = axes[1]      # Second panel

   # Customize
   price_ax.set_ylabel('Price (USD)', fontsize=14)
   price_ax.grid(True, linestyle='--', alpha=0.5)

   # Add annotations
   price_ax.annotate(
       'Important Event',
       xy=('2024-01-15', 150),
       xytext=('2024-02-01', 160),
       arrowprops=dict(arrowstyle='->', color='red')
   )

   # Save customized version
   plt.savefig('customized.png', dpi=300, bbox_inches='tight')
   plt.show()

Best Practices
--------------

Do's
~~~~

1. **Always Validate on Out-of-Sample Data**

   .. code-block:: python

      # Use time-based split
      split_idx = int(len(df) * 0.8)
      train, test = df[:split_idx], df[split_idx:]

2. **Save Visualizations for Documentation**

   .. code-block:: python

      fig = df.rhoa.plots.signal(
          y_pred=pred, y_true=true,
          save_path=f'results/{model_name}_{datetime.date.today()}.png'
      )

3. **Include Threshold in Filename When Comparing**

   .. code-block:: python

      for t in [0.5, 0.6, 0.7]:
          y_pred = (proba > t).astype(int)
          df.rhoa.plots.signal(
              y_pred=y_pred, y_true=y_true,
              save_path=f'threshold_{t:.1f}.png',
              show=False
          )

4. **Use Descriptive Titles**

   .. code-block:: python

      title = f'{ticker} - {model_type} - {date_range} - Threshold {threshold}'
      fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, title=title)

Don'ts
~~~~~~

1. **Don't Visualize Training Data Performance**

   .. code-block:: python

      # WRONG - visualizing training data
      model.fit(X_train, y_train)
      y_train_pred = model.predict(X_train)
      df_train.rhoa.plots.signal(y_pred=y_train_pred, y_true=y_train)
      # This will look artificially good!

      # CORRECT - visualize test data
      y_test_pred = model.predict(X_test)
      df_test.rhoa.plots.signal(y_pred=y_test_pred, y_true=y_test)

2. **Don't Compare Models on Different Date Ranges**

   .. code-block:: python

      # WRONG
      model1_pred = model1.predict(df['2024-01':'2024-06'])
      model2_pred = model2.predict(df['2024-03':'2024-08'])

      # CORRECT - same test period
      test_period = df['2024-06':'2024-12']
      model1_pred = model1.predict(test_period)
      model2_pred = model2.predict(test_period)

3. **Don't Ignore the Price Chart**

   Don't just look at precision/recall numbers. Always check:
   - Are signals at reasonable entry points?
   - Do false positives have a pattern?
   - Are missed opportunities (FN) avoidable?

Troubleshooting
---------------

Common Issues
~~~~~~~~~~~~~

**Issue**: "Length of y_pred doesn't match DataFrame length"

.. code-block:: python

   # Problem: NaN values were dropped
   y_pred = model.predict(X_test)  # Length: 100
   df_test                         # Length: 120 (includes NaN rows)

   # Solution: Drop NaN before splitting
   df_clean = df.dropna()
   # Then split and use df_clean for plotting

**Issue**: Confusion matrix shows all zeros

.. code-block:: python

   # Problem: y_true or y_pred are all same class
   print(y_pred.sum())  # Should not be 0 or len(y_pred)

   # Solution: Check model is actually predicting both classes
   print(np.unique(y_pred, return_counts=True))

**Issue**: Figure doesn't display

.. code-block:: python

   # If using Jupyter
   %matplotlib inline

   # If using script
   import matplotlib.pyplot as plt
   fig = df.rhoa.plots.signal(y_pred=pred, y_true=true)
   plt.show()  # Explicitly show

**Issue**: Date axis is crowded/unreadable

.. code-block:: python

   # Solution: Rotate labels automatically applied
   # But you can customize further
   fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, show=False)
   ax = fig.get_axes()[1]  # Price chart
   ax.tick_params(axis='x', rotation=45, labelsize=10)
   plt.tight_layout()
   plt.show()

Performance Tips
----------------

For Large Datasets
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Subsample for visualization if very large
   if len(df_test) > 1000:
       sample_idx = np.linspace(0, len(df_test)-1, 1000, dtype=int)
       df_sample = df_test.iloc[sample_idx]
       y_pred_sample = y_pred[sample_idx]
       y_true_sample = y_true[sample_idx]

       fig = df_sample.rhoa.plots.signal(
           y_pred=y_pred_sample,
           y_true=y_true_sample
       )

Batch Processing
~~~~~~~~~~~~~~~~

.. code-block:: python

   # Process multiple stocks
   tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN']

   for ticker in tickers:
       df = load_data(ticker)
       # ... train model, get predictions ...

       fig = df_test.rhoa.plots.signal(
           y_pred=y_pred,
           y_true=y_true,
           title=f'{ticker} Predictions',
           save_path=f'results/{ticker}.png',
           show=False  # Don't show, just save
       )
       plt.close(fig)  # Free memory

Summary
-------

The ``plots.signal()`` method provides:

- **Comprehensive visualization** of model performance
- **Confusion matrix** with precision/recall metrics
- **Price chart** with predictions overlaid
- **False positive/negative identification**
- **Professional styling** for reports and presentations
- **Flexible customization** options

Key points:

1. Use ``y_true`` for full evaluation with confusion matrix
2. Omit ``y_true`` for future predictions visualization
3. Check both metrics AND price chart patterns
4. Save visualizations for documentation
5. Compare multiple models/thresholds visually
6. Customize for your specific presentation needs

Further Reading
---------------

- :doc:`targets_guide` - Understanding what to predict
- :doc:`indicators_guide` - Features for prediction
- :doc:`/examples/complete_pipeline` - End-to-end example
- :doc:`/api/plots` - Complete API reference