Visualization Guide

Learn how to create professional visualizations for evaluating machine learning predictions using Rhoa’s plots accessor.

Overview

Rhoa provides a powerful visualization system through the .rhoa.plots accessor on pandas DataFrames. The accessor currently centers on the signal() method, which creates comprehensive visualizations showing:

  • Stock price charts with predicted buy signals

  • Confusion matrices with detailed metrics

  • False positive and false negative identification

  • Professional styling suitable for presentations and reports

Quick Start

Basic Signal Plot

import pandas as pd
import rhoa

# Load your data with predictions
df = pd.read_csv('stock_data.csv')
df['Date'] = pd.to_datetime(df['Date'])

# Assume you have predictions and ground truth
predictions = model.predict(X_test)
ground_truth = y_test

# Create visualization
fig = df.rhoa.plots.signal(
    y_pred=predictions,
    y_true=ground_truth,
    date_col='Date',
    price_col='Close'
)

This creates a two-panel visualization:

  • Top panel: Confusion matrix with precision/recall metrics

  • Bottom panel: Price chart with signals overlaid

The signal() Method

Complete API

df.rhoa.plots.signal(
    y_pred,                    # Required: predictions
    y_true=None,              # Optional: ground truth
    date_col='Date',          # Date column name
    price_col='Close',        # Price column to plot
    threshold=None,           # Prediction threshold used
    title=None,               # Custom title
    figsize=(18, 10),         # Figure size
    cmap='Blues',             # Confusion matrix colormap
    save_path=None,           # Path to save figure
    dpi=300,                  # Resolution for saved figure
    show=True                 # Whether to display
)

Parameters Explained

y_pred (required)

Binary predictions array (0 or 1) from your model.

# From scikit-learn model
y_pred = model.predict(X_test)

# From probability predictions
y_pred_proba = model.predict_proba(X_test)[:, 1]
y_pred = (y_pred_proba > 0.6).astype(int)  # Custom threshold

y_true (optional)

Ground truth labels. When provided:

  • Adds confusion matrix panel

  • Shows precision and recall metrics

  • Highlights false positives (red X)

  • Highlights false negatives (orange circles)

# With ground truth (full visualization)
fig = df.rhoa.plots.signal(y_pred=predictions, y_true=targets)

# Without ground truth (predictions only)
fig = df.rhoa.plots.signal(y_pred=predictions)

date_col (default='Date')

Name of the date/timestamp column.

# If your date column is named differently
fig = df.rhoa.plots.signal(
    y_pred=predictions,
    y_true=targets,
    date_col='Timestamp'  # Custom column name
)

price_col (default='Close')

Which price to plot (usually 'Close', but can be 'Open', 'High', or 'Low').

# Plot with High prices instead of Close
fig = df.rhoa.plots.signal(
    y_pred=predictions,
    y_true=targets,
    price_col='High'
)

threshold (optional)

The prediction threshold that was used. Displayed in title for reference.

threshold = 0.67
y_pred = (model.predict_proba(X)[:, 1] > threshold).astype(int)

fig = df.rhoa.plots.signal(
    y_pred=y_pred,
    y_true=y_true,
    threshold=threshold  # Shows "Threshold: 0.67" in title
)

title (optional)

Custom title for the plot.

fig = df.rhoa.plots.signal(
    y_pred=predictions,
    y_true=targets,
    title='AAPL Random Forest Model'
)

figsize (default=(18, 10))

Figure size as (width, height) in inches.

# Larger figure for presentations
fig = df.rhoa.plots.signal(
    y_pred=predictions,
    y_true=targets,
    figsize=(24, 14)
)

# Smaller figure for reports
fig = df.rhoa.plots.signal(
    y_pred=predictions,
    y_true=targets,
    figsize=(12, 8)
)

cmap (default='Blues')

Colormap for the confusion matrix. Options include 'Blues', 'Greens', 'Reds', and 'Purples'.

# Green theme for positive emphasis
fig = df.rhoa.plots.signal(
    y_pred=predictions,
    y_true=targets,
    cmap='Greens'
)

save_path (optional)

Path to save the figure. If None, figure is not saved.

fig = df.rhoa.plots.signal(
    y_pred=predictions,
    y_true=targets,
    save_path='results/aapl_predictions.png'
)

dpi (default=300)

Resolution for saved figure (dots per inch). 300 is publication quality.

# High resolution for printing
fig = df.rhoa.plots.signal(
    y_pred=predictions,
    y_true=targets,
    save_path='report.png',
    dpi=600  # Very high quality
)

show (default=True)

Whether to display the plot. Set to False when saving only.

# Save without displaying
fig = df.rhoa.plots.signal(
    y_pred=predictions,
    y_true=targets,
    save_path='output.png',
    show=False
)

Understanding the Visualization

Confusion Matrix Panel

The confusion matrix shows how well your model performed:

┌─────────────────────────────────────┐
│        Confusion Matrix             │
│  Threshold: 0.67 | Precision: 82.0% │
│                                      │
│              Predicted               │
│           No Buy(0)  Buy(1)          │
│  True  ┌─────────────────────┐      │
│  No(0) │  TN: 420  │ FP: 23  │      │
│        │  (95%)    │  (5%)   │      │
│        ├───────────┼─────────┤      │
│  Buy(1)│  FN: 18   │ TP: 105 │      │
│        │  (15%)    │ (85%)   │      │
│        └───────────┴─────────┘      │
│                                      │
│  Metrics Summary:                    │
│  TP: 105  FP: 23                    │
│  TN: 420  FN: 18                    │
│  Total Signals: 128                  │
│  Correct: 105 (82.0%)               │
└─────────────────────────────────────┘

Reading the Matrix:

  • True Negatives (TN): Correctly predicted no-buy (420 cases, 95%)

  • False Positives (FP): Predicted buy but shouldn’t have (23 cases, 5%)

  • False Negatives (FN): Missed opportunities (18 cases, 15%)

  • True Positives (TP): Correctly predicted buy signals (105 cases, 85%)

Key Metrics:

  • Precision = TP / (TP + FP) = 105 / 128 = 82.0%

    “Of all buy signals, 82.0% were correct”

    High precision means fewer false alarms, lower transaction costs.

  • Recall = TP / (TP + FN) = 105 / 123 = 85.4%

    “Of all true opportunities, we caught 85.4%”

    High recall means fewer missed opportunities, more profit potential.
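
Both metrics follow directly from the four counts. A quick check in plain Python, using the example numbers above:

```python
# Counts taken from the example confusion matrix above
tp, fp, fn, tn = 105, 23, 18, 420

precision = tp / (tp + fp)  # share of buy signals that were correct
recall = tp / (tp + fn)     # share of true opportunities that were caught

print(f"Precision: {precision:.1%}")  # 82.0%
print(f"Recall:    {recall:.1%}")     # 85.4%
```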

Price Chart Panel

The price chart shows predictions overlaid on price:

┌────────────────────────────────────────┐
│     Close Price with Buy Signals       │
│                                        │
│  $150 ─────────────────────────────   │
│                   ●                    │
│       ○                 ○              │ ○ = True opportunities (light green)
│  $140 ────●────────────────────────   │ ● = Model predictions (bright green)
│         ✗                              │ ✗ = False positives (red)
│  $130 ─────────────────────────────   │ ◯ = Missed opportunities (orange)
│       ○                                │
│  $120 ─────────────────────────────   │
│                                        │
│    Jan    Feb    Mar    Apr    May    │
└────────────────────────────────────────┘

Visual Elements:

  1. Blue Line: Stock price over time

  2. Light Green Background Dots: All true buy opportunities (when y_true provided)

  3. Bright Green Dots: Model’s buy signal predictions

  4. Red X Markers: False positives (predicted buy, but wasn’t a real opportunity)

  5. Orange Circles: False negatives (missed opportunities)

What to Look For:

  • Clustered green dots: Model finding real patterns

  • Red X markers: Where model made mistakes (investigate these)

  • Orange circles: Opportunities the model missed (could improve recall)

  • Position on price chart: Are signals coming at good entry points?

Interpreting Results

Good Model Characteristics

High Precision (> 70%)

# Precision: 82.0%
# Of 128 signals, 105 were correct

  • Few false positives (red X marks)

  • Most green dots align with light green background

  • Signals are reliable

  • Lower transaction costs

High Recall (> 60%)

# Recall: 85.4%
# Caught 105 of 123 opportunities

  • Few orange circles (missed opportunities)

  • Capturing most profitable trades

  • Higher profit potential

Signals at Good Entry Points

Look at the price chart:

  • Are signals coming near local lows? (Good!)

  • Are signals coming near local highs? (Bad - buying at top)

  • Do signals cluster before price increases? (Good!)
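
One way to quantify the last question is to compare the average forward return after predicted signals against the overall baseline. A toy sketch with made-up numbers (prices, y_pred, and horizon are all illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical prices and predictions; in practice use df_test['Close'] and y_pred
prices = pd.Series([100, 99, 101, 104, 103, 102, 105, 107], dtype=float)
y_pred = np.array([0, 1, 0, 0, 0, 1, 0, 0])

horizon = 2
fwd_return = prices.shift(-horizon) / prices - 1  # return over the next `horizon` bars

avg_signal_return = fwd_return[y_pred == 1].mean()
avg_overall_return = fwd_return.mean()
print(avg_signal_return, avg_overall_return)  # signals should beat the baseline
```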

Poor Model Characteristics

Low Precision (< 50%)

# Precision: 45.0%
# Of 200 signals, only 90 correct

  • Many red X marks (false positives)

  • Model is too aggressive

  • High transaction costs will eat profits

  • Solution: Increase prediction threshold or retrain

Low Recall (< 40%)

# Recall: 38.5%
# Only caught 70 of 182 opportunities

  • Many orange circles (missed opportunities)

  • Model is too conservative

  • Missing too many profitable trades

  • Solution: Decrease prediction threshold or add features

Random-Looking Signals

If signals appear random on the price chart:

  • No clear pattern relative to price movements

  • Equal distribution of correct/incorrect

  • Model hasn’t learned meaningful patterns

  • Solution: Feature engineering, more data, or different algorithm

Example Interpretations

Scenario 1: High Precision, Low Recall

Confusion Matrix:
Precision: 92.3% | Recall: 42.1%

TP: 48   FP: 4
TN: 480  FN: 66

Interpretation:

  • Model is very conservative

  • When it predicts buy, it’s usually right (92.3%)

  • But it misses many opportunities (66 missed)

  • Only trading 52 times instead of possible 114

Appropriate For:

  • High transaction costs

  • Risk-averse strategies

  • Need high win rate

How to Improve:

# Lower prediction threshold
y_pred_conservative = (y_pred_proba > 0.8).astype(int)  # Current
y_pred_balanced = (y_pred_proba > 0.6).astype(int)      # Better balance

Scenario 2: Low Precision, High Recall

Confusion Matrix:
Precision: 58.7% | Recall: 88.2%

TP: 105  FP: 74
TN: 360  FN: 14

Interpretation:

  • Model is very aggressive

  • Catches almost all opportunities (88.2%)

  • But generates many false signals (74)

  • Trading 179 times (many unnecessary)

Appropriate For:

  • Low transaction costs

  • Market-making strategies

  • Exploration phase

How to Improve:

# Raise prediction threshold
y_pred_aggressive = (y_pred_proba > 0.4).astype(int)  # Current
y_pred_balanced = (y_pred_proba > 0.6).astype(int)    # Better balance

Scenario 3: Balanced Performance

Confusion Matrix:
Precision: 76.7% | Recall: 72.3%

TP: 89   FP: 27
TN: 420  FN: 34

Interpretation:

  • Good balance between precision and recall

  • 116 total signals, 89 correct (76.7%)

  • Caught 89 of 123 opportunities (72.3%)

  • F1 score ≈ 74.5% (harmonic mean of precision and recall)

This Is Ideal: Good balance suitable for most trading strategies.
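
The F1 score quoted above is just the harmonic mean, computed from the scenario’s counts:

```python
# Scenario 3 counts from the matrix above
tp, fp, fn = 89, 27, 34

precision = tp / (tp + fp)  # 89 / 116
recall = tp / (tp + fn)     # 89 / 123
f1 = 2 * precision * recall / (precision + recall)

print(f"F1: {f1:.1%}")  # 74.5%
```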

Scenario 4: Poor Performance

Confusion Matrix:
Precision: 52.3% | Recall: 49.5%

TP: 45   FP: 41
TN: 402  FN: 46

Interpretation:

  • Barely better than random (50%)

  • As many false as true positives

  • Missing half the opportunities

  • Model hasn’t learned useful patterns

Actions:

  1. Check for data leakage

  2. Improve feature engineering

  3. Try different algorithms

  4. Get more/better quality data
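
For step 1, one common sanity check is to refit with shuffled labels: cross-validated accuracy should sit near chance, and a much higher score suggests the evaluation pipeline is leaking information. A sketch on synthetic data (shapes and names are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic features and binary labels, for illustration only
rng = np.random.default_rng(42)
X = rng.random((300, 4))
y = rng.integers(0, 2, 300)

# Shuffle the labels, then cross-validate: accuracy near 0.5 is expected,
# and a clearly higher score points to leakage somewhere in the pipeline
y_shuffled = rng.permutation(y)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y_shuffled, cv=3
)
print(scores.mean())
```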

Practical Examples

Complete Workflow

import pandas as pd
import numpy as np
import rhoa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 1. Load data
df = pd.read_csv('AAPL.csv')
df['Date'] = pd.to_datetime(df['Date'])

# 2. Create features
df['SMA_20'] = df['Close'].rolling(20).mean()
df['SMA_50'] = df['Close'].rolling(50).mean()
df['RSI'] = df.rhoa.indicators.rsi(14)
df['Returns'] = df['Close'].pct_change()

# 3. Create target
from rhoa.targets import generate_target_combinations
targets, meta = generate_target_combinations(df, mode='auto')
df['Target'] = targets['Target_7']

# 4. Prepare data
df_clean = df.dropna()
features = ['SMA_20', 'SMA_50', 'RSI', 'Returns']
X = df_clean[features]
y = df_clean['Target']

# 5. Split
split_idx = int(len(X) * 0.8)
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]
df_test = df_clean[split_idx:]

# 6. Train
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# 7. Predict
y_pred = model.predict(X_test)

# 8. Visualize
fig = df_test.rhoa.plots.signal(
    y_pred=y_pred,
    y_true=y_test,
    date_col='Date',
    price_col='Close',
    title='AAPL Random Forest Predictions',
    save_path='aapl_results.png'
)

Comparing Models

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Train multiple models
models = {
    'Random Forest': RandomForestClassifier(n_estimators=100),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=100),
    'Logistic Regression': LogisticRegression()
}

# Visualize each
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    fig = df_test.rhoa.plots.signal(
        y_pred=y_pred,
        y_true=y_test,
        title=f'{name} - AAPL Predictions',
        save_path=f'results/{name.lower().replace(" ", "_")}.png',
        show=False  # Don't show, just save
    )

# Now compare the saved images side by side

Threshold Optimization

# Get probabilities
y_pred_proba = model.predict_proba(X_test)[:, 1]

# Try different thresholds
thresholds = [0.4, 0.5, 0.6, 0.7, 0.8]

for threshold in thresholds:
    y_pred = (y_pred_proba > threshold).astype(int)

    fig = df_test.rhoa.plots.signal(
        y_pred=y_pred,
        y_true=y_test,
        threshold=threshold,
        title=f'AAPL - Threshold {threshold}',
        save_path=f'threshold_analysis/threshold_{threshold}.png',
        show=False
    )

# Review images to find optimal threshold
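
Alongside the saved images, printing precision and recall per threshold makes the trade-off explicit. A self-contained sketch with synthetic probabilities (in practice, reuse y_pred_proba and y_test from above):

```python
import numpy as np

# Synthetic probabilities and labels for illustration; in practice use
# y_pred_proba = model.predict_proba(X_test)[:, 1] and y_true = y_test
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)
y_proba = np.clip(y_true * 0.4 + rng.random(500) * 0.6, 0.0, 1.0)

for threshold in [0.4, 0.5, 0.6, 0.7, 0.8]:
    y_pred = (y_proba > threshold).astype(int)
    tp = int(((y_pred == 1) & (y_true == 1)).sum())
    fp = int(((y_pred == 1) & (y_true == 0)).sum())
    fn = int(((y_pred == 0) & (y_true == 1)).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(f"threshold={threshold:.1f}  precision={precision:.3f}  recall={recall:.3f}")
```

Raising the threshold trades recall for precision; pick the point that matches your transaction-cost tolerance.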

Predictions Only (No Ground Truth)

# When you don't have ground truth (production/future predictions)
future_data = pd.read_csv('new_data.csv')
future_data['Date'] = pd.to_datetime(future_data['Date'])

# Create same features
future_data['SMA_20'] = future_data['Close'].rolling(20).mean()
future_data['SMA_50'] = future_data['Close'].rolling(50).mean()
future_data['RSI'] = future_data.rhoa.indicators.rsi(14)
future_data['Returns'] = future_data['Close'].pct_change()

future_clean = future_data.dropna()
X_future = future_clean[features]

# Predict
y_pred_future = model.predict(X_future)

# Visualize (no ground truth, no confusion matrix)
fig = future_clean.rhoa.plots.signal(
    y_pred=y_pred_future,
    # No y_true parameter
    date_col='Date',
    price_col='Close',
    title='AAPL Future Predictions',
    save_path='future_signals.png'
)

Customization Options

Color Schemes

# Professional blue theme (default)
fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, cmap='Blues')

# Success/green theme
fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, cmap='Greens')

# Warning/red theme
fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, cmap='Reds')

# Purple theme
fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, cmap='Purples')

Figure Sizes for Different Uses

# For presentations (large, high DPI)
fig = df.rhoa.plots.signal(
    y_pred=pred, y_true=true,
    figsize=(24, 14),
    dpi=150,
    save_path='presentation.png'
)

# For papers/publications (standard, very high DPI)
fig = df.rhoa.plots.signal(
    y_pred=pred, y_true=true,
    figsize=(12, 8),
    dpi=600,
    save_path='publication.png'
)

# For reports (medium)
fig = df.rhoa.plots.signal(
    y_pred=pred, y_true=true,
    figsize=(15, 9),
    dpi=300,
    save_path='report.png'
)

# For web/dashboard (smaller, lower DPI)
fig = df.rhoa.plots.signal(
    y_pred=pred, y_true=true,
    figsize=(10, 6),
    dpi=100,
    save_path='dashboard.png'
)

Further Customization

The method returns a matplotlib Figure object, allowing further customization:

import matplotlib.pyplot as plt

# Get figure
fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, show=False)

# Access axes
axes = fig.get_axes()
confusion_ax = axes[0]  # First panel
price_ax = axes[1]      # Second panel

# Customize
price_ax.set_ylabel('Price (USD)', fontsize=14)
price_ax.grid(True, linestyle='--', alpha=0.5)

# Add annotations
price_ax.annotate(
    'Important Event',
    xy=(pd.Timestamp('2024-01-15'), 150),    # date axes need real timestamps, not strings
    xytext=(pd.Timestamp('2024-02-01'), 160),
    arrowprops=dict(arrowstyle='->', color='red')
)

# Save customized version
fig.savefig('customized.png', dpi=300, bbox_inches='tight')
plt.show()

Best Practices

Do’s

  1. Always Validate on Out-of-Sample Data

    # Use time-based split
    split_idx = int(len(df) * 0.8)
    train, test = df[:split_idx], df[split_idx:]
    
  2. Save Visualizations for Documentation

    import datetime

    fig = df.rhoa.plots.signal(
        y_pred=pred, y_true=true,
        save_path=f'results/{model_name}_{datetime.date.today()}.png'
    )
    
  3. Include Threshold in Filename When Comparing

    for t in [0.5, 0.6, 0.7]:
        y_pred = (proba > t).astype(int)
        df.rhoa.plots.signal(
            y_pred=y_pred, y_true=y_true,
            save_path=f'threshold_{t:.1f}.png',
            show=False
        )
    
  4. Use Descriptive Titles

    title = f'{ticker} - {model_type} - {date_range} - Threshold {threshold}'
    fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, title=title)
    

Don’ts

  1. Don’t Visualize Training Data Performance

    # WRONG - visualizing training data
    model.fit(X_train, y_train)
    y_train_pred = model.predict(X_train)
    df_train.rhoa.plots.signal(y_pred=y_train_pred, y_true=y_train)
    # This will look artificially good!
    
    # CORRECT - visualize test data
    y_test_pred = model.predict(X_test)
    df_test.rhoa.plots.signal(y_pred=y_test_pred, y_true=y_test)
    
  2. Don’t Compare Models on Different Date Ranges

    # WRONG - different date ranges (slicing assumes a DatetimeIndex)
    model1_pred = model1.predict(df['2024-01':'2024-06'])
    model2_pred = model2.predict(df['2024-03':'2024-08'])
    
    # CORRECT - same test period
    test_period = df['2024-06':'2024-12']
    model1_pred = model1.predict(test_period)
    model2_pred = model2.predict(test_period)
    
  3. Don’t Ignore the Price Chart

    Don’t just look at precision/recall numbers. Always check:

      • Are signals at reasonable entry points?

      • Do false positives have a pattern?

      • Are missed opportunities (FN) avoidable?

Troubleshooting

Common Issues

Issue: “Length of y_pred doesn’t match DataFrame length”

# Problem: NaN values were dropped
y_pred = model.predict(X_test)  # Length: 100
df_test                         # Length: 120 (includes NaN rows)

# Solution: Drop NaN before splitting
df_clean = df.dropna()
# Then split and use df_clean for plotting
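
A minimal reproduction of the fix (made-up data): drop NaN rows first, then generate predictions on the cleaned frame so lengths align.

```python
import numpy as np
import pandas as pd

# Toy frame with one NaN row, standing in for indicator warm-up gaps
df = pd.DataFrame({'Close': [100.0, 101.0, np.nan, 103.0, 104.0]})

df_clean = df.dropna()           # 4 rows survive
y_pred = np.array([1, 0, 0, 1])  # predictions made on the cleaned rows

# Lengths now match, so plotting against df_clean is safe
assert len(y_pred) == len(df_clean)
```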

Issue: Confusion matrix shows all zeros

# Problem: y_true or y_pred are all the same class
import numpy as np

print(y_pred.sum())  # Should not be 0 or len(y_pred)

# Solution: check that the model actually predicts both classes
print(np.unique(y_pred, return_counts=True))

Issue: Figure doesn’t display

# If using Jupyter
%matplotlib inline

# If using script
import matplotlib.pyplot as plt
fig = df.rhoa.plots.signal(y_pred=pred, y_true=true)
plt.show()  # Explicitly show

Issue: Date axis is crowded/unreadable

# Solution: Rotate labels automatically applied
# But you can customize further
fig = df.rhoa.plots.signal(y_pred=pred, y_true=true, show=False)
ax = fig.get_axes()[1]  # Price chart
ax.tick_params(axis='x', rotation=45, labelsize=10)
plt.tight_layout()
plt.show()

Performance Tips

For Large Datasets

# Subsample for visualization if very large
if len(df_test) > 1000:
    sample_idx = np.linspace(0, len(df_test)-1, 1000, dtype=int)
    df_sample = df_test.iloc[sample_idx]
    y_pred_sample = np.asarray(y_pred)[sample_idx]
    y_true_sample = np.asarray(y_true)[sample_idx]  # positional indexing, even for a Series

    fig = df_sample.rhoa.plots.signal(
        y_pred=y_pred_sample,
        y_true=y_true_sample
    )

Batch Processing

# Process multiple stocks
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN']

for ticker in tickers:
    df = load_data(ticker)
    # ... train model, get predictions ...

    fig = df_test.rhoa.plots.signal(
        y_pred=y_pred,
        y_true=y_true,
        title=f'{ticker} Predictions',
        save_path=f'results/{ticker}.png',
        show=False  # Don't show, just save
    )
    plt.close(fig)  # Free memory

Summary

The plots.signal() method provides:

  • Comprehensive visualization of model performance

  • Confusion matrix with precision/recall metrics

  • Price chart with predictions overlaid

  • False positive/negative identification

  • Professional styling for reports and presentations

  • Flexible customization options

Key points:

  1. Use y_true for full evaluation with confusion matrix

  2. Omit y_true for future predictions visualization

  3. Check both metrics AND price chart patterns

  4. Save visualizations for documentation

  5. Compare multiple models/thresholds visually

  6. Customize for your specific presentation needs

Further Reading