Frequently Asked Questions
Common questions and answers about installing, using, and troubleshooting Rhoa.
Installation & Setup
How do I install Rhoa?
The simplest method is using pip:
pip install rhoa
For all optional features:
pip install rhoa[all]
See the Installation guide for detailed instructions.
Why can’t I import rhoa?
Issue: ModuleNotFoundError: No module named 'rhoa'
Solutions:
Check installation:
pip list | grep rhoa
If not listed, install it:
pip install rhoa
Check Python environment:
# Make sure pip and python are from the same environment
which python
which pip
Verify in correct environment:
If using virtual environment:
source venv/bin/activate  # Activate first
pip install rhoa
Try upgrading pip:
python -m pip install --upgrade pip
pip install rhoa
Why doesn’t the .rhoa.indicators accessor work?
Issue: AttributeError: 'Series'/'DataFrame' object has no attribute 'rhoa'
Solution: You must import rhoa to register the accessor:
import rhoa # This line is required!
import pandas as pd
prices = pd.Series([100, 102, 104])
sma = prices.rhoa.indicators.sma(20) # Now works
Why: Rhoa uses pandas’ accessor API; the accessor is only registered when the rhoa package is imported.
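For context, pandas exposes this mechanism via pd.api.extensions.register_series_accessor. The sketch below is illustrative only — the accessor name "demo" and the class are made up, not Rhoa’s actual internals:

```python
import pandas as pd

# Registering an accessor attaches a namespace to every Series.
# This only takes effect when the module containing the decorator runs,
# which is why `import rhoa` is required before `.rhoa` works.
@pd.api.extensions.register_series_accessor("demo")
class DemoAccessor:
    def __init__(self, series: pd.Series):
        self._series = series

    def sma(self, window_size: int) -> pd.Series:
        # A simple moving average via pandas' rolling mean
        return self._series.rolling(window_size).mean()

prices = pd.Series([100, 102, 104, 106, 108])
print(prices.demo.sma(2).tolist())  # [nan, 101.0, 103.0, 105.0, 107.0]
```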
What Python version do I need?
Requirement: Python 3.9 or higher
Check your version:
python --version
If too old:
# Install newer Python
# On Ubuntu/Debian:
sudo apt-get install python3.9
# On macOS with Homebrew:
brew install python@3.9
# On Windows: Download from python.org
What are the required dependencies?
Core requirements:
pandas >= 1.3
numpy >= 1.21
Optional for target generation:
kneed (for elbow method)
paretoset (for Pareto optimization)
Optional for visualization:
matplotlib
seaborn
scikit-learn (for confusion matrix)
Check versions:
pip show pandas numpy matplotlib
Using Indicators
Why do my indicators return NaN values?
This is normal behavior. Indicators need a minimum number of observations to calculate.
Example:
prices = pd.Series([100, 102, 104, 106, 108])
sma_20 = prices.rhoa.indicators.sma(window_size=20)
# All values will be NaN because we only have 5 data points
Solution: Ensure you have enough data:
# For 20-period indicator, need at least 20 data points
prices = pd.Series(range(100)) # 100 data points
sma_20 = prices.rhoa.indicators.sma(window_size=20)
# First 19 will be NaN, then valid values start
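The NaN pattern can be checked with the equivalent pure-pandas rolling mean (the SMA is a rolling mean, so the leading-NaN behavior is identical):

```python
import pandas as pd

prices = pd.Series(range(100))             # 100 data points
sma_20 = prices.rolling(window=20).mean()  # same NaN pattern as a 20-period SMA

# The first window_size - 1 values cannot be computed
print(sma_20.isna().sum())  # 19
print(sma_20.iloc[19])      # first valid value: mean of 0..19 = 9.5
```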
Handling NaN:
# Drop NaN values
df_clean = df.dropna()
# Or forward fill
df['SMA_20'] = df['SMA_20'].ffill()  # fillna(method='ffill') is deprecated in pandas 2.x
How do I choose indicator parameters?
General guidelines:
For Window Size:
Short-term trading: 5-20 periods
Medium-term trading: 20-50 periods
Long-term trading: 50-200 periods
Start with defaults:
# These are industry standards
rsi = prices.rhoa.indicators.rsi(window_size=14) # Standard
macd = prices.rhoa.indicators.macd(12, 26, 9) # Standard
bb = prices.rhoa.indicators.bollinger_bands(20, 2.0) # Standard
Then optimize if needed:
# Test different values
for window in [10, 14, 20]:
rsi = prices.rhoa.indicators.rsi(window_size=window)
# Evaluate performance
score = evaluate_strategy(rsi)
print(f"Window {window}: {score}")
Avoid over-optimization: Don’t tune parameters too much on training data (overfitting).
Why do my indicators give different values than TradingView?
Small differences can occur due to:
Calculation method: Rhoa uses exponential weighting (EWM) for smoothing, while some platforms use different methods.
Data alignment: Check that dates and prices match exactly.
Rounding: Different precision in calculations.
Verify with reference:
# Check MACD calculation step by step
prices = pd.Series([100, 102, 104, ...])
ema_12 = prices.ewm(span=12, adjust=False).mean()
ema_26 = prices.ewm(span=26, adjust=False).mean()
macd_line = ema_12 - ema_26
signal_line = macd_line.ewm(span=9, adjust=False).mean()
# Compare with your reference source
Note: Rhoa uses industry-standard formulas. Small differences are normal and rarely significant for trading decisions.
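The step-by-step verification above can be made fully runnable. This sketch uses only pandas and numpy on a synthetic rising series and checks the MACD line against its definition at every point:

```python
import numpy as np
import pandas as pd

# Synthetic price series; any real Close series works the same way
prices = pd.Series(100 + np.arange(60) * 0.5)

ema_12 = prices.ewm(span=12, adjust=False).mean()
ema_26 = prices.ewm(span=26, adjust=False).mean()
macd_line = ema_12 - ema_26
signal_line = macd_line.ewm(span=9, adjust=False).mean()
histogram = macd_line - signal_line

# Sanity checks against the textbook definition
assert np.allclose(macd_line, ema_12 - ema_26)
# For a steadily rising series, the faster EMA sits above the slower one
assert macd_line.iloc[-1] > 0
```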
Can I use indicators with intraday data?
Yes! Indicators work with any timeframe.
# Load 5-minute bars
df_5min = pd.read_csv('intraday_5min.csv')
# Calculate indicators (adjust window sizes for timeframe)
df_5min['SMA_20'] = df_5min.rhoa.indicators.sma(20) # 100 minutes
df_5min['RSI'] = df_5min.rhoa.indicators.rsi(14) # 70 minutes
Parameter adjustment:
Daily data: 14-period RSI = 14 days
5-minute data: 14-period RSI = 70 minutes
For similar effect, use more periods: 84 periods = ~7 hours
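One way to make the conversion explicit is a small helper that maps an N-day lookback onto intraday bars. Both the function name and the 6.5-hour (390-minute) US equity session default are assumptions here, not part of Rhoa — adjust the session length for your market:

```python
def daily_periods_to_bars(days: int, bar_minutes: int, session_minutes: int = 390) -> int:
    """Convert an N-day lookback into an equivalent number of intraday bars.

    Assumes a 6.5-hour (390-minute) trading session by default.
    """
    return int(days * session_minutes / bar_minutes)

# A 14-day lookback expressed in 5-minute bars:
print(daily_periods_to_bars(14, 5))  # 1092
```

So a fully day-equivalent 14-period RSI on 5-minute bars would use 1092 periods; in practice, shorter lookbacks like the 84-period example above are a common compromise.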
How do I combine multiple indicators?
# Calculate multiple indicators
# Trend
sma_50 = df.rhoa.indicators.sma(50)
sma_200 = df.rhoa.indicators.sma(200)
# Momentum
rsi = df.rhoa.indicators.rsi(14)
macd_data = df.rhoa.indicators.macd()
# Volatility
atr = df.rhoa.indicators.atr(window_size=14)
# Store in DataFrame
df['SMA_50'] = sma_50
df['SMA_200'] = sma_200
df['RSI'] = rsi
df['MACD'] = macd_data['macd']
df['ATR'] = atr
# Combine conditions
uptrend = df['Close'] > sma_50
strong_trend = sma_50 > sma_200
not_overbought = rsi < 70
bullish_macd = macd_data['macd'] > macd_data['signal']
buy_signal = uptrend & strong_trend & not_overbought & bullish_macd
See Indicators Guide for more details.
Target Generation
What’s the difference between auto and manual mode?
Auto Mode:
Searches across both period AND threshold
Uses Pareto optimization
Finds parameters to match target class balance
Best for: Initial exploration, production systems
targets, meta = generate_target_combinations(
df, mode='auto', target_class_balance=0.5
)
# Finds: period=6, threshold=4.2%
Manual Mode:
Fixed period (you choose)
Uses elbow method to find threshold
Best for: Specific timeframes, hypothesis testing
targets, meta = generate_target_combinations(
df, mode='manual', lookback_periods=5
)
# Uses: period=5, threshold=6.1% (elbow-detected)
Recommendation: Start with auto mode, then use manual mode to explore specific periods.
See Targets Guide for detailed comparison.
What is target_class_balance?
Definition: The percentage of positive instances you want in your target.
# 50% positive (balanced)
targets, meta = generate_target_combinations(
df, mode='auto', target_class_balance=0.5
)
# Result: 512 positive, 512 negative
# 30% positive (conservative)
targets, meta = generate_target_combinations(
df, mode='auto', target_class_balance=0.3
)
# Result: 307 positive, 717 negative
Tradeoffs:
Higher balance (0.7): More signals, more training data, but lower quality
Lower balance (0.3): Fewer signals, higher quality, but less training data
Balanced (0.5): Good starting point
Consider:
Transaction costs (lower balance = fewer trades)
Training data needs (higher balance = more examples)
Risk tolerance (lower balance = more selective)
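To check the balance you actually got, count positives in the returned target column. A sketch with a toy boolean series — in practice, replace it with a column such as targets['Target_1']:

```python
import pandas as pd

# Toy target column; use a column from generate_target_combinations in practice
target = pd.Series([True, False, True, True, False, False, False, True, False, False])

positives = target.sum()
balance = target.mean()  # fraction of True values
print(f"{positives} positive out of {len(target)} ({balance:.0%})")  # 4 positive out of 10 (40%)
```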
Which target method should I use?
There are 8 target methods. Here’s a quick guide:
For Training (want more data):
Method 7 (MaxHigh/Close): Most generous, maximum profit potential
Method 5 (MaxClose/Close): Slightly more conservative
# Training: Use generous target
y_train = targets_train['Target_7']
For Validation (want realism):
Method 1 (Close/Close): Most conservative, actual entry/exit
Method 3 (High/Close): Allows for intraday profit
# Validation: Use conservative target
y_val = targets_val['Target_1']
Why the difference?:
Train on generous targets → model learns from more examples
Validate on conservative targets → realistic performance estimate
Compare all methods:
# See which method works best for your strategy
results = {}
for i in range(1, 9):
    y_train_i = targets_train[f'Target_{i}']
    y_test_i = targets_test[f'Target_{i}']
    model.fit(X_train, y_train_i)
    results[f'Method_{i}'] = model.score(X_test, y_test_i)
print(results)
Why do my targets have NaN values?
This is expected. The last N rows will be NaN because we can’t know the future.
targets, meta = generate_target_combinations(df, mode='auto')
# If period=5, last 5 rows will be NaN
print(targets.tail(10))
#      Target_1  Target_2  ...
# 993   True      False    ...
# 994   False     False    ...
# 995   NaN       NaN      ...  ← Last 5 rows
# 996   NaN       NaN      ...
# ...
# 999   NaN       NaN      ...
Solution: Drop NaN before training:
# Combine features and targets
ml_data = pd.concat([features, targets], axis=1)
# Drop NaN
clean_data = ml_data.dropna()
# Now split
X = clean_data[feature_cols]
y = clean_data['Target_1']
How do I apply the same parameters to test data?
Important: Generate targets on training data only, then apply parameters to test data.
# 1. Split FIRST
train_df, test_df = train_test_split_by_date(df)
# 2. Generate targets on training data
targets_train, meta = generate_target_combinations(
train_df, mode='auto'
)
# 3. Extract parameters for Method 7
period = meta['method_7']['period'] # e.g., 6
threshold = meta['method_7']['threshold'] # e.g., 4.2
# 4. Apply same parameters to test data
future_max_high = test_df['High'].rolling(
    window=period, min_periods=1
).max().shift(-period)  # max High over the NEXT `period` bars
test_target = (future_max_high / test_df['Close'] - 1 >= threshold / 100)
Why? This avoids look-ahead bias - your model can’t “see” the test data during target optimization.
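The forward-looking construction is easy to sanity-check on a tiny series where the answer is obvious. This pure-pandas sketch (toy data, illustrative only) labels each bar with the maximum High over the next `period` bars:

```python
import pandas as pd

high = pd.Series([10, 12, 11, 15, 13, 14])
period = 2

# Max of High over the NEXT `period` bars (current bar excluded)
future_max_high = high.rolling(window=period).max().shift(-period)

print(future_max_high.tolist())
# [12.0, 15.0, 15.0, 14.0, nan, nan]
```

At t=0 the next two Highs are 12 and 11, so the label is 12; the last `period` rows are NaN because the future is unknown, matching the NaN behavior described in the previous question.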
See Targets Guide for complete workflow.
Visualization
How do I visualize my predictions?
Use the .rhoa.plots.signal() method:
import rhoa
# Get predictions
y_pred = model.predict(X_test)
y_true = test_df['Target']
# Visualize
fig = test_df.rhoa.plots.signal(
y_pred=y_pred,
y_true=y_true,
date_col='Date',
price_col='Close'
)
This creates:
Confusion matrix with precision/recall
Price chart with signals overlaid
False positive/negative identification
See Visualization Guide for details.
What do the colors mean in the visualization?
On the price chart:
Blue line: Stock price
Light green background dots: All true opportunities (if y_true provided)
Bright green dots: Model’s buy signals
Red X markers: False positives (wrong predictions)
Orange circles: False negatives (missed opportunities)
In the confusion matrix:
Darker blue = more instances in that cell
Numbers show count and percentage
How do I save the visualization?
Use the save_path parameter:
fig = df.rhoa.plots.signal(
y_pred=predictions,
y_true=targets,
save_path='results/my_model.png',
dpi=300 # High quality
)
Don’t show, just save:
fig = df.rhoa.plots.signal(
y_pred=predictions,
y_true=targets,
save_path='results/my_model.png',
show=False # Don't display
)
What if I don’t have ground truth labels?
Just omit y_true:
# For future predictions (no ground truth yet)
fig = df.rhoa.plots.signal(
y_pred=predictions,
# No y_true parameter
date_col='Date',
price_col='Close'
)
This shows:
Price chart with predicted signals
No confusion matrix (can’t evaluate without truth)
Machine Learning
Can I use scikit-learn with Rhoa?
Yes! Rhoa is designed to work seamlessly with scikit-learn:
import pandas as pd
import rhoa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Create features using Rhoa
df['SMA_20'] = df.rhoa.indicators.sma(20)
df['RSI'] = df.rhoa.indicators.rsi(14)
# Create targets using Rhoa
from rhoa.targets import generate_target_combinations
targets, meta = generate_target_combinations(df, mode='auto')
# Use with scikit-learn
X = df[['SMA_20', 'RSI']].dropna()
y = targets['Target_7'].dropna()
model = RandomForestClassifier()
model.fit(X, y)
Why should I use time-based splits?
Financial data is autocorrelated: Today’s price depends on yesterday’s price.
Random splits leak information:
# WRONG - random split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
# Training data contains future information!
Time-based splits are correct:
# CORRECT - time-based split
split_idx = int(len(df) * 0.8)
train = df[:split_idx]
test = df[split_idx:]
# Or use a specific date
train = df[df['Date'] < '2024-01-01']
test = df[df['Date'] >= '2024-01-01']
Why it matters:
Random splits make your model look better than it is
Time-based splits reflect real trading (you can’t trade the past)
Random splits = look-ahead bias = overfitting
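A quick way to verify a split is leak-free: every training date must precede every test date. A runnable sketch on synthetic dates:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.date_range('2023-01-01', periods=100, freq='D'),
    'Close': range(100),
})

# Time-based 80/20 split
split_idx = int(len(df) * 0.8)
train, test = df.iloc[:split_idx], df.iloc[split_idx:]

# No training date may fall on or after the first test date
assert train['Date'].max() < test['Date'].min()
print(len(train), len(test))  # 80 20
```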
See Basic Concepts for more on time series considerations.
How do I avoid overfitting?
Common causes of overfitting:
Too many features
Too little data
Model too complex
Optimizing on test set
Solutions:
# 1. Use feature selection
from sklearn.feature_selection import SelectKBest, f_classif
selector = SelectKBest(f_classif, k=10)
X_selected = selector.fit_transform(X_train, y_train)
# 2. Use regularization
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(C=0.1) # Stronger regularization
# 3. Use cross-validation (time-series aware)
from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(X):
X_train_cv, X_val_cv = X.iloc[train_idx], X.iloc[val_idx]  # positional indexing for DataFrames
# Train and validate
# 4. Simplify model
from sklearn.ensemble import RandomForestClassifier
# Instead of
model = RandomForestClassifier(n_estimators=500, max_depth=20)
# Use
model = RandomForestClassifier(n_estimators=100, max_depth=5)
# 5. Get more data
# Collect more historical data if possible
How do I handle class imbalance?
Rhoa’s target generation handles this automatically:
# Control class balance
targets, meta = generate_target_combinations(
df,
mode='auto',
target_class_balance=0.5 # 50% positive
)
Additional techniques:
# 1. Adjust class weights
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(class_weight='balanced')
model.fit(X, y)
# 2. Use different evaluation metrics
from sklearn.metrics import f1_score, precision_score, recall_score
# Don't use accuracy for imbalanced data
# Use precision, recall, or F1 instead
# 3. Use SMOTE (synthetic oversampling)
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
# 4. Adjust prediction threshold
proba = model.predict_proba(X_test)[:, 1]
y_pred = (proba > 0.7).astype(int) # Higher threshold = fewer predictions
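The threshold effect in point 4 can be demonstrated with plain numpy: raising the cutoff can only reduce (or keep equal) the number of positive predictions. The random probabilities below stand in for model.predict_proba(X)[:, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
proba = rng.uniform(size=1000)  # stand-in for predicted probabilities

pred_default = (proba > 0.5).astype(int)
pred_strict = (proba > 0.7).astype(int)

print(pred_default.sum(), pred_strict.sum())
# The stricter threshold always flags a subset of the default's positives
assert pred_strict.sum() <= pred_default.sum()
```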
Performance & Optimization
Is Rhoa slow with large datasets?
Rhoa uses pandas and numpy, which are optimized for large datasets. However:
For indicators:
# Indicators are fast - they use vectorized pandas operations
%timeit df.rhoa.indicators.sma(20)
# ~1ms for 100k rows
# Pre-calculate and store if using repeatedly
df['SMA_20'] = df.rhoa.indicators.sma(20) # Calculate once
# Rather than
# sma = df.rhoa.indicators.sma(20) # Every time
For target generation:
# Auto mode searches parameter space - can be slow
# Reduce search space for faster results
targets, meta = generate_target_combinations(
df,
mode='auto',
period_step=2, # Check every 2 periods instead of 1
step=2 # Check every 2% threshold instead of 1%
)
# 4x faster with minimal accuracy loss
Cache results:
import joblib
# Save targets
joblib.dump((targets, meta), 'targets_cache.pkl')
# Load later
targets, meta = joblib.load('targets_cache.pkl')
How do I optimize indicator calculations?
Batch calculation:
# Calculate all indicators at once
df['SMA_20'] = df.rhoa.indicators.sma(20)
df['SMA_50'] = df.rhoa.indicators.sma(50)
df['RSI'] = df.rhoa.indicators.rsi(14)
# Store in DataFrame
df.to_pickle('data_with_indicators.pkl')
# Load when needed
df = pd.read_pickle('data_with_indicators.pkl')
Use appropriate window sizes:
# Smaller windows = faster
sma_5 = prices.rhoa.indicators.sma(5) # Fast
# Larger windows = slower
sma_200 = prices.rhoa.indicators.sma(200) # Slower
Avoid recalculation:
# SLOW - recalculates every time
for i in range(100):
sma = df.rhoa.indicators.sma(20)
# Use sma
# FAST - calculate once
sma = df.rhoa.indicators.sma(20)
for i in range(100):
# Use sma
Data Issues
How do I handle missing data?
Check for missing values:
print(df.isnull().sum())
# Close 5
# High 3
# Low 2
Solutions:
# 1. Drop rows with missing values
df_clean = df.dropna()
# 2. Forward fill (use previous value)
df['Close'] = df['Close'].ffill()  # fillna(method=...) is deprecated in pandas 2.x
# 3. Backward fill
df['Close'] = df['Close'].bfill()
# 4. Interpolate
df['Close'] = df['Close'].interpolate()
For financial data, forward fill is usually appropriate (use last known price).
Avoid:
# DON'T use mean/median for time series
df['Close'].fillna(df['Close'].mean()) # WRONG!
# This creates unrealistic prices
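The difference shows clearly on a toy series with a gap: forward fill carries the last traded price, while the mean invents a price unrelated to its neighbors:

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 101.0, np.nan, np.nan, 150.0])

filled_ffill = close.ffill()
filled_mean = close.fillna(close.mean())

print(filled_ffill.tolist())  # [100.0, 101.0, 101.0, 101.0, 150.0]
print(filled_mean.tolist())   # gaps become 117.0 — close to neither neighbor
```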
My OHLC data has inconsistencies
Check OHLC relationships:
# Verify data quality
assert (df['High'] >= df['Close']).all(), "High should be >= Close"
assert (df['Close'] >= df['Low']).all(), "Close should be >= Low"
assert (df['High'] >= df['Low']).all(), "High should be >= Low"
assert (df['High'] >= df['Open']).all(), "High should be >= Open"
assert (df['Open'] >= df['Low']).all(), "Open should be >= Low"
Fix inconsistencies:
# Fix: Ensure High is maximum
df['High'] = df[['Open', 'High', 'Low', 'Close']].max(axis=1)
# Fix: Ensure Low is minimum
df['Low'] = df[['Open', 'High', 'Low', 'Close']].min(axis=1)
Common causes:
Data provider errors
Stock splits not adjusted
Currency conversion issues
Bad data entry
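The checks and fixes above can be combined into one runnable sketch; here a deliberately broken row (High below Close) is repaired:

```python
import pandas as pd

df = pd.DataFrame({
    'Open':  [100, 103],
    'High':  [105, 102],   # second row is broken: High < Close
    'Low':   [ 99, 101],
    'Close': [104, 104],
})

# Repair: High must be the row maximum, Low the row minimum
df['High'] = df[['Open', 'High', 'Low', 'Close']].max(axis=1)
df['Low'] = df[['Open', 'High', 'Low', 'Close']].min(axis=1)

# The OHLC relationships now hold on every row
assert (df['High'] >= df[['Open', 'Close']].max(axis=1)).all()
assert (df['Low'] <= df[['Open', 'Close']].min(axis=1)).all()
print(df)
```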
How do I handle stock splits?
Use adjusted prices:
# Most data providers offer adjusted prices
# Yahoo Finance: Download with 'Adj Close'
# These already account for splits and dividends
import yfinance as yf
df = yf.download('AAPL', start='2020-01-01')
# Automatically adjusted for splits
Manual adjustment:
# If you have split information
split_date = '2024-06-10'
split_ratio = 4 # 4:1 split
# Adjust prices before split date
mask = df['Date'] < split_date
df.loc[mask, ['Open', 'High', 'Low', 'Close']] /= split_ratio
df.loc[mask, 'Volume'] *= split_ratio
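The manual adjustment can be verified on toy data: after dividing pre-split prices by the ratio, the series no longer shows the artificial jump. Column names are assumed to match the snippet above:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2024-06-07', '2024-06-10']),
    'Close': [400.0, 101.0],   # unadjusted: looks like a 75% one-day drop
    'Volume': [1000, 4100],
})

split_date = pd.Timestamp('2024-06-10')
split_ratio = 4  # 4:1 split

# Adjust prices and volume before the split date
mask = df['Date'] < split_date
df.loc[mask, 'Close'] /= split_ratio
df.loc[mask, 'Volume'] *= split_ratio

print(df['Close'].tolist())  # [100.0, 101.0] — continuous series
```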
Error Messages
“Index out of bounds” error
Likely cause: Not enough data for indicator window.
# Error
prices = pd.Series([100, 102]) # Only 2 values
sma_20 = prices.rhoa.indicators.sma(20) # Need 20 values!
Solution: Ensure sufficient data.
# Check data length
print(f"Data points: {len(prices)}")
print(f"Window size: 20")
if len(prices) < 20:
print("Not enough data!")
“Unable to parse string” in date column
Issue: Date column not in correct format.
# Error
df['Date'] = '2024-01-01' # String, not datetime
Solution: Convert to datetime.
df['Date'] = pd.to_datetime(df['Date'])
# Specify format if needed
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
# For US format
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
“KeyError: ‘Close’” or missing column error
Issue: Column name mismatch.
# Your data has 'close' (lowercase)
print(df.columns)
# ['Date', 'open', 'high', 'low', 'close']
Solution: Rename or specify column name.
# Option 1: Rename columns
df.columns = ['Date', 'Open', 'High', 'Low', 'Close']
# Option 2: Use lowercase
sma = df.rhoa.indicators.sma(20)
# Option 3: Specify in function call
targets, meta = generate_target_combinations(
df,
close_col='close', # Specify your column name
high_col='high'
)
Getting Help
Where can I find more examples?
Examples - Comprehensive examples
User Guide - Conceptual guides
API Reference - API reference
GitHub repository - Source code and examples
How do I report a bug?
Check if it’s a known issue: Search GitHub Issues
Create a minimal reproducible example:
import pandas as pd
import rhoa

# Minimal code that reproduces the bug
df = pd.DataFrame({
    'Close': [100, 102, 104, 106, 108]
})
result = df.rhoa.indicators.sma(20)  # Bug occurs here
print(result)
Include:
Your Rhoa version:
rhoa.__version__
Your Python version:
python --version
Operating system
Full error traceback
Submit: Create an issue on GitHub with this information
What features are planned?
Upcoming features (see roadmap):
More indicators (volume-based, pattern recognition)
Strategy backtesting framework
Preprocessing utilities
Data connectors (more exchanges/sources)
Performance optimization
Request features: Open a GitHub issue with the “enhancement” label.
Additional Resources
Documentation:
Quick Start - Get started quickly
Basic Concepts - Fundamental concepts
Indicators Guide - Indicator details
Targets Guide - Target generation guide
Visualization Guide - Visualization guide
Support:
GitHub Issues - Report bugs and request features
Examples - More code samples
Related Libraries:
pandas - Data manipulation
scikit-learn - Machine learning
matplotlib - Visualization
yfinance - Market data
Still Have Questions?
If your question isn’t answered here:
Search the documentation: Use the search bar at the top
Check GitHub Issues: See if it’s been reported or discussed
Open a GitHub Issue: Report bugs or ask questions
When reporting issues:
Describe what you’re trying to do
Show your code (minimal example)
Include any error messages
Mention Rhoa version and Python version
We’re here to help you succeed with Rhoa!