The 5-Stage Pipeline That Kills Overfitting

A backtest with a Sharpe ratio of 5 means nothing on its own. Without proper validation, that number is almost certainly overfitted to historical noise. We learned this the hard way and built a 5-stage pipeline to prevent it.

Stage 1: Tournament Screening

The first stage answers one question: which strategies show any signal at all? We run every strategy with default parameters across all symbols and rank by Sharpe ratio. This is a coarse filter. Most strategies fail here, producing negative or near-zero Sharpe on most assets. That is expected and useful. It narrows the field from 40 strategies to the 5 or 6 worth investigating further.
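A minimal sketch of that screening step follows. It is illustrative only and assumes a zero risk-free rate, daily returns annualized over 365 periods, and median Sharpe across symbols as the ranking key. None of those choices are specified above, and the names sharpe_ratio and rank_strategies are hypothetical.

```python
import math
import statistics

def sharpe_ratio(returns, periods_per_year=365):
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed zero)."""
    if len(returns) < 2:
        return 0.0
    stdev = statistics.stdev(returns)
    if stdev == 0:
        return 0.0
    return statistics.mean(returns) / stdev * math.sqrt(periods_per_year)

def rank_strategies(results):
    """results maps (strategy, symbol) -> list of per-period returns.

    Returns (strategy, median Sharpe across symbols) pairs, best first.
    The median is an assumed aggregation; the mean or a hit rate would also work.
    """
    by_strategy = {}
    for (strategy, _symbol), returns in results.items():
        by_strategy.setdefault(strategy, []).append(sharpe_ratio(returns))
    ranked = [(name, statistics.median(sharpes)) for name, sharpes in by_strategy.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Keeping only the top few entries of the ranked list is the coarse filter: anything with a median Sharpe at or below zero is dropped before any tuning happens.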

Stage 2: Parameter Sweep

For each surviving strategy-symbol pair, we run hundreds of parameter combinations in a grid search. This is where you find the optimal configuration, but it is also where overfitting risk is highest. A sweep might test 500 parameter combos and find one with Sharpe 15. Is that real or noise?

The answer comes from the next stages. The sweep gives you candidates. Validation tells you which candidates are legitimate.
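To see why the best result of a sweep cannot be trusted by itself, here is a small sketch of a grid search run against a deliberately meaningless backtest. The noise_backtest function is a hypothetical stand-in that ignores its parameters and scores pure random returns. The parameter names fast and slow, the 250-day sample, and the 365-period annualization are all assumptions made for illustration.

```python
import itertools
import math
import random
import statistics

def parameter_sweep(backtest, grid):
    """Run backtest(**params) for every combination in grid.

    Returns (params, sharpe) pairs sorted best first.
    """
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, backtest(**params)))
    return sorted(results, key=lambda item: item[1], reverse=True)

def noise_backtest(fast, slow, n_days=250):
    """Hypothetical stand-in: returns the annualized Sharpe of random daily returns."""
    rng = random.Random(fast * 1000 + slow)  # deterministic per combination
    returns = [rng.gauss(0.0, 0.02) for _ in range(n_days)]
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(365)

grid = {"fast": range(5, 30), "slow": range(40, 60)}  # 25 * 20 = 500 combinations
sweep = parameter_sweep(noise_backtest, grid)
best_params, best_sharpe = sweep[0]
print(best_params, round(best_sharpe, 2))
```

There is no edge anywhere in this setup, yet the top-ranked combination still reports a healthy-looking Sharpe simply because it is the maximum of 500 noisy draws. That selection effect is what the later stages are meant to catch.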

Stage 3: Walk-Forward Validation

Walk-forward testing splits history into rolling windows. You optimize on window N, then test on window N+1. If the optimized parameters work out-of-sample, the edge is likely real. If performance collapses, you were curve-fitting.

We use overlapping windows with a 70/30 train-test split, rolling forward in monthly increments. A strategy must maintain positive Sharpe in at least 60 percent of out-of-sample windows to pass.
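The windowing and the pass rule can be sketched roughly as below. This assumes bars are addressed by integer index and that the window and step sizes are given in bars. The function names are hypothetical, and the optimization and out-of-sample backtest that would run inside each window are left out.

```python
def walk_forward_windows(n_bars, window, step, train_frac=0.7):
    """Yield (train_start, train_end, test_end) index triples.

    Training covers [train_start, train_end) and testing [train_end, test_end).
    Consecutive windows overlap whenever step < window.
    """
    train_len = round(window * train_frac)
    start = 0
    while start + window <= n_bars:
        yield start, start + train_len, start + window
        start += step

def passes_walk_forward(oos_sharpes, min_pass_rate=0.6):
    """True if at least min_pass_rate of out-of-sample windows have positive Sharpe."""
    if not oos_sharpes:
        return False
    positive = sum(1 for sharpe in oos_sharpes if sharpe > 0)
    return positive / len(oos_sharpes) >= min_pass_rate
```

With daily bars, a step of roughly 30 would approximate the monthly roll. For each yielded triple you would optimize on the training slice, record the Sharpe on the test slice, and feed the collected values to passes_walk_forward.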

Stage 4: Monte Carlo Simulation

Even after walk-forward, you might be lucky. Monte Carlo simulation reshuffles and resamples your trade sequence thousands of times to build confidence intervals around the result. If your strategy shows Sharpe 3 but the 5th percentile Monte Carlo outcome is Sharpe -1, your edge is fragile and sequence-dependent.

We run 10,000 simulations per strategy and require the 5th percentile Sharpe to remain positive. This filters out strategies that depend on a specific sequence of wins and losses that may not repeat.
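Below is one way the 5th percentile check might look. One caveat: simply reordering a fixed set of trade returns leaves their mean and standard deviation unchanged, so a per-trade Sharpe would not move. This sketch therefore assumes resampling with replacement (a bootstrap) and a per-trade, non-annualized Sharpe. That is an assumption about the method, not a description of the exact procedure used here, and the function names are hypothetical.

```python
import math
import random

def trade_sharpe(returns):
    """Per-trade Sharpe (mean / sample standard deviation), not annualized."""
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(variance) if variance > 0 else 0.0

def monte_carlo_sharpe_percentile(trade_returns, n_sims=10_000, pct=5, seed=42):
    """Bootstrap the trade list n_sims times and return the pct-th percentile Sharpe."""
    rng = random.Random(seed)
    n = len(trade_returns)
    sims = sorted(
        trade_sharpe(rng.choices(trade_returns, k=n))  # sample with replacement
        for _ in range(n_sims)
    )
    index = max(0, int(n_sims * pct / 100) - 1)
    return sims[index]

def passes_monte_carlo(trade_returns, **kwargs):
    """Gate: the low-percentile Sharpe must stay positive."""
    return monte_carlo_sharpe_percentile(trade_returns, **kwargs) > 0
```

A strategy whose average trade is only slightly positive relative to its spread will tend to fail this gate, because a meaningful share of resampled histories end up with a negative mean.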

Stage 5: Regime Testing

The final gate tests across five distinct market regimes from 2021 to 2026: the 2021 bull run, the 2022 crash, the 2023 recovery, the 2024 consolidation, and the most recent period, 2025-2026. A strategy must be profitable in at least 4 of 5 regimes to earn a ROBUST verdict.

This is the hardest filter. Strategies that looked invincible in sweeps often fail in 2 or 3 regimes. Our statistical strategies (Ornstein-Uhlenbeck, Kalman filter) were completely destroyed at this stage despite sweep Sharpe ratios above 2. The lesson: sweep Sharpe is not a predictor of live performance.
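The 4-of-5 rule can be expressed in a few lines. In this sketch the per-regime profit figures are made-up placeholders, the FRAGILE label for a failing strategy is a hypothetical name (only ROBUST is named above), and "profitable" is assumed to mean net profit strictly above zero.

```python
REGIMES = [
    "2021 bull run",
    "2022 crash",
    "2023 recovery",
    "2024 consolidation",
    "2025-2026 recent period",
]

def regime_verdict(pnl_by_regime, min_profitable=4):
    """Return (verdict, failed_regimes) from a mapping of regime name -> net profit."""
    failed = [name for name, pnl in pnl_by_regime.items() if pnl <= 0]
    profitable = len(pnl_by_regime) - len(failed)
    verdict = "ROBUST" if profitable >= min_profitable else "FRAGILE"
    return verdict, failed

# Placeholder numbers for illustration only, not real results.
example = dict(zip(REGIMES, [0.42, -0.08, 0.15, 0.03, 0.11]))
print(regime_verdict(example))
```

Returning the list of failed regimes alongside the verdict makes it easier to see whether a strategy breaks in one specific environment or across several.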

The Result

Of 40 strategies tested, only a handful survive all five stages with a ROBUST verdict on meaningful symbol sets. That is the point. The pipeline is designed to be harsh so that what survives is genuinely deployable.
