Most crypto traders backtest by running a strategy on a chart, seeing green results, and going live. This is not backtesting. This is confirmation bias with a chart attached. Real backtesting is a multi-stage process designed to ruthlessly eliminate strategies that look good but will fail in production.
We have run over 10,000 backtests across 40 strategies and 25 symbols. More than half of those strategies showed positive results in initial screening but failed when properly validated. The five-stage pipeline we use exists to catch these false positives before they reach real capital.
Stage 1: Data Quality
Before you test anything, verify your data. Our 20.8 million candles across 25 symbols and 11 timeframes go through automated quality checks: timestamp continuity (no missing bars beyond expected market closures), OHLC validity (high must be the highest, low the lowest, open and close within range), and deduplication (INSERT OR IGNORE prevents duplicate rows).
Gap detection flags periods where candles are missing. Our integrity module uses a 1.5x tolerance on expected bar intervals — if the gap between two consecutive candles exceeds 1.5 times the timeframe duration, it is flagged. Quality scoring produces a 0 to 100 score per symbol-timeframe combination, and we only backtest on data scoring above 95.
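A rough sketch of the gap and OHLC checks described above (function names and the example interval are illustrative, not the integrity module's actual API):

```python
def find_gaps(timestamps, bar_seconds, tolerance=1.5):
    """Flag any pair of consecutive candles separated by more than
    tolerance x the expected bar interval (the 1.5x rule above)."""
    return [(prev, curr) for prev, curr in zip(timestamps, timestamps[1:])
            if curr - prev > tolerance * bar_seconds]

def ohlc_valid(o, h, l, c):
    """High must be the highest, low the lowest, open and close in range."""
    return h >= max(o, c) and l <= min(o, c)
```

On a one-minute series with timestamps in seconds, a jump from 120 to 300 exceeds the 90-second tolerance and is flagged as a gap.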
Bad data produces bad backtests. A missing candle during a crash can make a strategy appear to avoid the crash entirely, inflating results. A duplicate candle can create false indicator readings. Data quality is the foundation that everything else depends on.
Stage 2: Realistic Execution Modeling
The backtest engine must model execution realistically. Our engine applies taker fees of 0.10 percent on both entry and exit (matching Binance spot rates), slippage of 2 to 10 basis points per fill (buys slip up, sells slip down), and position sizing against current equity (not initial capital).
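A minimal sketch of how those costs combine on a long round trip (the constants mirror the numbers above; the function names are illustrative, not the engine's API):

```python
TAKER_FEE = 0.0010  # 0.10% per side, per the Binance spot rate above
SLIP_BPS = 5        # mid-range of the 2-10 bps band, illustrative

def fill_price(mid, side, slip_bps=SLIP_BPS):
    """Buys slip up, sells slip down."""
    slip = mid * slip_bps / 10_000
    return mid + slip if side == "buy" else mid - slip

def round_trip_pnl(entry_mid, exit_mid, qty):
    """Net PnL of a long round trip after slippage on both fills
    and taker fees on both the entry and exit notional."""
    buy = fill_price(entry_mid, "buy")
    sell = fill_price(exit_mid, "sell")
    fees = (buy + sell) * qty * TAKER_FEE
    return (sell - buy) * qty - fees
```

A flat round trip at a mid price of 100 nets about -0.30 per unit: 0.10 lost to slippage and 0.20 to fees, even though the price never moved.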
Stop-loss and take-profit are checked against candle high and low, not candle close. If the stop-loss level falls within a candle's range, the stop is assumed to trigger. If both stop-loss and take-profit are within the same candle's range, the stop-loss triggers first (conservative assumption).
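For a long position, the intrabar logic reduces to a few comparisons (a simplified illustration, not the engine's code):

```python
def intrabar_exit(high, low, stop, take):
    """Check exits against the candle's full range, not its close.
    When both levels fall inside the same candle, the stop is
    assumed to trigger first -- the conservative tie-break."""
    if low <= stop:
        return "stop_loss"
    if high >= take:
        return "take_profit"
    return None
```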
No-lookahead enforcement ensures the strategy sees only candles up to but not including the current bar. Multi-timeframe data is sliced by timestamp so a 15-minute strategy cannot see a 4-hour bar that has not yet closed.
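One way to enforce that slicing is to index higher-timeframe bars by close time and hand the strategy only bars that have fully closed (a sketch under that assumption, using `bisect` for the timestamp cut):

```python
from bisect import bisect_right

def closed_bar_count(close_times, now):
    """Number of bars whose close time is at or before `now`; the
    strategy may see only bars[:closed_bar_count(close_times, now)].
    A 4-hour bar that has opened but not yet closed is excluded."""
    return bisect_right(close_times, now)
```

With 4-hour bars closing at hours 4, 8, 12, and 16, a strategy evaluated at hour 10 sees only the first two bars; the bar closing at 12 becomes visible exactly at its close.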
Stage 3: Tournament Screening
The tournament runs every strategy with default parameters across all symbols. This is a screening test that answers one question: does this strategy show any signal at all? Strategies that produce negative returns with default parameters on every symbol are eliminated immediately.
The tournament is intentionally low-bar. Most strategies pass because default parameters are chosen to be reasonable. The value is in identifying complete non-starters before investing optimization effort. Of our 40 strategies, approximately 35 passed the tournament with at least one positive symbol.
Stage 4: Parameter Sweep and Refinement
The sweep tests hundreds of parameter combinations per strategy across all viable symbols. For our Bollinger Band strategy, this meant 288 combinations of bb_period, bb_std, and min_confidence across 13 symbols — over 3,700 individual backtests in Phase 1 alone.
Phase 2 refines the winners. A plus or minus 20 percent grid around Phase 1 winners tests the neighborhood. This reveals parameter sensitivity: if small changes produce dramatically different results, the optimum is fragile and likely overfitted.
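The neighborhood grid can be sketched as follows (hypothetical helpers; five steps per parameter is chosen for illustration):

```python
from itertools import product

def refinement_grid(best, pct=0.20, steps=5):
    """Evenly spaced values spanning +/-20% around a Phase 1 winner."""
    lo, hi = best * (1 - pct), best * (1 + pct)
    step = (hi - lo) / (steps - 1)
    return [round(lo + i * step, 6) for i in range(steps)]

def phase2_combos(winners):
    """Cross product of the refined ranges, one dict per combination."""
    grids = {name: refinement_grid(value) for name, value in winners.items()}
    return [dict(zip(grids, vals)) for vals in product(*grids.values())]
```

A Phase 1 winner of bb_period=20 expands to [16, 18, 20, 22, 24]; crossing two refined parameters yields 25 Phase 2 combinations.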
Phase 3 runs a tight grid on validated symbols only, providing the final parameter calibration for deployment.
The sweep answers: what are the best parameters, and how robust is the performance landscape around them? Broad, smooth plateaus indicate genuine edge. Narrow spikes indicate noise.
Stage 5: Regime Validation
The most important stage. Every strategy is tested across five distinct one-year market periods: 2021-2022 (bull to crash), 2022-2023 (bear to recovery), 2023-2024 (recovery to highs), 2024-2025 (consolidation), 2025-2026 (recent).
A period counts as a win if the strategy produces both positive returns and a Sharpe above 1.0. A symbol needs three or more wins out of five for a ROBUST verdict. The overall strategy needs four or more profitable periods for a PROCEED verdict.
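The verdict logic reduces to a few counting rules. A sketch (labels other than ROBUST and PROCEED are placeholders):

```python
def period_win(period_return, sharpe):
    """A period wins only with positive return AND Sharpe above 1.0."""
    return period_return > 0 and sharpe > 1.0

def symbol_verdict(period_results):
    """period_results: (return, sharpe) pairs for the five periods."""
    wins = sum(period_win(r, s) for r, s in period_results)
    return "ROBUST" if wins >= 3 else "NOT_ROBUST"

def strategy_verdict(period_returns):
    """PROCEED requires four or more profitable periods overall."""
    profitable = sum(1 for r in period_returns if r > 0)
    return "PROCEED" if profitable >= 4 else "ARCHIVE"
```

Note that a period with a positive return but Sharpe of 0.8 counts toward PROCEED at the strategy level but not toward a symbol's ROBUST tally.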
This stage killed more than half of our strategies that had passed the sweep with positive Sharpe ratios. The wavelet decomposition strategy had a sweep Sharpe of 3.19 and failed validation. The OU mean reversion strategy had a sweep Sharpe of 1.98 and was negative in every period. Five statistical strategies all failed, as did Ichimoku, trend alignment, and funding contrarian.
The strategies that passed — mean reversion, momentum, leverage composite, correlation regime, NUPL cycle, stablecoin supply — showed consistent performance across fundamentally different market environments. That consistency is what separates a real edge from a fitted curve.
What Most Backtests Get Wrong
The most common mistake is testing and optimizing on the same data. If you optimize bb_period on the full 2021-2026 dataset and then report the Sharpe on that same dataset, you are measuring in-sample performance. Walk-forward analysis fixes this by optimizing on rolling training windows and testing on subsequent out-of-sample windows.
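The window mechanics can be sketched as below (a minimal rolling scheme; real walk-forward setups vary in step size and anchoring):

```python
def walk_forward_windows(n_bars, train_len, test_len):
    """Rolling (train, test) index slices: optimize parameters on each
    training window, then score them on the next unseen test window.
    The window advances by test_len so test segments never overlap."""
    windows, start = [], 0
    while start + train_len + test_len <= n_bars:
        windows.append((slice(start, start + train_len),
                        slice(start + train_len, start + train_len + test_len)))
        start += test_len
    return windows
```

Stitching the out-of-sample test segments together gives an equity curve that was never touched by the optimizer.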
The second mistake is not modeling fees and slippage. A strategy that makes 0.1 percent per trade before costs loses roughly 0.1 percent per round trip after 0.10 percent taker fees on each side, before slippage is even counted. Many marginal strategies are negative after realistic execution costs.
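The arithmetic is simple enough to write down (the slippage figure is illustrative):

```python
gross_edge = 0.0010   # 0.10% average profit per trade before costs
taker_fee  = 0.0010   # 0.10% per side, entry and exit
slippage   = 0.0005   # 5 bps total across both fills, illustrative

net_edge = gross_edge - 2 * taker_fee - slippage
# net_edge is -0.0015: the "profitable" strategy loses 0.15% per round trip
```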
The third mistake is not testing across market regimes. A strategy optimized on bull market data will fail in a bear market. A strategy optimized on ranging data will fail during trends. Only multi-regime validation catches this.
The fourth mistake is trusting Sharpe ratios from parameter sweeps. The best result from 288 combinations will have a high Sharpe ratio even on random data because you are selecting the maximum from a large sample. Only validated Sharpe (tested out-of-sample across regimes) should inform deployment decisions.
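This selection effect is easy to demonstrate on pure noise. The sketch below gives 288 "parameter combinations" a year of zero-edge daily returns each and takes the best annualized Sharpe (seed and volatility are arbitrary):

```python
import random
import statistics

def annualized_sharpe(daily_returns):
    mu = statistics.mean(daily_returns)
    sd = statistics.stdev(daily_returns)
    return mu / sd * 252 ** 0.5

random.seed(7)
sharpes = [
    annualized_sharpe([random.gauss(0, 0.01) for _ in range(252)])
    for _ in range(288)
]

best = max(sharpes)
# The average combo is near zero, but the best of 288 looks deployable --
# purely from picking the maximum of a large sample of noise.
```

Run this and the maximum typically lands well above 1.0 despite every "strategy" having exactly zero edge, which is why a sweep-best Sharpe on its own means almost nothing.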
The Decision Framework
After all five stages, the decision framework is straightforward. Does the strategy have a structural explanation for its edge? Is the parameter landscape smooth and broad? Does it earn ROBUST verdicts on multiple symbols? Is the Monte Carlo probability of profit above 85 percent? Is the 95th percentile drawdown within risk limits?
If all answers are yes, the strategy enters paper trading for live observation. If any answer is no, the strategy is either refined further or archived. We have archived more strategies than we have deployed, and that ratio is a sign of a disciplined process.