Spotting fake backtest track records — a practical skeptic's guide
Almost every paid crypto signal service publishes a backtest. Most of those backtests are fraudulent, intentionally or accidentally, in ways that are visible to anyone who knows where to look. This is a practical guide to the five most common cheats — survivorship bias, vectorized fills, look-ahead leak, overfitting, period cherry-picking — and how to detect them in a few minutes of inspection.
Why this matters
The crypto signal industry is roughly 95% noise and 5% real. The 95% includes outright scams, honest amateurs who mislead by accident because they don't understand what they're doing, and former traders selling tools that worked once and no longer do. In most jurisdictions these services are not illegal; they're sold under the legal cover of "educational content" disclaimers.
The economic asymmetry is brutal. The seller risks zero capital and collects subscription fees regardless of outcome. The buyer risks 100% of their capital, applies the signals at retail size and retail timing, and takes the actual losses. Every crypto signal service that shows a positive backtest and a growing subscriber count is benefiting from this asymmetry. Skepticism is not optional; it's risk management.
What follows is a checklist of the most common dishonesty patterns. If a backtest passes all of these, it might be real. If it fails any one of them, it's almost certainly worthless.
1. Survivorship bias
The cheat: the strategy is backtested on the universe of currently-listed coins, not the universe that existed at each historical point in time. Coins that have since been delisted, whether for low volume, project failure, or terms violations, are silently excluded. The losers literally don't exist in the data.
This is the single most common form of fake-backtest dishonesty in crypto, partly because honest data is genuinely difficult to source. Most exchange APIs don't return historical klines for delisted symbols. Building a survivorship-bias-free dataset is an engineering project that takes weeks. Most signal services skip it.
How to detect:
- Ask the seller for the trade list. Look for trades on coins like FTM, USTC, ANC, MIR, LUNC, CWAR, AGIX (coins that delisted, rebranded, or collapsed). If the backtest period covers the years when these were actively traded and the trade list contains none of them, or of any other dead coin, the dataset almost certainly has survivorship bias.
- Check the signal frequency over time. Real datasets have non-uniform frequency — periods of high activity, periods of quiet. Survivor-only datasets often look weirdly clean across years because the failed assets are missing.
- Look for "diversified small-cap" claims. Small caps are exactly where survivorship bias has the largest impact on results. A small-cap-focused strategy with no delisted coins in its dataset is essentially science fiction.
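The trade-list check can be roughly automated once the seller hands over the data. What follows is a minimal sketch in Python, not a definitive tool. It assumes the trade list arrives as (symbol, entry_date) rows and that you keep your own set of symbols known to have delisted, rebranded, or collapsed. The function name `dead_symbol_coverage` and the sample rows are hypothetical.

```python
from collections import Counter

# Symbols named in the text as having delisted, rebranded, or collapsed.
# Extend this with whatever your own research turns up for the backtest window.
KNOWN_DEAD = {"FTM", "USTC", "ANC", "MIR", "LUNC", "CWAR", "AGIX"}


def dead_symbol_coverage(trades, known_dead=KNOWN_DEAD):
    """Count trades on symbols that no longer trade under that ticker.

    `trades` is an iterable of (symbol, entry_date) tuples (hypothetical layout).
    Returns (dead trade counts by symbol, share of all trades on dead symbols).
    """
    symbols = [symbol.upper() for symbol, _ in trades]
    dead = Counter(s for s in symbols if s in known_dead)
    share = sum(dead.values()) / len(symbols) if symbols else 0.0
    return dead, share


# Made-up trade list for illustration only.
sample_trades = [
    ("BTC", "2022-03-01"),
    ("ETH", "2022-03-04"),
    ("ANC", "2022-04-11"),
    ("SOL", "2022-06-20"),
    ("MIR", "2022-04-15"),
]
dead_counts, dead_share = dead_symbol_coverage(sample_trades)
print(dict(dead_counts), f"{dead_share:.0%}")
```

A share of zero across a multi-year, multi-coin trade list is the pattern to question. A non-zero share does not prove the dataset is complete, only that it is not survivor-only.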
Internal note: our own track record had this problem in an earlier version. We rebuilt our dataset to include 90+ delisted symbols with their pre-delist klines. The reported return dropped meaningfully. That is the honest number.
2. Vectorized fills (no path dependency)
The cheat: the backtest computes returns by vectorized math (exit price minus entry price) without simulating the actual trade path tick by tick. Stops and targets are never checked in the order the market would have hit them; the math just picks the favorable price across the trade window.
Mechanically: if a strategy sets entry at $100, target at $110, stop at $95, and the price during the trade window goes $100 → $94 → $111, a vectorized fill records the trade as an exit at $111 (the final or best price), ignoring the fact that the stop at $95 would have been hit first. A vectorized backtest can still show losing trades, but every trade that should have stopped out before recovering is booked as a winner.
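The difference is easy to see in code. The sketch below is a toy illustration in Python under stated assumptions: the price path is an ordered list of ticks, the stop fills exactly at its level with no slippage, and both function names are hypothetical rather than taken from any real backtesting library.

```python
def vectorized_exit(path, target, stop):
    """Naive fill: take the most favorable price seen anywhere in the window.

    `target` and `stop` are accepted but never consulted, which is the bug.
    """
    return max(path)


def path_dependent_exit(path, target, stop):
    """Walk the prices in order and exit at whichever level is touched first."""
    for price in path:
        if price <= stop:
            return stop      # assumes a fill exactly at the stop, no slippage
        if price >= target:
            return target
    return path[-1]          # neither level hit: exit at the last price


entry, target, stop = 100.0, 110.0, 95.0
path = [100.0, 94.0, 111.0]  # the price sequence from the example above

naive = vectorized_exit(path, target, stop) / entry - 1
honest = path_dependent_exit(path, target, stop) / entry - 1
print(f"vectorized: {naive:+.1%}, path-dependent: {honest:+.1%}")
```

The same three prices produce a gain in one simulator and a stopped-out loss in the other. With real OHLC bars the order of the high and low inside a bar is unknown, so a careful simulator either drops to finer-grained data or assumes the worse case.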
How to detect:
- Ask whether the simulator is path-dependent or vectorized. If the seller doesn't know the difference, that itself is the answer. If they say "vectorized," the backtest is probably 30-60% inflated relative to reality.
- Check the worst single-trade loss. If the strategy claims a -5% stop-loss and no loss in the trade list is worse than -5%, the simulator isn't realistically modeling slippage and gap risk. Real stops on volatile crypto routinely fill 0.5-2% below the stop level, so an honest trade list should show some losses of -5.5% or worse.
- Check the maximum drawdown vs win rate. A strategy with 65% win rate and 1:1 risk:reward should have drawdowns roughly proportional to the worst losing streak in N trades. If reported max DD is implausibly small (say, 3-4% with 200+ trades), the path dependency is broken.
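To get a feel for what drawdown a 65% win rate at 1:1 risk:reward should produce, a quick Monte Carlo helps. This is a rough sketch in Python, assuming independent trades, a fixed risk of 1R per trade, and 200 trades per path; the function name, seed, and path count are arbitrary illustrative choices.

```python
import random
import statistics


def simulate_path(win_rate, n_trades, rng):
    """One sequence of trades in R units (+1 per win, -1 per loss).

    Returns (longest losing streak, max peak-to-trough drawdown in R).
    """
    equity = peak = 0
    streak = longest = max_dd = 0
    for _ in range(n_trades):
        if rng.random() < win_rate:
            equity += 1
            streak = 0
        else:
            equity -= 1
            streak += 1
            longest = max(longest, streak)
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)
    return longest, max_dd


rng = random.Random(42)  # fixed seed so the run is repeatable
results = [simulate_path(0.65, 200, rng) for _ in range(2000)]
median_streak = statistics.median(r[0] for r in results)
median_dd = statistics.median(r[1] for r in results)
print(f"median longest losing streak: {median_streak}, median max drawdown: {median_dd}R")
```

Multiply the R figure by the fraction of equity risked per trade to get a percentage. A reported max drawdown far below that number, over a comparable trade count, deserves a pointed question.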
We wrote a separate deep-dive on path-dependent simulation if you want the engineering detail.
3. Look-ahead bias / data snooping
The cheat: the backtest uses information that wasn't available at the trade time. The most common forms:
- Future-data leak: e.g., the strategy uses "the high of the day" or "the close of the bar" as a signal input — but it enters at the open of that same bar. The signal can't exist before the data is realized.
- Restated data: some funding rate or open interest data is restated by exchanges hours after the fact (corrections, late-arrival reconciliation). Backtests using the final restated number have access to information that wasn't visible at trade time.
- Composite indices: a strategy uses a "BTC-USD index price" that's the volume-weighted price across exchanges. If the index is computed in a way that includes future-realized volume weights, the indicator is partially future-aware.
- Survivorship in features: "we filter to coins with stable fundamentals" — fundamentals are evaluated as of today. A coin might have been considered fundamentally weak at the trade time but is now seen as stable in retrospect. The filter encodes future information.
How to detect:
- Ask precisely which fields the strategy reads at signal time. If the answer mentions any feature whose value is only defined after the bar closes (close, high, low, volume, funding for the period that contains the signal), look-ahead is in scope.
- Look for strategies that win consistently in periods of major market regime change. Real strategies have regime-dependent performance. Strategies with suspiciously stable returns across bear, bull, and chop markets often have look-ahead leak smoothing the curve.
- Ask whether the backtest uses live-stream data or historical klines for entries. Live streams can't have look-ahead; historical klines often contain restated values silently.
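The same-bar leak described above is simple to reproduce. The sketch below is a toy example in Python with made-up prices: the signal is "the bar closed above its open", which by definition is only known once the bar has closed. The helper names are hypothetical.

```python
# Each bar is (open, close). Made-up prices for illustration.
bars = [(100.0, 102.0), (102.0, 101.0), (101.0, 104.0),
        (104.0, 103.0), (103.0, 103.5), (103.5, 102.0)]


def bar_return(bar):
    """Return from buying the bar's open and selling its close."""
    open_, close = bar
    return close / open_ - 1


def signal(bar):
    """Toy signal: the bar closed above its open. Only known once the bar closes."""
    open_, close = bar
    return close > open_


# Leaky: the signal from bar t is used to buy the open of bar t itself.
leaky = [bar_return(bars[t]) for t in range(len(bars)) if signal(bars[t])]

# Honest: the signal from bar t can only be acted on at the open of bar t + 1.
honest = [bar_return(bars[t + 1]) for t in range(len(bars) - 1) if signal(bars[t])]

print(f"leaky total:  {sum(leaky):+.2%}")
print(f"honest total: {sum(honest):+.2%}")
```

The leaky version wins every trade by construction, because it only "enters" bars it already knows closed higher. Shifting execution by one bar is the minimum fix; any feature computed from a bar's close, high, low, or volume needs the same treatment.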
4. Parameter overfitting
The cheat: the strategy was optimized on the same data it's reporting performance on. The parameters are tuned to fit the historical noise, and in-sample results look great. Out-of-sample, the noise is different and the strategy fails.
The risk of overfitting grows with the number of parameters, the granularity of the search, and the absence of out-of-sample validation. Crypto strategies often have 5-10 parameters, each with 10-20 candidate values, optimized via grid search on the same period being reported. That's anywhere from 10^5 (5 parameters, 10 values each) to roughly 10^13 (10 parameters, 20 values each) parameter combinations searched against a single sample of history. The likelihood of finding a false-positive set is extremely high.
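A small experiment shows why searching many combinations against one sample is dangerous. The sketch below, in Python, is an illustration under stated assumptions and not a model of any real strategy: each "parameter combination" is replaced by pure zero-mean noise, so none has an edge, and the constants (500 candidates, 200 days, 2% daily volatility, the seed) are arbitrary.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

N_STRATEGIES = 500   # stand-in for 500 parameter combinations
N_DAYS = 200         # days in each of the in-sample and out-of-sample windows
DAILY_VOL = 0.02     # assumed daily return volatility


def noise_returns(n):
    """Zero-mean random daily returns: a 'strategy' with no edge by construction."""
    return [rng.gauss(0.0, DAILY_VOL) for _ in range(n)]


# Every candidate gets an in-sample and an independent out-of-sample window.
candidates = [(noise_returns(N_DAYS), noise_returns(N_DAYS))
              for _ in range(N_STRATEGIES)]

# "Optimize": keep whichever candidate looked best in-sample.
best_in, best_out = max(candidates, key=lambda c: statistics.fmean(c[0]))

in_sample_mean = statistics.fmean(best_in)
out_of_sample_mean = statistics.fmean(best_out)
print(f"best in-sample mean daily return: {in_sample_mean:+.4%}")
print(f"same candidate out-of-sample:     {out_of_sample_mean:+.4%}")
```

The winning candidate looks profitable in-sample purely because it was selected for it. Its out-of-sample window is just another draw of noise. A real grid search over correlated parameter sets behaves less cleanly than this, but the selection effect is the same.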
How to detect:
- Ask for the train/test split. A real backtest uses a portion of history (say 70%) to design and tune, then evaluates on the held-out 30%. Out-of-sample numbers will be visibly worse than in-sample. If the seller doesn't have an out-of-sample number, the entire backtest is in-sample and overfit.
- Ask how many parameter combinations were tried. If more than 100, the multiple-comparisons problem is severe even with good methodology. If more than 10,000, the result is essentially noise no matter what.
- Ask for live results. Live trading is the simplest possible out-of-sample test: did the strategy keep working after the backtest period ended? A signal service with three months of live results that match its backtest is meaningfully more credible than one with five years of backtest and zero live data.
5. Period cherry-picking
The cheat: the backtest period is selected to span only times when the strategy worked. Periods of underperformance are excluded with various rationales — "the asset wasn't liquid yet," "the regime was different," "we changed the rules and the old version doesn't apply."
How to detect:
- Ask for full-history performance. If the strategy was designed in 2024, ask what its returns were 2020-2023. If there's a "the strategy didn't exist then" excuse, ask about a similar strategy or the underlying logic. Real edges generalize across time; gimmick edges don't.
- Look for unusual start dates. Backtests starting in November 2022 (post-FTX bottom) capture only the bullish recovery. Backtests starting in January 2024 capture only the bull run. Period selection is the most innocent-looking dishonesty pattern.
- Ask for a year-by-year breakdown. Strategies with a positive return in every year are extremely rare and statistically suspicious. Real strategies almost always have at least one losing year in any 5-year window.
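If the seller provides a trade list but no yearly breakdown, you can build one yourself. The following is a minimal sketch in Python, assuming each trade is an (exit_date, fractional_return) pair and that trades are sequential and full-size so returns can simply be compounded. The function name and the sample trades are hypothetical.

```python
from collections import defaultdict
from datetime import date


def yearly_returns(trades):
    """Compound per-trade returns into one figure per calendar year.

    `trades` is an iterable of (exit_date, fractional_return) pairs
    (hypothetical layout); trades are assumed sequential and full-size.
    """
    growth = defaultdict(lambda: 1.0)
    for exit_date, ret in trades:
        growth[exit_date.year] *= 1.0 + ret
    return {year: g - 1.0 for year, g in sorted(growth.items())}


# Made-up trades for illustration only.
example_trades = [
    (date(2022, 5, 12), -0.10),
    (date(2022, 11, 20), -0.05),
    (date(2023, 3, 2), 0.08),
    (date(2023, 9, 14), 0.04),
    (date(2024, 2, 28), 0.12),
]
breakdown = yearly_returns(example_trades)
for year, ret in breakdown.items():
    print(year, f"{ret:+.1%}")
losing_years = [year for year, ret in breakdown.items() if ret < 0]
```

Years that are missing from the output entirely are as informative as the losing ones: they show which periods the backtest skipped.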
What good track records actually look like
Honest crypto backtests share several properties:
- Lower numbers than the fakes. A real strategy targeting market-neutral exposure rarely exceeds the 30-100% APY range. A real directional strategy targeting alpha rarely exceeds the 50-200% APY range on a consistent basis. Anything claiming 500%+ over multi-year periods is almost certainly hiding something.
- Visible drawdowns. Every honest equity curve has 10-25% drawdowns at minimum. Smooth curves are red flags.
- Rough periods clearly disclosed. "We had a -18% drawdown in May 2024 due to the unwind in altcoin liquidity" is what you want to see. "Steady gains across all market conditions" is fiction.
- Methodology transparency. Path-dependent simulation with explicit slippage and fee assumptions, survivorship-bias-free dataset, train/test split, live-vs-backtest comparison. If the seller can describe these in detail, they probably did the work. If they hand-wave, they didn't.
- Live evidence. The strongest signal is a public live track record that started before the backtest was published. If live performance roughly matches backtest performance over 6+ months, the methodology is probably sound. If live performance is meaningfully worse, the backtest had a bug or a cheat.
The honest checklist for any track record
Before subscribing to or copy-trading any service, ask:
- Is the dataset survivorship-bias-free? Can you point to delisted/failed coins in the trade list?
- Is the simulator path-dependent? Are stops modeled with realistic slippage?
- What features are used as signal inputs? Are any of them only defined after the trade bar closes?
- What's the in-sample vs out-of-sample performance? How many parameter combinations were searched?
- What's the year-by-year breakdown? Is there a losing year? What was the worst drawdown?
- Is there a live, third-party-auditable track record that started before the backtest was published?
If the seller can answer all six honestly, the track record is probably real. If they can't answer most of them, or get evasive, the track record is probably fake.
Where this leaves you
Probably skeptical of about 95% of advertised crypto strategies. That's the correct level of skepticism. The base rate of "advertised crypto strategy that actually works after costs" is in the low single-digit percent. Most working strategies are not advertised at all because they're being run by funds or individual quants who don't need subscription revenue.
The signal services that are credible tend to disclose their flaws. They publish drawdowns. They explain what didn't work. They show out-of-sample comparisons even when those comparisons are unflattering. Honest discomfort is the marker of real work.
If you're already paying for a service, run this checklist on it. If most boxes don't tick, your money is funding noise. If you're considering one, run the checklist before you subscribe. The checklist takes 10-15 minutes. The cost of skipping it is whatever you pay over the next 6 months.
Run by traders who've published the autopsies
We're a small EU quant team. We trade live, post our research, and document what didn't work.