Every hypothesis we killed.
And what we learned.

Rigor is the moat. Most quant shops show only winners — we show the math, the backtests, and the verdicts on strategies that didn't pass validation. Live strategies remain undisclosed.

Hypotheses tested: 18
Killed outright: 11
Parked (insufficient data): 3
Under research / partial: 4

KILLED · no edge

After volume-confirmed dump events, mid-frequency reflexive bounce trades on 30m/1h/2h/4h timeframes carry sufficient edge for systematic deployment with tight SL/TP.

Deep multi-timeframe path-dependent backtest on 90 days of 5min klines across 78 symbols (T1+T2+T3). Realistic filters: BTC regime gate, time-of-day, 60min cooldown. Per-tier breakdown.

30min TF (n=421): WR 38-42%, mean −0.10 to +0.01%
1h TF, best variant (n=479): WR 40-44%, mean −0.00 to +0.06%, monthly +1.0%
2h TF (n=695): WR 34-40%, mean −0.13 to −0.21%
4h TF (n=1089): WR 32-40%, mean −0.19 to −0.27%
Pump filter overlay (skip pumped >30%/24h): no improvement
Best Sharpe across all configs: 0.03

An initial test on a random sample of n=100 claimed +16.8%/mo. The bootstrap CI on that sample was ±0.70% around a mean of +0.41%, statistically indistinguishable from zero. Multiplying that noise by a 30× trade frequency compounded it into a false-positive projection.

$$ \text{SE}(\bar{r}_{\text{sample}}) = \frac{\sigma}{\sqrt{n}} = \frac{6.5\%}{\sqrt{479}} \approx 0.30\% \quad \implies \quad \bar{r} \pm 0.60\% \text{ at 95\% CI} $$

The mean falls inside the ±0.60% band around zero. No statistically significant edge.
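The check above can be reproduced in a few lines. This is a minimal sketch assuming a normal approximation for the sample mean; the helper name `mean_ci` is ours for illustration, and the inputs are the 1h-variant figures quoted above (σ = 6.5% per trade, n = 479, best-case mean +0.06%).

```python
import math

def mean_ci(mean_ret, sigma, n, z=1.96):
    """Standard error and z-based confidence interval for a sample mean.

    Assumes a normal approximation; all inputs are in percent per trade.
    """
    se = sigma / math.sqrt(n)
    return se, mean_ret - z * se, mean_ret + z * se

# Figures quoted in the text for the 1h variant.
se, lo, hi = mean_ci(mean_ret=0.06, sigma=6.5, n=479)

# An edge is only claimed when the whole interval sits on one side of zero.
edge_is_significant = lo > 0 or hi < 0
print(f"SE={se:.2f}%  CI=[{lo:.2f}%, {hi:.2f}%]  significant={edge_is_significant}")
```

Because the interval straddles zero, the flag stays false even for the most favorable mean in the reported range.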

KILLED
Killed before paper validation. Modules disabled via env, code archived but not running. No live capital deployed.
Never claim edge from a single small sample. Confidence intervals on monthly projections require n ≥ 300 with proper out-of-sample validation. Compounding noise gives the illusion of signal.
KILLED · path bias

On 1473 historical dump events (≥6% drop in 60min with ≥3× volume burst), 87.9% saw price recover above pre-dump levels within 4h. Capitulation lows should be tradeable LONG with mean +7% return.

Define an event as a bar window [t₀, t₁] satisfying

$$ \frac{p_{t_1} - p_{t_0}}{p_{t_0}} \le \theta_{\text{drop}} \quad \text{and} \quad V_{t_1 - t_0} \ge k \cdot \bar{V}_{\text{prior 6h}} $$

where the tier-specific thresholds are θ_drop ∈ {−3%, −4%, −6%, −10%} and k ∈ {2.0, 2.5, 3.0, 4.0}.
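The event definition can be sketched as a single predicate over 1m bars. The function name `is_dump_event` and its parameters are hypothetical, and one detail is an assumption on our part: the volume condition is read as mean per-bar volume inside the window versus mean per-bar volume over the prior 6h. The defaults correspond to the headline tier (−6% in 60 minutes with 3× volume).

```python
def is_dump_event(closes, volumes, t1, window=60, lookback=360,
                  theta=-0.06, k=3.0):
    """Check whether the window ending at bar index t1 is a dump event.

    closes / volumes are per-bar lists (1m bars assumed), so window=60
    is 60 minutes and lookback=360 is the prior 6 hours.
    """
    t0 = t1 - window
    if t0 - lookback < 0:
        return False  # not enough history for the 6h baseline
    drop = (closes[t1] - closes[t0]) / closes[t0]
    # Assumption: compare mean per-bar volume in the window against the
    # mean per-bar volume of the lookback period that precedes it.
    win_vol = sum(volumes[t0 + 1: t1 + 1]) / window
    base_vol = sum(volumes[t0 - lookback + 1: t0 + 1]) / lookback
    return drop <= theta and base_vol > 0 and win_vol >= k * base_vol

# Synthetic example: 6h flat at 100, then a 7% slide over 60 bars on 4x volume.
closes = [100.0] * 361 + [100.0 - 7.0 * (i + 1) / 60 for i in range(60)]
volumes = [1.0] * 361 + [4.0] * 60
print(is_dump_event(closes, volumes, t1=420))
```

Other tiers would be checked by passing a different `theta` and `k` pair from the sets listed above.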

Expected return from event end t₁ forward 4h:

$$ \mathbb{E}[r_{4h} \mid \text{dump}] = +7.0\% \pm 9.2\% \text{ over n=1473} $$

True path-dependent simulation on 1m Binance futures klines for each event. Compared against naive close-only assumption.

Naive close-only sim (SL=10/TP=15): +1608% / 90d, Sharpe 17
Path-dependent sim (1m klines): −1494% / 90d
Events with price < peakC after entry: 1473 / 1473 (100%)
Mean further drawdown after dump: −5.98%
SL=5% trigger rate: 61.3% of trades
SL=10% trigger rate: 22.3%
KILLED
The 87.9% bounce statistic is an artifact of measuring at 4h close. In reality, price first dips further (avg −6%) before bouncing. Stops trigger before recovery. Adding any realistic SL destroys the edge. The bounce is real but untradeable for retail.
Aggregate statistics ≠ tradeable edge. Path dependency matters more than terminal distributions. Backtest entry, exit, AND every bar in between.
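The gap between the two simulations comes down to whether intermediate bars are allowed to trigger the stop. A minimal sketch of both views for a LONG entry follows; the `Bar` type, the function names, and the sample path are hypothetical, and the rule that the stop fills first when both levels sit inside one bar is a conservative assumption rather than the production logic.

```python
from dataclasses import dataclass

@dataclass
class Bar:
    high: float
    low: float
    close: float

def close_only_return(entry, bars):
    """Naive view: only the final close of the holding period matters."""
    return (bars[-1].close - entry) / entry

def path_dependent_return(entry, bars, sl=0.10, tp=0.15):
    """Walk every bar and exit at the first SL or TP touch (LONG side)."""
    stop = entry * (1 - sl)
    target = entry * (1 + tp)
    for bar in bars:
        # Conservative assumption: if both levels are inside the same bar,
        # treat the stop as having been hit first.
        if bar.low <= stop:
            return -sl
        if bar.high >= target:
            return tp
    # Neither level touched: exit at the last close (timeout).
    return (bars[-1].close - entry) / entry

# Hypothetical path: price dips 12% below entry, then recovers to +7%.
bars = [
    Bar(high=100.0, low=95.0, close=96.0),
    Bar(high=96.0, low=88.0, close=90.0),
    Bar(high=104.0, low=90.0, close=103.0),
    Bar(high=108.0, low=102.0, close=107.0),
]
naive = close_only_return(100.0, bars)
real = path_dependent_return(100.0, bars)
print(f"close-only: {naive:+.2%}  path-dependent: {real:+.2%}")
```

On this path the close-only view books a gain while the bar-by-bar walk is stopped out on the second bar, which is the same mechanism that flipped the sign of the backtest.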
KILLED · sample bias

Borrowing Perelman's reduced-length functional from Ricci flow theory: "smooth" price approaches to a signal trigger (low geometric path energy) should correlate with higher follow-through than chaotic approaches.

$$ L(\gamma) = \int_{0}^{\bar{\tau}} \sqrt{\tau}\,\left(R + |\dot{\gamma}|^{2}\right)\,d\tau $$

Discretized for log-price path on 240×1m bars before signal trigger:

$$ L_{\text{price}} = \sum_{k=1}^{N} \sqrt{\tau_k} \cdot \left( \sigma^2_{k} + r_{k}^{2} \right) \cdot \Delta\tau $$

Where τ_k = k/N is normalized time position, σ²_k is local realized variance in 10-bar window centered on k, and r_k is the log-return at bar k.
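The discretized sum can be written out directly. This is a sketch, not the original implementation: the function name `l_price` is ours, Δτ is taken as 1/N, and "local realized variance" is assumed to mean the mean squared log-return in a window centered on bar k, truncated at the path edges.

```python
import math

def l_price(prices, window=10):
    """Discretized reduced-length functional over a price path (sketch)."""
    # r_k: log-return at bar k, for k = 1..N
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    n = len(rets)
    if n == 0:
        return 0.0
    dtau = 1.0 / n                 # assumption: uniform step in normalized time
    half = max(1, window // 2)
    total = 0.0
    for i, r in enumerate(rets):
        tau = (i + 1) / n          # tau_k = k / N
        # Assumption: local realized variance = mean squared return in a
        # window centered on k, truncated where it runs off the path.
        win = rets[max(0, i - half): i + half]
        var = sum(x * x for x in win) / len(win)
        total += math.sqrt(tau) * (var + r * r) * dtau
    return total

# Two hypothetical 240-bar approaches: a steady drift and a choppy one.
smooth = [100.0 * 1.001 ** i for i in range(241)]
choppy = [100.0]
for i in range(240):
    choppy.append(choppy[-1] * (1.011 if i % 2 == 0 else 0.991))

print(l_price(smooth), l_price(choppy))
```

The choppy path scores a much higher L than the steady drift, which is the ordering the hypothesis relied on; the results below show why that ordering alone was not enough.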

Q1 (lowest L) WR on pump-only sample: 76.1%, avg +4.37%
Q5 (highest L) WR on pump-only sample: 52.1%, avg +2.53%
Walk-forward TEST > TRAIN (suspicious): +5.52% vs +4.67%
DD reduction vs no-filter: −72%
RAVEN filter (pre-condition) pass rate: 0.50% (25/5000)
Of those, true positives: 24%, not 49% as CV claimed
GEOFLOW filter applied on top: removed 100% of TPs, kept all FPs
Final P&L: −$2.24 / $100 over 12 trades
KILLED
L-distance discriminates within pump-only labeled set, but the relationship inverts on the real-world distribution. FPs in the smart-money universe have lower L than TPs. The earlier "Sharpe 18" was a sample-bias artifact. Curated samples deceive.
A filter that works on labeled positives may not generalize. Always validate on the realistic operating distribution, including false positives.
PARTIAL · weak real edge

The internal scoring function of a legacy strategy (V14 QUANT, stored in 14.6M signalsnapshots) may be overfit. Higher quality scores may not correlate positively with realized P&L.

Randomly sample 5,000 snapshots. For each: fetch forward 1m klines, simulate entry at the snapshot's own entryTop/Bot zone, and walk forward bar-by-bar checking SL/TP/timeout. Compute realized pnlPct per trade. Then rank-correlate features vs outcome.

Baseline WR: 53.1%
Baseline avgPnl per trade (gross): +0.305%
Low quality (Q1, < threshold): WR 60.1%, avg +0.521%
High quality (Q5): WR 50.6%, avg +0.273%
timing="MISSED" subset: WR 63.5%, avg +1.048%
Walk-forward decay TRAIN→TEST: −18% per month → −5% per month
$$ \text{Corr}(\text{quality}, r_{\text{realized}}) = -0.12 \quad (p < 0.01) $$
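A rank correlation of this kind can be computed without external libraries. The sketch below uses Spearman's coefficient (Pearson correlation on average ranks), which is our assumption about what "rank-correlate" means here; the function names and the five-point sample are hypothetical and are not the study's data.

```python
def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            out[order[pos]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data: higher quality score, lower realized P&L.
quality = [1, 2, 3, 4, 5]
pnl_pct = [0.50, 0.40, 0.45, 0.30, 0.20]
rho = spearman(quality, pnl_pct)
print(f"rho = {rho:.2f}")
```

A negative coefficient on the real sample is what motivated testing the inverted score as a filter.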
PARTIAL EDGE
Real but small. Inverting the strategy's own scoring captures a +3-6%/month edge at 5× leverage. Strong decay TRAIN→TEST suggests the edge is regime-dependent. Worth deploying as overlay filter, not standalone.
Even "your own" scoring system can be wrong. Test the inverse. Overfit detection often hides in plain sight.
PLANNED · implementation

Following Sornette's Log-Periodic Power Law model, financial bubbles exhibit faster-than-exponential growth with log-periodic oscillations preceding a critical time tc. We test whether crypto's accelerated bubble cycles (days–weeks rather than months) yield tractable tc predictions on BTC, ETH, and top alts.

$$ \ln p(t) = A + B(t_c - t)^{m} + C(t_c - t)^{m} \cos\left(\omega \ln(t_c - t) - \phi\right) $$

Planned procedure, with parameter ranges taken from Sornette's published constraints:

  1. Rolling LPPL fit on 1h log-prices, window 30–90 days, refit every hour
  2. Validity gates: 0.1<m<0.9, 6<ω<13, R²>0.9, tc within 1–14 days forward
  3. Parameter stability check across multiple window shifts
  4. Multi-asset confluence (3+ synchronous bubble signals = strong)
  5. Entry: SHORT setup 2–5 days before tc
  6. Exit: at tc or on LPPL fit breakdown
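The model equation and the validity gates from step 2 can be sketched as two small functions. This covers only the evaluation and gating, not the rolling nonlinear fit itself; the function names and argument layout are hypothetical, and `days_to_tc` is assumed to be the fitted tc minus the current time, expressed in days.

```python
import math

def lppl_log_price(t, tc, a, b, c, m, omega, phi):
    """Evaluate ln p(t) for the LPPL form above; defined only for t < tc."""
    dt = tc - t
    if dt <= 0:
        raise ValueError("t must be earlier than tc")
    power = dt ** m
    return a + b * power + c * power * math.cos(omega * math.log(dt) - phi)

def passes_gates(m, omega, r2, days_to_tc):
    """Validity gates from step 2: parameter ranges, fit quality, tc horizon."""
    return (
        0.1 < m < 0.9
        and 6 < omega < 13
        and r2 > 0.9
        and 1 <= days_to_tc <= 14
    )

# Hypothetical fitted parameters, for illustration only.
fit = {"m": 0.5, "omega": 8.0, "r2": 0.95, "days_to_tc": 5.0}
print(passes_gates(**fit))
print(lppl_log_price(t=0.0, tc=1.0, a=2.0, b=-1.0, c=0.1, m=0.5, omega=8.0, phi=0.0))
```

In the planned pipeline a fit would only be forwarded to the stability and confluence checks (steps 3 and 4) once `passes_gates` returns true.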
PLANNED
Implementation queued. Edge probability estimated 30–50% based on Sornette's published track record. Crypto-specific high-frequency LPPL with multi-asset confluence has no public retail-grade implementation we are aware of.
PARKED · n=3

Binance MONITORING-tag announcements (volatility-warning labels) are preceded by detectable volume + range anomalies in the 6h window before the public announcement. If true, a pre-event SHORT detector could capture the post-announcement drop before the crowd reads the news.

Phase 0: scrape Binance CMS catalog (catalogId=49). 20 pages × 50 articles. For each: identify coin, check if it has Binance USDT-M futures listing, fetch 1m klines spanning [T-6h, T+1h] around announcement. Compute pre-event drift, volume ratio, range expansion.

Articles fetched: 16 MONITORING tags
With futures-tradable coin: 4
With sufficient kline history: 3
Pre-event drift direction: 3/3 DOWN (−2.3%, −13.0%, −19.0%)
Sample size for inference: n=3 (insufficient)

100% downward drift in pre-event window is suggestive but n=3 is noise. Bootstrap 95% CI on n=3 spans entire ±50% range.

PARKED
Hypothesis remains UNTESTED, not rejected. Need n ≥ 30 via archive.org scraping or spot-price analysis. Re-attempt when broader historical data sourced.
A promising directional signal (100% down-drift) at n=3 means nothing. Our rule of thumb: n ≥ 30 before a non-parametric test is worth running, n ≥ 100 for any robust claim.
KILLED · execution mismatch

Top-PnL Hyperliquid traders publish every position on-chain (HL's order book is fully transparent). Mirroring their entries to Binance USDT-M futures should capture a fraction of their edge — particularly on majors (BTC, ETH, SOL) where slippage is minimal.

Identify top-20 wallets by 30-day PnL on HL. WebSocket listen to their position changes. When a tracked wallet opens a position above $100K notional, fire a mirroring order on Binance same coin, same side.

  1. Latency mismatch: HL whale fills are typically maker-limit at deep liquidity. Any follower with public-API latency enters 0.3-1.5% worse than the originator on average.
  2. Position duration mismatch: Whales hold hours-to-days with average DCA-in over multiple fills. Our single-shot mirror catches only the entry tip, exits poorly.
  3. Wallet attribution: Same person operates multiple wallets. Composite NET exposure ≠ individual wallet signal.
  4. Selection bias: "Top 30-day PnL" includes survivorship + recency bias. Wallets fall off the leaderboard the moment they have a drawdown.
  5. Reverse-MEV risk: When public HL whale wallets are watched, they sometimes intentionally fake entries to trap copy-traders.
KILLED
Tested in earlier sessions, failed. Hard rule established: never re-propose HL wallet-copy strategies regardless of who suggests them. Architecture flaw, not parameter flaw.
Transparent doesn't mean tradeable. Public information that can only be exploited by executing faster than the originator has zero edge for slower followers.
KILLED · adverse selection

Binance DELISTING announcements (catalogId=48) trigger immediate panic dumps. Auto-shorting the announced coin within seconds of announcement should capture −15 to −40% over 24-72h.

MONITORING tags (catalogId=49, BURST-class strategies) are different — they happen pre-announcement with intact liquidity. DELISTING auto-short specifically refers to acting AFTER public announcement, which has structural reasons to fail.

KILLED
Hard rule: never auto-short on DELISTING announcements. MONITORING (different category) remains a valid signal source under different mechanics.
Reactive shorts on widely-broadcast events are race-conditioned against the entire market. Edge requires either earlier information or different mechanics (pre-announcement detection like MONITORING 2.0).
KILLED · −8.94%/7d live

Aggregating signal feeds from multiple "smart money" sources (HL alerts, on-chain whale flows, exchange leaderboards) into a unified scoring system should produce above-baseline performance through diversification of signal sources.

Period: 7 days live, Q1 2026
P&L: −8.94%
Win-rate: ~38%
Average loss size: 2.4× average win size
KILLED
−8.94% in 7 live days. Killed and replaced. Lesson reinforced for all future copy-style approaches.
Signal sources presented as "independent" rarely are. Test correlation between source returns before aggregating. Diversification doesn't help when underlying signals share hidden common factors.
BLOCKED · data dep

After a forced-liquidation cascade ($1M+ in 5min on a single coin), forced sellers are exhausted and price overshoots fair value. Entry on first reversal candle with tight 1-3% SL captures the mean-reversion bounce with high win-rate.

$$ \text{Cascade}(t) = \mathbb{1}\left[ \sum_{\tau \in [t-5\min, t]} L_{\text{long}}(\tau) > \$1\text{M} \right] $$

Entry trigger: Cascade(t) = 1 AND r_{5m}(t) ≤ −3% AND volume spike ≥ 2σ. SL at cascade-low minus 0.5% buffer. TP at Fibonacci 38-61% retracement of cascade move.
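The indicator and the entry trigger can be sketched as a rolling-window accumulator over a liquidation feed. Everything named here is hypothetical (`CascadeDetector`, `entry_trigger`, the sample events), the feed is assumed to deliver timestamped long-liquidation sizes in USD, and "volume spike ≥ 2σ" is read as a volume z-score of at least 2.

```python
from collections import deque

class CascadeDetector:
    """Rolling sum of long liquidations over the trailing 5 minutes."""

    def __init__(self, window_s=300, threshold_usd=1_000_000):
        self.window_s = window_s
        self.threshold_usd = threshold_usd
        self.events = deque()   # (timestamp in seconds, liquidated USD)
        self.total = 0.0

    def add(self, ts, usd):
        """Record one liquidation and return Cascade(t) for t = ts."""
        self.events.append((ts, usd))
        self.total += usd
        # Drop liquidations that fell out of the [t - 5min, t] window.
        while self.events and self.events[0][0] < ts - self.window_s:
            _, old_usd = self.events.popleft()
            self.total -= old_usd
        return self.total > self.threshold_usd

def entry_trigger(cascade, ret_5m, volume_z):
    """Cascade(t) = 1 AND 5m return <= -3% AND volume z-score >= 2."""
    return cascade and ret_5m <= -0.03 and volume_z >= 2.0

# Hypothetical feed: two liquidations 2 minutes apart, then a late small one.
det = CascadeDetector()
flags = [det.add(ts, usd) for ts, usd in [(0, 600_000), (120, 500_000), (400, 100_000)]]
print(flags)
print(entry_trigger(flags[1], ret_5m=-0.035, volume_z=2.4))
```

The deque keeps each update cheap on a busy feed, since each event is appended once and removed once.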

BLOCKED
Blocked by infrastructure tier, not by the hypothesis itself. Resume on Coinglass paid plan activation. Target: 70%+ WR with SL 1-3% on liquid alts.
Strategy viability is data-resolution-bound. Some edges require sub-minute liquidation feeds that public APIs don't expose. Pay for the data or build a different strategy.
PARKED · infrastructure gap

Sub-second mean reversion on tick data is profitable for sufficiently fast operators.

Pivoting to mid-frequency (1–5min hold) as accessible alternative for the broader retail/prop universe.

PARKED
Real edge exists in this space but is structurally unreachable without enterprise-tier infrastructure. Not killed; revisitable if scale and infra justify the investment.

Want this rigor on your strategy?

We backtest, validate, and kill strategies systematically — until something survives. If you have an algorithm idea, or want access to our live signals, start here.