After 24 days of losing trades, I'm testing a new signal source. Here's the setup.

TL;DR

For 24 days I ran a 30-pair crypto perp scanner using classical technical analysis (RSI, EMA, VWAP, ADX) on 15-minute candles. About 4,700 paper trades. Win rate 25.8% over the last 8 days. That's not bad luck. Statistically the signal is dead. Starting next week I'm testing whether order flow features (taker buy ratio, cumulative delta) on 5 high-cap pairs can do better. 14-day public test. I'll publish the verdict around May 22 regardless of how it goes.

What I tried first, and why it failed

The original setup was arguably the most popular retail crypto strategy on the internet: 15-minute candles, 30 alt-heavy pairs (including memes like PEPE and WIF, plus mid-caps like JUP), classical TA indicators, and fixed take-profit and stop-loss levels. Six different strategy variants fired across the universe.

After 8 days of careful logging, the results:

361 closed trades
25.8% win rate
About $90 lost per day at small position sizes

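I ran the math on whether a 25.8% win rate over 361 trades could happen by random luck if the true win rate were 50%. The z-score is roughly minus 9.2. Translation: if the true win rate were 50%, the probability of doing this badly is essentially zero. A low win rate alone does not prove there is no edge, since a strategy whose winners are larger than its losers can break even well below 50%. Combined with the steady losses above, though, the conclusion holds: the signal does not have an edge.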
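The check above is a one-proportion z-test. Here is a minimal sketch, assuming 93 wins (25.8% of 361, rounded) and using the standard error under the 50% null. It treats every trade as independent, which trades opened at the same time on correlated pairs are not, so read the result as approximate:

```python
import math


def win_rate_z_score(wins: int, trades: int, null_rate: float = 0.5) -> float:
    """One-proportion z-test: standard errors between observed and null win rate.

    Uses the standard error under the null hypothesis and assumes independent trades.
    """
    observed = wins / trades
    std_err = math.sqrt(null_rate * (1 - null_rate) / trades)
    return (observed - null_rate) / std_err


def lower_tail_p_value(z: float) -> float:
    """P(Z <= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


wins = round(0.258 * 361)  # 93 wins out of 361 closed trades
z = win_rate_z_score(wins, 361)
print(f"z = {z:.2f}")  # z = -9.21
print(f"one-sided p-value: {lower_tail_p_value(z):.1e}")
```

The same function can be pointed at any future run to see how far its win rate sits from a coin flip.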

That should not be a surprise. Classical TA on 15-minute crypto is the most-traded, most-arbitraged, most-written-about setup in retail. Of course there is no free money there.

The interesting question is what to test next.

The hypothesis

The system already computes order flow features for an unrelated machine learning model: taker buy ratio (the share of trades that hit the ask rather than the bid), cumulative delta (a running total of taker buy volume minus taker sell volume), and trade count z-score (how far the current trade count sits above or below its recent average, in standard deviations). These feed a quantile regression model that predicts the distribution of future price moves.
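For readers who have not computed these before, here is a minimal sketch. It assumes each 15-minute candle carries total volume, taker buy volume, and trade count (Binance kline data includes all three, but the `Candle` type here is hypothetical), measures the taker buy ratio by volume rather than by counting trades, and uses a hypothetical 96-candle (one day) lookback for the z-score:

```python
import statistics
from dataclasses import dataclass


@dataclass
class Candle:
    volume: float            # total base-asset volume traded in the candle
    taker_buy_volume: float  # volume from aggressive buyers lifting the ask
    trade_count: int         # number of trades in the candle


def taker_buy_ratio(c: Candle) -> float:
    """Share of volume initiated by takers buying; 0.5 means balanced."""
    return c.taker_buy_volume / c.volume if c.volume > 0 else 0.5


def cumulative_delta(candles: list[Candle]) -> list[float]:
    """Running total of taker buy volume minus taker sell volume."""
    total = 0.0
    out = []
    for c in candles:
        taker_sell_volume = c.volume - c.taker_buy_volume
        total += c.taker_buy_volume - taker_sell_volume
        out.append(total)
    return out


def trade_count_zscore(candles: list[Candle], lookback: int = 96) -> float:
    """How unusual the latest candle's trade count is versus the prior `lookback` candles."""
    history = [c.trade_count for c in candles[-lookback - 1:-1]]
    if len(history) < 2:
        return 0.0
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return (candles[-1].trade_count - mean) / stdev if stdev > 0 else 0.0
```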

Here is the strange part. The model uses these features for its tail predictions, but the actual entry decisions never see them. The scanner that triggers buys and sells is still classical TA only.

So the test is straightforward. Build a new scanner that uses order flow as the primary signal. Run it on a smaller, cleaner universe: BTC, ETH, SOL, BNB, XRP. Same 15-minute timeframe. ATR-based dynamic exits instead of fixed targets. Two weeks of paper trading, then verdict.
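A minimal sketch of what such a scanner could look like, assuming a long-only entry that fires when the taker buy ratio, the slope of cumulative delta, and the trade count z-score all clear hypothetical thresholds (`min_ratio=0.58`, `min_count_z=1.5`), with exits placed at multiples of Wilder's 14-period ATR. The function names, thresholds, and multipliers are illustrative, not the spec from the engineering doc:

```python
from dataclasses import dataclass


@dataclass
class Bar:
    high: float
    low: float
    close: float


def wilder_atr(bars: list[Bar], period: int = 14) -> float:
    """Average True Range using Wilder's smoothing, evaluated at the last bar."""
    if len(bars) < period + 1:
        raise ValueError("need at least period + 1 bars")
    true_ranges = [
        max(cur.high - cur.low, abs(cur.high - prev.close), abs(cur.low - prev.close))
        for prev, cur in zip(bars, bars[1:])
    ]
    atr = sum(true_ranges[:period]) / period  # seed with a simple average
    for tr in true_ranges[period:]:
        atr = (atr * (period - 1) + tr) / period  # Wilder's recursive smoothing
    return atr


def long_entry_signal(buy_ratio: float, delta_slope: float, count_z: float,
                      min_ratio: float = 0.58, min_count_z: float = 1.5) -> bool:
    """Hypothetical hard filter: buyers dominate, delta is rising, activity is unusual."""
    return buy_ratio >= min_ratio and delta_slope > 0 and count_z >= min_count_z


def atr_exits(entry: float, atr: float, stop_mult: float = 1.5,
              target_mult: float = 2.0) -> tuple[float, float]:
    """Stop and target for a long position, placed at multiples of ATR from entry."""
    return entry - stop_mult * atr, entry + target_mult * atr
```

With exits like these, 1R is `stop_mult * atr`, so a trade that reaches its target returns `target_mult / stop_mult` R (about 1.33R with these defaults). That is the unit the expected-value criterion is written in.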

If order flow has an edge that classical TA missed, this should beat a 50% win rate convincingly. If not, I have my answer and move on to the next hypothesis.

How I'll measure

I'm committing to fixed pass criteria upfront, before any results come in. The point is to avoid the trap of moving the goalposts when reality looks ugly.

Pass requires all five:

At least 20 closed trades in the window
Win rate at least 55%
Expected value per trade at least 0.10R (R = the amount risked on each trade)
No symbol with a drawdown of 5R or more
No calendar week worse than minus 5%

If all five hit, I might consider a small ramp into real money. If even one fails, the hypothesis is dead and I move on. There is also a kill rule: if the win rate drops below 30% after at least 10 closed trades, I stop early.
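The pass criteria and kill rule are mechanical enough to encode directly, which also makes them hard to fudge later. A minimal sketch, assuming each closed trade records its symbol, close time, R multiple, and P&L as a percent of account equity (the `Trade` fields are hypothetical), counting a win as any trade with positive R, treating "calendar week" as an ISO week, and summing weekly percentages without compounding:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Trade:
    symbol: str
    closed_at: datetime
    r_multiple: float  # P&L divided by the amount risked on the trade (1R)
    pnl_pct: float     # P&L as a percent of account equity (hypothetical field)


def max_drawdown_r(r_values: list[float]) -> float:
    """Largest peak-to-trough drop, in R, of the running sum of R multiples."""
    equity = peak = worst = 0.0
    for r in r_values:
        equity += r
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def evaluate(trades: list[Trade]) -> tuple[bool, dict[str, bool]]:
    """Apply the five pass criteria; returns (passed, per-criterion results)."""
    n = len(trades)
    wins = sum(1 for t in trades if t.r_multiple > 0)
    r_by_symbol = defaultdict(list)
    pct_by_week = defaultdict(float)
    for t in sorted(trades, key=lambda tr: tr.closed_at):  # chronological order
        r_by_symbol[t.symbol].append(t.r_multiple)
        year, week, _ = t.closed_at.isocalendar()  # ISO week as "calendar week"
        pct_by_week[(year, week)] += t.pnl_pct      # simple sum, no compounding
    checks = {
        "at_least_20_trades": n >= 20,
        "win_rate_55pct": n > 0 and wins / n >= 0.55,
        "ev_0_10R": n > 0 and sum(t.r_multiple for t in trades) / n >= 0.10,
        "no_5R_symbol_drawdown": all(max_drawdown_r(rs) < 5.0 for rs in r_by_symbol.values()),
        "no_week_below_minus_5pct": all(p >= -5.0 for p in pct_by_week.values()),
    }
    return all(checks.values()), checks


def should_kill(trades: list[Trade]) -> bool:
    """Kill rule: win rate below 30% once at least 10 trades have closed."""
    n = len(trades)
    wins = sum(1 for t in trades if t.r_multiple > 0)
    return n >= 10 and wins / n < 0.30
```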

Two other hypotheses are running in parallel during the same window. One tests funding rate mean reversion on the top 10 perps by open interest at 8-hour cycles. Another tests classical TA but on the 1-hour timeframe instead of 15-minute. Each gets its own paper book, its own setup post, and its own verdict.
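One common reading of funding rate mean reversion is to fade funding prints that are extreme relative to recent history: when longs pay unusually high funding, lean short, and vice versa. A minimal sketch under that assumption; the 21-print lookback (seven days of 8-hour cycles) and the 2.0 z-score entry threshold are hypothetical choices, not the parameters of the actual test:

```python
import statistics


def funding_signal(funding_history: list[float], lookback: int = 21,
                   entry_z: float = 2.0) -> str:
    """Return 'short', 'long', or 'flat' based on the latest 8-hour funding print."""
    if len(funding_history) < lookback + 1:
        return "flat"  # not enough history to judge what "extreme" means
    window = funding_history[-lookback - 1:-1]  # prior prints, excluding the latest
    mean = statistics.fmean(window)
    stdev = statistics.stdev(window)
    if stdev == 0:
        return "flat"
    z = (funding_history[-1] - mean) / stdev
    if z >= entry_z:
        return "short"  # longs paying unusually high funding: fade the crowd
    if z <= -entry_z:
        return "long"   # shorts paying unusually high funding: fade the crowd
    return "flat"
```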

Why this might fail

Order flow is not new. Algorithmic market makers have been reading taker buy pressure for years. If a signal this obvious were still tradeable for retail at 15-minute resolution, it would most likely have been arbitraged away long ago.

There is also a calibration problem. The quantile model that uses these features achieves 52.3% directional accuracy. That is barely above a coin flip. Net of fees and spread, that is no edge. So the bet here is that using the raw features as a hard entry filter (rather than feeding them through a probabilistic model) extracts something the model smoothed out.

Plausible. Not obvious.
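To see why 52.3% directional accuracy is not enough, consider a simplified model (an assumption, not how the quantile model is actually scored): each directional call gains or loses the same average move m, and every trade pays a round-trip cost c for fees and spread. Expected value per trade is then (2p - 1)m - c, so break-even accuracy is p* = 1/2 + c/(2m). The move and cost figures below are hypothetical placeholders:

```python
def expected_value(accuracy: float, avg_move: float, round_trip_cost: float) -> float:
    """EV per trade when wins and losses are both `avg_move` and every trade pays the cost."""
    return (2 * accuracy - 1) * avg_move - round_trip_cost


def break_even_accuracy(avg_move: float, round_trip_cost: float) -> float:
    """Accuracy at which expected value is exactly zero: 1/2 + c / (2m)."""
    return 0.5 + round_trip_cost / (2 * avg_move)


# Hypothetical numbers, in percent of notional: a 0.40% average 15-minute move
# and 0.10% round-trip fees plus spread.
print(f"break-even accuracy: {break_even_accuracy(0.40, 0.10):.1%}")  # 62.5%
print(f"EV at 52.3% accuracy: {expected_value(0.523, 0.40, 0.10):.4f}%")  # -0.0816%
```

The smaller the typical move relative to costs, the further above 50% the break-even point climbs, which is why a few points over a coin flip rarely survive fees at short horizons.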

If the test fails, the most likely culprits, in order:

1. The signal is too crowded for retail at this timeframe.
2. The 5-major universe is too correlated and dumps together.
3. The ATR exits are wrong for this signal type.
4. The whole premise of finding short-term crypto edge with public-API data is flawed.
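The second culprit, correlation across the five majors, is easy to check directly. A minimal sketch, assuming aligned 15-minute close prices per symbol (the input format is an assumption): compute log returns and their pairwise Pearson correlations. Values close to 1 across the board would mean the five positions behave much like one:

```python
import itertools
import math


def log_returns(closes: list[float]) -> list[float]:
    """Bar-to-bar log returns from a series of close prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series (NaN if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return math.nan
    return cov / math.sqrt(vx * vy)


def pairwise_correlations(closes_by_symbol: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Correlation of log returns for every pair of symbols, keyed alphabetically."""
    rets = {s: log_returns(c) for s, c in closes_by_symbol.items()}
    return {(a, b): pearson(rets[a], rets[b])
            for a, b in itertools.combinations(sorted(rets), 2)}
```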

The third hypothesis (1-hour classical TA) partially addresses (1) and (3) by changing the timeframe and exit. The funding rate hypothesis tests a fundamentally different mechanism. If all three fail, that's strong evidence I need to look outside crypto perps entirely.

What happens next

May 9 to 23: paper trading on the 5 majors with the new signal. I'll glance at the trade log daily but won't tune anything mid-run. The eval criteria are locked.

May 16: midpoint update. If the run is clearly going one way, I'll write what I'm seeing.

May 23: verdict post. Win rate, expected value, drawdowns, kill conditions, what I learned. Honest numbers, no spin. If it failed badly, that is part of the story.

Whether this works or not, the audit itself is worth doing. Most retail trading content shows the wins. I want to show the full ledger, including the dead ends.

Come along for the ride. See me fall or thrive, whichever comes first.

Code, eval criteria, and the full spec for this hypothesis are in the companion engineering doc. Updates will land here, on Dev.to, and on Hashnode. An X thread version will follow.