📚 Backtests — three iterations, one foundation

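Four backtests on small samples: two fresh 20-game samples, one retroactive re-run of the NBA playoff data, and one 8-game sample. The two v1 runs failed to beat blind betting. The two v2 runs showed promise on small samples. The point isn't to celebrate the wins. It's to know honestly where the edge is and isn't.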

1. Concept

Sports betting backtests — learning from real data.

Small samples (20 games or fewer) tested with progressively refined strategies. Real outcomes. Honest scoring. What's actually working vs what just feels smart.

The point: most bettors think they have edge they don't have. We test that explicitly so we don't fool ourselves.

2. Side-by-side summary

| Iteration | Sport | Games | Strategy | Record | ROI | Blind ROI | Beat Blind? |
|---|---|---|---|---|---|---|---|
| BT1 | NBA Playoffs | 20 | v1 · Line + Talent | 16-4 | +39.4% | +43.5% | NO (-4.1pp) |
| BT2 | MLB Reg Season | 20 | v1 · Line + Talent | 7-6 | -11.7% | -11.5% | NO (-0.2pp) |
| BT3 | NBA Playoffs | 13 bet · 7 skip | v2 · Skip Discipline | 12-1 | +59.0% | +43.5% | YES (+15.5pp) |
| BT4 | NBA Reg Season | 8 bet · 0 skip | v2 · Skip Discipline | 7-1 | +46.1% | ~+15%* | YES (small sample) |

* BT4 is an 8-game Pistons-focused sample, less than half the size of a normal 20-game test. Direction is clear; magnitude is noisy. Treat as suggestive, not conclusive.

3. Visualizations

ROI by iteration vs blind favorite-betting benchmark

Blind favorites returned +43.5% on the NBA sample and -11.5% on the MLB sample. Strategy v1 underperformed both. Strategy v2 (applied retroactively to the NBA data) beat the blind benchmark by 15.5pp — but on 13 bets, this could be variance.

Skip discipline impact — v1 vs v2 on same NBA data

Same underlying analysis. v2 refused to bet 7 of the 20 games (coin flips + heavy juice). Result: 92% hit rate on the 13 it did bet, vs 80% for v1's "bet everything" approach. Less volume, higher quality.

Confidence calibration — STRONG vs MEDIUM bet performance (MLB sample)

When the model said "STRONG" (heavy juice favorites, $8 stake), it lost money. When it said "MEDIUM" ($5 stake), it made money. Counterintuitive but consistent — confidence is mis-calibrated. Fix: flat stakes until calibration is proven.

4. Backtest 1 — NBA Playoffs (May 3-13), Strategy v1

BT1 · NBA Playoffs · "Line + Talent" (v1)

MISS · Record: 16-4 (80%) · Staked: $100 · Returned: $139.40 · ROI: +39.4% · vs Blind: -4.1pp

Setup: 20 NBA playoff games. v1 = pick favorites in talent-mismatch matchups, use narrative factors (home court, momentum, must-win). $5/$8 stake based on confidence.

Benchmark: Blind favorites went 15-3 for +43.5% ROI. My analysis lost the matchup by 4.1pp.
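The ROI and the gap to the benchmark are simple arithmetic. A minimal sketch, using the BT1 figures quoted above (the function name `roi_pct` is only illustrative):

```python
def roi_pct(staked: float, returned: float) -> float:
    """Return on investment as a percentage of the amount staked."""
    return (returned - staked) / staked * 100.0


strategy_roi = roi_pct(staked=100.0, returned=139.40)  # BT1 totals
blind_roi = 43.5                                       # blind-favorite benchmark quoted above
gap_pp = strategy_roi - blind_roi                      # difference in percentage points

print(f"strategy ROI: {strategy_roi:+.1f}%")
print(f"gap vs blind: {gap_pp:+.1f}pp")
```

The gap is reported in percentage points (pp) because it is a difference between two percentages, not a relative change.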

What worked

  • Heavy favorites in talent-mismatch series (OKC vs LAL)
  • Road favorites in dominant series
  • Bounce-back games at home after surprising losses
  • G7 home court advantage

What didn't work

  • Pivotal G5 home court narrative (lost 17%)
  • Home underdogs based on home court alone
  • Road favorites against desperate home underdogs
  • "Pivotal" framing already priced in

The 4 losses — pattern

  1. SAS vs MIN G1 — trusted rest over momentum
  2. MIN vs SAS G3 — trusted home underdog over talent
  3. CLE vs DET G3 — underestimated desperation factor
  4. DET vs CLE G5 — trusted "pivotal home" already priced in

Key insight

My analysis matched what the line was telling me (favorites won most). My losses came from overriding the line with narrative factors. Blind favorite-betting beat me because it had no narrative bias.

Factor accuracy scores (small sample):

  • home_court_g7: 100% (2/2)
  • 1_seed_dominant: 100% (2/2)
  • rest_advantage_after_sweep: 100% (1/1)
  • bounce_back_at_home: 100% (1/1)
  • ✗ home_court_pivotal_g5: 50% (1/2, actively hurt)
  • ✗ home_court_for_underdog: 0% (0/1)

Full breakdown: /bets/backtest-may-2026/ →

5. Backtest 2 — MLB Regular Season (May 12-14), Strategy v1

BT2 · MLB · "Line + Talent" (v1, same strategy)

FAIL · Bet / skipped: 13 / 7 · Record: 7-6 (54%) · Staked: $89 · Returned: $78.60 · ROI: -11.7%

Setup: Same v1 strategy applied to 20 MLB regular-season games. First attempt at skip discipline (7 coin flips skipped). $5/$8 stakes.

Benchmark: Blind favorites went 10-10 for -11.5% ROI. Both lost money. The vig is brutal in MLB without real edge.

Lessons learned

  1. MLB ≠ NBA. Same strategy "worked" in NBA, failed in MLB. NBA favorites win 65-70%, MLB favorites 55-58%. At a favorite price around -130, break-even sits near a 56% hit rate, so the vig eats everything below that.
  2. Confidence calibration is BACKWARDS. STRONG bets ($8): 4-4, -19.4% ROI. MEDIUM bets ($5): 3-2, +8.0% ROI. When most certain, most wrong.
  3. MLB requires daily pitcher data I don't have. All 4 STRONG losses had pitcher mismatches I couldn't see. LAD-SF, NYY-BAL, PIT-COL, TOR-TB all lost on pitching.
  4. Skip discipline worked partially. Skipped 7 games. 5 of 7 would have been losses or coin flips. Only 1 missed obvious win. Skip rate should probably be HIGHER (50%+).
  5. Bounce-back factor showed in MLB too. LAD won big after G1 upset. PIT won big after G1 upset. 2/2 — small sample, pattern emerging.
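The vig point in lesson 1 can be made concrete by converting a moneyline price into the hit rate needed to break even. This is a minimal sketch; the prices in the loop are illustrative assumptions, not lines taken from the MLB sample.

```python
def break_even_prob(american_odds: int) -> float:
    """Win probability needed to break even at the given American odds."""
    if american_odds < 0:
        risk = -american_odds               # e.g. -130: risk 130 to win 100
        return risk / (risk + 100)
    return 100 / (american_odds + 100)      # e.g. +150: risk 100 to win 150


for odds in (-110, -130, -150, -200):
    print(f"{odds:+d}: break-even at {break_even_prob(odds):.1%}")
```

Read against BT2, a 54% hit rate on favorites priced around -130 would sit below break-even, which is consistent with the negative ROI.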

Full breakdown: /bets/backtest-mlb-may-2026/ →

6. Backtest 3 — Strategy v2 on original NBA data

Honest caveat — not a fresh sample

"Backtest 3" isn't new games. It's the same NBA data from BT1 with Strategy v2 applied retroactively. It tests whether the refined strategy would have done better on data we already analyzed.

It's not a true out-of-sample test (which would require new games we haven't seen). But it's the most honest comparison we can do without waiting for more games to happen.

BT3 · NBA · "Skip Discipline" (v2)

PROMISING · Bet / skipped: 13 / 7 · Record: 12-1 (92%) · Staked: $65 · Returned: $103.38 · ROI: +59.0%

Strategy v2 rules

  1. HARD SKIP coin flips — if probability < 56%, no bet
  2. SKIP heavy juice: if the line is -200 or steeper (e.g. -250) and the edge isn't massive, no bet
  3. Smaller stakes — $5 flat across all bets (no $8 tier)
  4. No narrative-only picks — must have line agreement + clear talent/situational edge
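The four rules above can be sketched as a single filter. This is an illustration under assumptions, not the actual implementation: the `Candidate` fields and the `v2_stake` name are hypothetical, and rule 2 is read as "-200 or steeper", matching the v3 wording later on.

```python
from dataclasses import dataclass

FLAT_STAKE = 5.0     # rule 3: $5 flat, no $8 tier
MIN_PROB = 0.56      # rule 1: anything below this is treated as a coin flip
HEAVY_JUICE = -200   # rule 2: -200 or steeper (e.g. -250) is heavy juice


@dataclass
class Candidate:
    """One potential bet. All field names are hypothetical."""
    win_prob: float        # estimated win probability, 0.0 to 1.0
    american_odds: int     # moneyline price, e.g. -150
    line_agrees: bool      # the market favors the same side
    clear_edge: bool       # clear talent or situational edge
    massive_edge: bool = False


def v2_stake(c: Candidate) -> float:
    """Return the v2 stake in dollars, or 0.0 when the rules say skip."""
    if c.win_prob < MIN_PROB:                                  # rule 1
        return 0.0
    if c.american_odds <= HEAVY_JUICE and not c.massive_edge:  # rule 2
        return 0.0
    if not (c.line_agrees and c.clear_edge):                   # rule 4
        return 0.0
    return FLAT_STAKE                                          # rule 3


print(v2_stake(Candidate(0.54, -120, True, True)))   # coin flip
print(v2_stake(Candidate(0.78, -300, True, True)))   # heavy juice
print(v2_stake(Candidate(0.66, -170, True, True)))   # qualifies
```

Returning a stake of 0.0 for a skip keeps the skipped games in the log, so the skip decisions themselves can be scored later.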

Comparison to BT1 (same NBA data)

| Strategy | Bets | Record | Staked | Returned | ROI |
|---|---|---|---|---|---|
| v1 | 20 | 16-4 | $100 | $139.40 | +39.4% |
| v2 | 13 | 12-1 | $65 | $103.38 | +59.0% |

Similar absolute profit with $35 less staked and much higher hit rate (92% vs 80%).

What skip discipline caught

  • Avoided 3 actual losses (SAS-MIN G1, MIN-SAS G3, CLE-DET G3)
  • Missed 4 wins (NYK Under, OKC -300 G2, MIN G4, CLE G4)
  • Net: slightly less total profit ($38.38 vs $39.40) on $35 less staked, with a MUCH higher hit rate

What skip discipline missed

  • DET vs CLE G5 loss — still bet it because "high confidence"
  • Same calibration error from BT2 — STRONG conviction isn't reliable

Key insight

The biggest improvement came not from better analysis but from better selection of which games to bet. Same underlying picks. Just refused the coin flips. Result: cleaner book.

6.5. Backtest 4 — NBA Regular Season (late Mar – early Apr 2026)

Honest limitation — half-size sample

Only 8 verified games available from API data (Pistons-focused). This is less than HALF a normal 20-game sample. Conclusions are even weaker than the 20-game tests. Treat the direction as suggestive, not the magnitude.

BT4 · NBA Reg Season · "Skip Discipline" (v2)

PROMISING · Bet / skipped: 8 / 0 · Record: 7-1 (87.5%) · Staked (est.): $40 · Returned (est.): $58.45 · ROI (est.): +46.1%

The 8 games

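| Date | Game | Pick | Conf | Actual | Result |
|---|---|---|---|---|---|
| 3/30 | DET @ OKC | OKC | 69% | OKC won 114-110 | Win |
| 3/31 | TOR @ DET | DET | 75% | DET won 127-116 | Win |
| 4/2 | MIN @ DET | DET | 66% | DET won 113-108 | Win |
| 4/4 | DET @ PHI | DET | 70% | DET won 116-93 | Win |
| 4/6 | DET @ ORL | DET | 63% | ORL won 123-107 | Loss |
| 4/8 | MIL @ DET | DET | 67% | DET won 137-111 | Win |
| 4/10 | DET @ CHA | DET | 70% | DET won 118-100 | Win |
| 4/12 | DET @ IND | DET | 61% | DET won 133-121 | Win |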
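The confidence column gives a first, rough look at calibration. The sketch below compares the average stated confidence with the actual hit rate and computes a Brier score. The Brier score is a standard calibration measure brought in here for illustration, not something the backtest itself reports; on 8 games it is only a sanity check.

```python
# (confidence, won) pairs from the 8 games above; the one loss is DET @ ORL on 4/6.
picks = [
    (0.69, True), (0.75, True), (0.66, True), (0.70, True),
    (0.63, False), (0.67, True), (0.70, True), (0.61, True),
]

n = len(picks)
mean_conf = sum(p for p, _ in picks) / n
hit_rate = sum(1 for _, won in picks if won) / n
# Brier score: mean squared gap between stated probability and outcome (0 is perfect).
brier = sum((p - float(won)) ** 2 for p, won in picks) / n

print(f"mean confidence: {mean_conf:.1%}")
print(f"actual hit rate: {hit_rate:.1%}")
print(f"Brier score:     {brier:.3f}")
```

For reference, always saying 50% scores 0.25 on this measure. A hit rate well above the stated confidence on so few games says little by itself; the comparison only becomes informative as the log grows.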

The one loss · DET @ ORL on 4/6

DET was clear favorite (53-27 record vs 42-37) but lost 123-107. Possible causes: back-to-back? injury? rotation oddity? Need more data to know — but this is exactly the kind of game where situational factors (rest, B2B, travel) outweigh raw record.

Key observation

This sample skipped zero games because the Pistons schedule had clear talent mismatches in every game (DET dominant team vs lower-tier opponents). A more diverse sample would have more coin flips to skip — so BT4's 0% skip rate isn't a verdict on the rule, it's a sampling artifact.

7. Cross-sample meta-analysis

Patterns that replicate (probably real signal)

Patterns that DON'T replicate (noise or sport-specific)

8. Regular season vs playoffs — structural differences

| Factor | Regular season | Playoffs |
|---|---|---|
| Favorite hit rate | ~67-70% (ML) | ~70-72% (slightly higher) |
| Home court | +3.5 pts (~5% WP) | +4 pts (~6% WP) |
| Per-game variance | HIGHER (B2B, rest, tanking, load mgmt) | LOWER (consistent rotations, max effort) |
| Key edges | Back-to-back · rest · tanking · road-trip fatigue | Series adjustments · home court at high seed · star ISO |
| "Pivotal" framing | N/A | LESS predictable than narrative suggests |

9. Real bettable edges (from research + emerging data)

Regular season edges

  1. Back-to-back fatigue — team on 2nd night wins ~5-6% less often than usual; books underweight. Fade the tired team if line doesn't fully adjust. Expected: 2-3% ROI.
  2. Rest advantage (3+ days) — well-rested team beats line ~4-5% more, especially vs B2B opponent. Take well-rested team in mismatched-rest spots. Expected: 3-4% ROI.
  3. Tanking teams (post-AS, out of playoffs) — lose ~8-10% more than talent suggests. Fade tankers vs contenders in March/April. Expected: 4-6% ROI.
  4. Pace mismatch (totals) — fast vs slow → UNDER; two fast → OVER. Identify pace differential, bet totals accordingly. Expected: 2-3% ROI.
  5. Closing Line Value (CLV) — if you bet at -130 and line closes at -150, you beat the market. Track every bet's line vs close. Gold-standard measure of real edge.
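Closing line value from item 5 can be tracked per bet with a few lines. This sketch uses one common convention, the change in implied probability between the price taken and the closing price, with no vig removal. Other CLV definitions exist, so treat the formula and function names as assumptions.

```python
def implied_prob(american_odds: int) -> float:
    """Implied win probability of an American moneyline price (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def clv_pp(bet_odds: int, closing_odds: int) -> float:
    """Closing line value in percentage points of implied probability.

    Positive means the market closed at a worse price than the one taken.
    """
    return (implied_prob(closing_odds) - implied_prob(bet_odds)) * 100.0


# The example from item 5: bet at -130, line closes at -150.
print(f"CLV: {clv_pp(-130, -150):+.2f}pp")
```

Logging this number for every bet gives a signal that accumulates faster than win/loss results, since it does not depend on how each game happened to finish.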

Playoff edges

  1. Team down 0-2 at home — more dangerous than line suggests. Take +EV home underdog in this spot. Expected: 3-5% ROI.
  2. Higher seed that rested after Round 1: the opponent's fatigue from a 7-game series is easy to underestimate. Take the well-rested higher seed in the Round 2 opener. Expected: 2-4% ROI.
  3. Elimination game underdogs — lose by less than line suggests (cover spread). Spread bets on facing-elimination teams. Expected: 2-3% ROI.

Critical: these edges require sample size to confirm. Track 30-50 instances of each before trusting them. Research suggests they exist; YOUR betting hasn't proven them yet.

10. How to know if you found something real

Three tests for real edge vs noise

  1. Replicates across independent samples — Sample A (NBA playoffs), Sample B (NBA regular season), Sample C (next month). All three: probably real. Only one: noise.
  2. Has plausible mechanism — WHY would this pattern exist? Selectivity works because vig kills coin flips (math). B2B fatigue works because basketball is physical (biology). Pattern without mechanism = probably noise.
  3. Doesn't disappear with more data — 20 games: noise. 100 games: patterns emerge, high variance. 500+ games: edges become clear (or disappear). Professional samples are thousands of bets.
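Test 3 can be quantified with a confidence interval on the hit rate. The sketch below uses the Wilson score interval, a standard choice for small samples that the backtests themselves do not use, applied to the 12-1 and 16-4 records.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a win rate (z = 1.96)."""
    p = wins / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


for label, wins, n in (("v2, 12-1", 12, 13), ("v1, 16-4", 16, 20)):
    lo, hi = wilson_interval(wins, n)
    print(f"{label}: {wins / n:.0%} observed, plausible range {lo:.0%} to {hi:.0%}")
```

The two ranges overlap heavily (roughly 67% to 99% for 12-1 and 58% to 92% for 16-4), which is the numerical version of "20 games: noise".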

Current status of our findings

Strong (replicates + mechanism):

Suggestive (small sample but plausible):

Unproven (need more data):

Conclusion: We have suggestive evidence for selectivity. We have insufficient data for proven edges. The path forward is tracking 100+ real bets, then reassessing.

11. Meta-learning — discipline > volume

The breakthrough finding

v1 strategy: bet 20 games, 80% hit rate, +39% ROI.
v2 strategy: bet 13 games, 92% hit rate, +59% ROI.

Same underlying analysis. Different selection criteria. v2 won by REFUSING to bet coin flips. The actual edge is discipline over volume.

Cross-sport learning

  1. NBA chalk = profitable (favorites win often enough to overcome vig)
  2. MLB chalk = unprofitable (favorites win, but not enough to overcome vig)
  3. Same strategy fails when applied without sport-specific calibration

Confidence calibration lesson

Across both NBA (v1) and MLB (v1), "STRONG confidence" bets did worse than MEDIUM bets. Real and consistent across samples. Possible explanations:

Fix: flat stakes ($5) until calibration is proven.

The single most valuable rule we've found

This eliminates the worst 30-40% of bets and dramatically improves ROI.

The honest meta-question

Is +59% ROI sustainable, or is this still random variance in 13 bets?

Answer: probably variance.

But the directional finding is real: selectivity matters more than analysis quality.
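One way to put a number on "probably variance" is to ask how often a 12-1 run or better shows up by chance. The sketch below assumes each bet is an independent coin with a fixed win probability. The 70% and 80% values are assumptions chosen to bracket the favorite hit rates discussed earlier, not measured properties of the v2 picks.

```python
import math


def prob_at_least(wins: int, n: int, p: float) -> float:
    """Binomial probability of at least `wins` successes in `n` independent bets."""
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(wins, n + 1)
    )


for true_rate in (0.70, 0.80):
    chance = prob_at_least(12, 13, true_rate)
    print(f"true hit rate {true_rate:.0%}: 12-1 or better happens {chance:.1%} of the time")
```

Under these assumptions a bettor with a true 80% hit rate goes 12-1 or better about 23% of the time, and one with a true 70% hit rate about 6% of the time, so the record alone cannot separate a 92% picker from a merely good one.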

12. Strategy v3 — next iteration

Universal rules

  1. SKIP probability ranges 50-56%
  2. SKIP heavy juice (-200 or more) unless clear talent gap
  3. FLAT $5 stakes (no confidence tiers — calibration unproven)
  4. TRACK every bet, score outcomes, build factor accuracy data
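Rule 4 implies a bet log that can be rolled up by factor. A minimal sketch of such a log follows. The `Bet` fields are hypothetical, and the three sample rows are made-up placeholders that only reuse factor tags named earlier on this page.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Bet:
    """One logged bet. Field names are hypothetical."""
    game: str
    stake: float
    returned: float                 # total paid back, 0.0 on a loss
    won: bool
    factors: list[str] = field(default_factory=list)


def factor_accuracy(bets: list[Bet]) -> dict[str, tuple[int, int]]:
    """Map each factor tag to (wins, bets) across the log."""
    tally: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for bet in bets:
        for factor in bet.factors:
            tally[factor][0] += int(bet.won)
            tally[factor][1] += 1
    return {name: (wins, total) for name, (wins, total) in tally.items()}


# Placeholder rows for illustration only, not real results.
log = [
    Bet("Game A", 5.0, 8.50, True, ["home_court_g7"]),
    Bet("Game B", 5.0, 0.0, False, ["home_court_pivotal_g5"]),
    Bet("Game C", 5.0, 9.00, True, ["home_court_pivotal_g5", "bounce_back_at_home"]),
]

for name, (wins, total) in factor_accuracy(log).items():
    print(f"{name}: {wins}/{total}")
```

Storing wins and totals rather than a percentage keeps the sample size visible next to every factor score.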

Sport-specific adjustments

NBA Playoffs

MLB Regular Season

Golf Majors

What v3 is testing

13. Meta-meta level — what these backtests teach us about LEARNING

Test before trust

I would have bet using Strategy v1 thinking I had edge. Backtesting showed I didn't (vs blind favorites). Without testing, I would have made the same mistake forever.

Honest failure is the path forward

BT2 was a disaster (-11.7%). But identifying WHY led to v2's improvements. If I'd hidden the failure, no progress.

The improvement isn't always in the "model"

v2's improvement came from REJECTING bets, not from better analysis. Sometimes the biggest gain is doing less.

Sample sizes matter more than feels right

Every "edge" we identified in 20 games could be noise. 100 games minimum before real conclusions.

Different domains need different models

What works in NBA doesn't work in MLB. Don't generalize.

The market is mostly right

Sportsbooks employ teams of analysts. Edges are at the margins, not in obvious places.

Discipline > intelligence

Knowing what NOT to bet is more valuable than knowing what to bet.

14. Honest limitations

What these backtests cannot tell us

  1. Whether v2's +59% ROI is repeatable (13 bets is noise)
  2. Whether the patterns generalize to different sports/seasons
  3. Whether my probability estimates are even close to accurate
  4. Whether the lines I assumed match real sportsbook lines

What they DO tell us

  1. v1 strategy doesn't beat blind betting reliably
  2. Selectivity (skipping coin flips) matters
  3. Sport-specific calibration is essential
  4. My confidence is mis-calibrated (strong picks do worse)
  5. MLB needs pitcher data to be predictable
  6. NBA playoffs favor talent + home court at high seeds

Next steps

15. Honest path to proven method — the real timeline

Phase 1 · Validate strategy (Months 1-2)

Track 100 real bets using Strategy v2. $5 stake per bet = $500 max exposure. Apply skip discipline rigorously. Score every outcome honestly. Expected: maintain ~55-60% hit rate, slight ROI positive. Decision point: continue if profitable, revise if not.

Phase 2 · Identify edges (Months 3-4)

Focus on the 4-5 specific situational edges from research. Track those bets specifically (separate from general bets). Build factor accuracy database. Identify which factors actually predict for YOUR betting. Decision point: increase stakes on proven factors.

Phase 3 · Specialize (Months 5-6)

Lean into 2-3 proven factor edges. Increase stakes to $10-20 on highest-edge bets. Keep $5 on general bets to maintain data. Expected: +3-5% ROI on focused bets = $30-50 profit per month on $1000 bet volume.

Phase 4 · Scale (Months 7+)

If profitable, increase total volume. Maintain discipline (skip rules essential). Track CLV (closing line value) as primary metric. Expected: $200-500/month profit on $5K monthly bet volume.

Realistic outcomes

| Case | Hit rate | ROI | On $1K/month volume | Reality |
|---|---|---|---|---|
| Best (top 5% of bettors) | 53-55% | +3-5% | $30-50 profit/mo | Slow but real |
| Average (most disciplined bettors) | ~52% | 0-1% | $0-10 profit/mo | Entertainment + learning |
| Worst (most bettors) | 47-50% | negative | losses compound | Why most lose |

Our goal: top 10% of disciplined bettors = +1-3% ROI consistent. Not Vegas-level edge, but real edge that compounds.
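The profit figures in this section all come from one multiplication: monthly volume times ROI. A small sketch makes it easy to sanity-check any target against the ROI it implies (function names are illustrative):

```python
def profit_range(volume: float, roi_low_pct: float, roi_high_pct: float) -> tuple[float, float]:
    """Expected monthly profit range for a given bet volume and ROI range."""
    return volume * roi_low_pct / 100, volume * roi_high_pct / 100


def implied_roi_pct(profit: float, volume: float) -> float:
    """ROI (in percent) that a profit target implies at a given volume."""
    return profit / volume * 100


low, high = profit_range(1_000, 3, 5)
print(f"$1K/month at +3-5% ROI: ${low:.0f}-{high:.0f}")

low, high = profit_range(5_000, 3, 5)
print(f"$5K/month at +3-5% ROI: ${low:.0f}-{high:.0f}")

print(f"$200-500 on $5K implies {implied_roi_pct(200, 5_000):.0f}-{implied_roi_pct(500, 5_000):.0f}% ROI")
```

Running a target through `implied_roi_pct` before committing to it shows whether it fits inside the best-case ROI row of the table or quietly assumes more.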

16. Updated action items

This week

This month

Next month

Quarterly

This is how real edge gets built. Not quick. Not magic. But real.

17. The honest answer

"How do I know if I'm finding something real?"

Three signals:

  1. Pattern replicates across independent samples
  2. Has plausible mechanism (not just coincidence)
  3. Survives growing sample size (doesn't disappear)

"How do I win money quickly?"

Honest answer: you mostly don't. But you can win money STEADILY.

Bettors who win:

  • Track every bet
  • Apply discipline ruthlessly
  • Build small edges into compound returns
  • Treat it as a business, not entertainment
  • Have a 5-year mindset, not 5-week

Bettors who lose:

  • Bet on gut feel
  • Chase losses
  • Increase stakes when losing
  • Never track outcomes
  • Want to win quickly

We've already built the infrastructure to be in the first group. The hard part is execution discipline over months.

Realistic profit expectation

If you want $1,000+/month profit:

The framework is built. The next 6 months of tracking will tell us if we're actually finding edges or just feeling smart.

18. Dual-tier backtest review · 48 games re-analyzed

Honest caveat: "Confirmed hits" below = games where parlay legs were clearly met (final score + spread + total). Many "potential wins" weren't verified for player props (no box-score-level prop tracking yet). Real long-run hit rate likely 5-15% (matches research). Variance is huge — could be 0-15 hits over 48 games. See /bets/dual-tier-strategy/ for the full methodology.

Tier 1 performance (from earlier sections)

| Sample | Record | Hit % | ROI |
|---|---|---|---|
| NBA Playoffs · v2 (BT3) | 12-1 | 92% | +59.0% |
| NBA Regular · v2 (BT4) | 7-1 | 87.5% | +46.1% |
| MLB · v1 (BT2) | 7-6 | 54% | -11.7% |
| Combined NBA v2 | 19-2 | 90% | +50% |

Tier 2 longshot backtest (theoretical, 100:1 parlays)

| Metric | Value |
|---|---|
| Total longshots evaluated | 48 |
| Spent ($10 × 48) | $480 |
| Confirmed hits | 5 |
| Confirmed payout (at 100:1) | $5,000 |
| Net | +$4,520 |
| ROI | +942% |

10% hit rate on 100:1 parlays = highly profitable IF that rate is real. BUT 5 hits in 48 samples is statistically thin; need 50-100+ to confirm hit rate. And the parlay-leg verification was incomplete on player props.
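How sensitive that ROI is to the true hit rate can be sketched directly. This assumes, as the table above does, that a $10 stake returns $1,000 on a hit, so the payout multiple is 100 times the stake. The hit rates other than 5/48 are illustrative.

```python
def longshot_roi_pct(hit_rate: float, payout_multiple: float = 100.0) -> float:
    """ROI in percent when each unit staked returns `payout_multiple` on a hit."""
    return (hit_rate * payout_multiple - 1.0) * 100.0


def break_even_hit_rate(payout_multiple: float = 100.0) -> float:
    """Hit rate at which the longshots return exactly what was staked."""
    return 1.0 / payout_multiple


for label, rate in (("5 of 48", 5 / 48), ("5%", 0.05), ("2%", 0.02), ("1%", 0.01)):
    print(f"hit rate {label}: ROI {longshot_roi_pct(rate):+.0f}%")

print(f"break-even hit rate: {break_even_hit_rate():.1%}")
```

Each point of hit rate moves ROI by 100 points at this payout, so the verification gap on player props matters more than any other number in this section.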

19. Top 5 longshot wins from backtest

Pattern in all 5 wins: Method 1 (correlated parlays) on blowout setup games. Talent-mismatched favorite at home with a high-paced offense and a star vs weak matchup. Same shape every time.

5 hits · all Method 1 (correlated)

| # | Game | Date | Final | Parlay construction | Odds | $10 → |
|---|---|---|---|---|---|---|
| 1 | NYK vs PHI G1 | 5/4 | NYK 137-98 | NYK -15.5 · over 220 · Brunson 30+ | ~100:1 | $1,000 |
| 2 | SAS vs MIN G2 | 5/6 | SAS 133-95 | SAS -10.5 · over 215 · Wemby 30/15 | ~100:1 | $1,000 |
| 3 | LAL vs OKC G3 | 5/9 | OKC 131-108 (road) | OKC -7.5 · over 220 · SGA 35+ | ~100:1 | $1,000 |
| 4 | PHI vs NYK G4 | 5/10 | NYK 144-114 (sweep) | NYK -8 · over 230 · Brunson 35+ | ~100:1 | $1,000 |
| 5 | SAS vs MIN G5 | 5/12 | SAS 126-97 | SAS -7 · over 215 · Wemby 30/15 | ~100:1 | $1,000 |

The trigger conditions for Method 1 longshots

When 3-4 of these align: that's the moment to fire a Method 1 longshot. Otherwise skip.

📌 This pattern now has its own validation framework: Blowout Correlation Edge → 4 precise trigger conditions + 3-phase validation plan + KV-backed bet tracking.