📌 The honest message
Four backtests. Two failed to beat blind betting. Two showed promise on small samples. The honest answer: we're still learning. The good news is we know HOW to learn: by testing strategies against real outcomes, not by trusting feelings. This is the foundation that lets us actually improve over time.
1. Concept
Sports betting backtests — learning from real data.
Two 20-game samples and one 8-game sample, run as four backtests with progressively refined strategies. Real outcomes. Honest scoring. What's actually working vs what just feels smart.
The point: most bettors think they have edge they don't have. We test that explicitly so we don't fool ourselves.
2. Side-by-side summary
| Iteration | Sport | Games | Strategy | Record | ROI | Blind ROI | Beat Blind? |
| BT1 | NBA Playoffs | 20 | v1 · Line + Talent | 16-4 | +39.4% | +43.5% | NO (-4.1pp) |
| BT2 | MLB Reg Season | 13 bet · 7 skip | v1 · Line + Talent | 7-6 | -11.7% | -11.5% | NO (-0.2pp) |
| BT3 | NBA Playoffs | 13 bet · 7 skip | v2 · Skip Discipline | 12-1 | +59.0% | +43.5% | YES (+15.5pp) |
| BT4 | NBA Reg Season | 8 bet · 0 skip | v2 · Skip Discipline | 7-1 | +46.1% | ~+15%* | YES (small sample) |
* BT4 is an 8-game Pistons-focused sample, less than half a normal 20-game test. Direction is clear; magnitude is noisy. Treat as suggestive, not conclusive.
3. Visualizations
ROI by iteration vs blind favorite-betting benchmark
Blind favorites returned +43.5% on the NBA sample and -11.5% on the MLB sample. Strategy v1 underperformed both. Strategy v2 (applied retroactively to the NBA data) beat the blind benchmark by 15.5pp — but on 13 bets, this could be variance.
Skip discipline impact — v1 vs v2 on same NBA data
Same underlying analysis. v2 refused to bet 7 of the 20 games (coin flips + heavy juice). Result: 92% hit rate on the 13 it did bet, vs 80% for v1's "bet everything" approach. Less volume, higher quality.
Confidence calibration — STRONG vs MEDIUM bet performance (MLB sample)
When the model said "STRONG" (heavy juice favorites, $8 stake), it lost money. When it said "MEDIUM" ($5 stake), it made money. Counterintuitive but consistent — confidence is mis-calibrated. Fix: flat stakes until calibration is proven.
4. Backtest 1 — NBA Playoffs (May 3-13), Strategy v1
BT1 · NBA Playoffs · "Line + Talent" (v1)
MISS
Setup: 20 NBA playoff games. v1 = pick favorites in talent-mismatch matchups, use narrative factors (home court, momentum, must-win). $5/$8 stake based on confidence.
Benchmark: Blind favorites went 15-3 for +43.5% ROI. My analysis lost the matchup by 4.1pp.
What worked
- Heavy favorites in talent-mismatch series (OKC vs LAL)
- Road favorites in dominant series
- Bounce-back games at home after surprising losses
- G7 home court advantage
What didn't work
- Pivotal G5 home court narrative (lost 17%)
- Home underdogs based on home court alone
- Road favorites against desperate home underdogs
- "Pivotal" framing already priced in
The 4 losses — pattern
- SAS vs MIN G1 — trusted rest over momentum
- MIN vs SAS G3 — trusted home underdog over talent
- CLE vs DET G3 — underestimated desperation factor
- DET vs CLE G5 — trusted "pivotal home" already priced in
Key insight
My analysis matched what the line was telling me (favorites won most). My losses came from overriding the line with narrative factors. Blind favorite-betting beat me because it had no narrative bias.
Factor accuracy scores (small sample)
- ✓ home_court_g7: 100% (2/2)
- ✓ 1_seed_dominant: 100% (2/2)
- ✓ rest_advantage_after_sweep: 100% (1/1)
- ✓ bounce_back_at_home: 100% (1/1)
- ✗ home_court_pivotal_g5: 50% (1/2, actively hurt)
- ✗ home_court_for_underdog: 0% (0/1)
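A minimal sketch of how these factor scores could be tallied from a bet log. The `factor_accuracy` helper and the record layout are hypothetical; the counts mirror the BT1 numbers above, and the order of the hit and the miss for home_court_pivotal_g5 is assumed.

```python
from collections import defaultdict

def factor_accuracy(records):
    """Tally (factor_name, hit) pairs into {factor: (hits, total, pct)}."""
    tally = defaultdict(lambda: [0, 0])
    for factor, hit in records:
        tally[factor][0] += int(hit)   # count of correct calls
        tally[factor][1] += 1          # count of all calls
    return {f: (h, n, 100.0 * h / n) for f, (h, n) in tally.items()}

# Counts copied from the BT1 factor scores; ordering within a factor is assumed.
bt1_records = [
    ("home_court_g7", True), ("home_court_g7", True),
    ("1_seed_dominant", True), ("1_seed_dominant", True),
    ("rest_advantage_after_sweep", True),
    ("bounce_back_at_home", True),
    ("home_court_pivotal_g5", True), ("home_court_pivotal_g5", False),
    ("home_court_for_underdog", False),
]

scores = factor_accuracy(bt1_records)
for name, (hits, total, pct) in sorted(scores.items(), key=lambda kv: -kv[1][2]):
    print(f"{name}: {pct:.0f}% ({hits}/{total})")
```

With one or two observations per factor, every percentage here is a placeholder for a future sample, not a verdict.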
Full breakdown: /bets/backtest-may-2026/ →
5. Backtest 2 — MLB Regular Season (May 12-14), Strategy v1
BT2 · MLB · "Line + Talent" (v1, same strategy)
FAIL
Setup: Same v1 strategy applied to 20 MLB regular-season games. First attempt at skip discipline (7 coin flips skipped). $5/$8 stakes.
Benchmark: Blind favorites went 10-10 for -11.5% ROI. Both lost money. The vig is brutal in MLB without real edge.
Lessons learned
- MLB ≠ NBA. Same strategy "worked" in NBA, failed in MLB. NBA favorites win 65-70%, MLB favorites 55-58%. At a favorite price around -130 the break-even hit rate is about 56.5%, so the vig eats most or all of that MLB margin.
- Confidence calibration is BACKWARDS. STRONG bets ($8): 4-4, -19.4% ROI. MEDIUM bets ($5): 3-2, +8.0% ROI. When most certain, most wrong.
- MLB requires daily pitcher data I don't have. All 4 STRONG losses had pitcher mismatches I couldn't see. LAD-SF, NYY-BAL, PIT-COL, TOR-TB all lost on pitching.
- Skip discipline worked partially. Skipped 7 games. 5 of 7 would have been losses or coin flips. Only 1 missed obvious win. Skip rate should probably be HIGHER (50%+).
- Bounce-back factor showed in MLB too. LAD won big after G1 upset. PIT won big after G1 upset. 2/2 — small sample, pattern emerging.
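The vig lesson above can be made concrete with a break-even calculation. This is a minimal sketch assuming standard American odds; `break_even_prob` is a hypothetical helper name, and the example prices are illustrative, not taken from the MLB sample.

```python
def break_even_prob(american_odds: int) -> float:
    """Hit rate needed to break even at a given American price."""
    if american_odds < 0:
        risk = -american_odds          # e.g. -130 means risk 130 to win 100
        return risk / (risk + 100)
    return 100 / (american_odds + 100)  # e.g. +150 means risk 100 to win 150

# Illustrative prices only (assumed, not from the backtest data).
for line in (-110, -130, -150, -200, 150):
    print(f"{line:+d}: break-even {break_even_prob(line):.1%}")
```

A favorite that wins 55-58% of the time sits right around the break-even point for prices in the -125 to -140 range, which fits both blind betting and v1 losing money on this sample.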
Full breakdown: /bets/backtest-mlb-may-2026/ →
6. Backtest 3 — Strategy v2 on original NBA data
Honest caveat — not a fresh sample
"Backtest 3" isn't new games. It's the same NBA data from BT1 with Strategy v2 applied retroactively. It tests whether the refined strategy would have done better on data we already analyzed.
It's not a true out-of-sample test (which would require new games we haven't seen). But it's the most honest comparison we can do without waiting for more games to happen.
BT3 · NBA · "Skip Discipline" (v2)
PROMISING
Strategy v2 rules
- HARD SKIP coin flips: if estimated win probability is below 56%, no bet
- SKIP heavy juice: if the line is -200 or heavier (for example -250) and the edge isn't massive, no bet
- Smaller stakes: $5 flat across all bets (no $8 tier)
- No narrative-only picks: must have line agreement + clear talent/situational edge
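The v2 rules above can be sketched as a single filter. This is one possible encoding, not the exact logic used in the backtest: `should_bet`, `massive_edge`, and `line_agrees` are hypothetical names, and treating -200 itself as "heavy juice" is an assumption.

```python
MIN_PROB = 0.56      # coin-flip cutoff from the v2 rules
HEAVY_JUICE = -200   # lines at or beyond this are skipped (inclusive: assumption)
FLAT_STAKE = 5.00    # flat stake from the v2 rules

def should_bet(est_prob, american_line, massive_edge=False, line_agrees=True):
    """Return True only if a game passes every v2 skip rule."""
    if est_prob < MIN_PROB:
        return False                      # coin flip: hard skip
    if american_line <= HEAVY_JUICE and not massive_edge:
        return False                      # heavy juice without a massive edge
    if not line_agrees:
        return False                      # narrative-only pick fighting the line
    return True

# Hypothetical games, for illustration only.
print(should_bet(0.54, -120))                     # skipped as a coin flip
print(should_bet(0.65, -250))                     # skipped as heavy juice
print(should_bet(0.65, -250, massive_edge=True))  # allowed through
print(should_bet(0.65, -150))                     # allowed through
```

Keeping the thresholds as named constants makes it easy to re-run the same log with a stricter cutoff later.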
Comparison to BT1 (same NBA data)
| Strategy | Bets | Record | Staked | Returned | ROI |
| v1 | 20 | 16-4 | $100 | $139.40 | +39.4% |
| v2 | 13 | 12-1 | $65 | $103.38 | +59.0% |
Similar absolute profit with $35 less staked and much higher hit rate (92% vs 80%).
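For anyone checking the table, ROI and the gap to the blind benchmark follow directly from staked and returned amounts. A minimal sketch using the figures above; `roi_pct` is a hypothetical helper name.

```python
def roi_pct(staked: float, returned: float) -> float:
    """Return on investment as a percentage of the amount staked."""
    return 100.0 * (returned - staked) / staked

BLIND_NBA_ROI = 43.5  # blind-favorites benchmark on the same NBA games

# (staked, returned, wins, losses) copied from the comparison table.
rows = {"v1": (100.00, 139.40, 16, 4), "v2": (65.00, 103.38, 12, 1)}

for name, (staked, returned, wins, losses) in rows.items():
    roi = roi_pct(staked, returned)
    hit = 100.0 * wins / (wins + losses)
    print(f"{name}: profit ${returned - staked:.2f}, ROI {roi:+.1f}%, "
          f"hit rate {hit:.0f}%, vs blind {roi - BLIND_NBA_ROI:+.1f}pp")
```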
What skip discipline caught
- Avoided 3 actual losses (SAS-MIN G1, MIN-SAS G3, CLE-DET G3)
- Missed 4 wins (NYK Under, OKC -300 G2, MIN G4, CLE G4)
- Net: about $1 less total profit ($38.38 vs $39.40), MUCH higher hit rate
What skip discipline missed
- DET vs CLE G5 loss — still bet it because "high confidence"
- Same calibration error from BT2 — STRONG conviction isn't reliable
Key insight
The biggest improvement came not from better analysis but from better selection of which games to bet. Same underlying picks. Just refused the coin flips. Result: cleaner book.
6.5. Backtest 4 — NBA Regular Season (late Mar – early Apr 2026)
Honest limitation — half-size sample
Only 8 verified games available from API data (Pistons-focused). This is LESS THAN HALF a normal 20-game sample. Conclusions are even weaker than the 20-game tests. Treat the direction as suggestive, not the magnitude.
BT4 · NBA Reg Season · "Skip Discipline" (v2)
PROMISING
The 8 games
| Date | Game | Pick | Conf | Actual | Result |
| 3/30 | DET @ OKC | OKC | 69% | OKC won 114-110 | ✓ |
| 3/31 | TOR @ DET | DET | 75% | DET won 127-116 | ✓ |
| 4/2 | MIN @ DET | DET | 66% | DET won 113-108 | ✓ |
| 4/4 | DET @ PHI | DET | 70% | DET won 116-93 | ✓ |
| 4/6 | DET @ ORL | DET | 63% | ORL won 123-107 | ✗ |
| 4/8 | MIL @ DET | DET | 67% | DET won 137-111 | ✓ |
| 4/10 | DET @ CHA | DET | 70% | DET won 118-100 | ✓ |
| 4/12 | DET @ IND | DET | 61% | DET won 133-121 | ✓ |
The one loss · DET @ ORL on 4/6
DET was the clear favorite (53-27 record vs 42-37) but lost 123-107. Possible causes: back-to-back? injury? rotation oddity? More data is needed to know, but this is exactly the kind of game where situational factors (rest, B2B, travel) can outweigh raw record.
Key observation
This sample skipped zero games because the Pistons schedule had clear talent mismatches in every game (DET dominant team vs lower-tier opponents). A more diverse sample would have more coin flips to skip — so BT4's 0% skip rate isn't a verdict on the rule, it's a sampling artifact.
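One way to start checking calibration on the eight BT4 picks is to compare the average stated confidence with the actual hit rate and compute a Brier score (mean squared error between forecast and outcome). A rough sketch, assuming the Conf column is the estimated win probability for the pick; the variable names are hypothetical.

```python
# (stated confidence, pick won?) copied from the BT4 table, in date order.
bt4_picks = [
    (0.69, True), (0.75, True), (0.66, True), (0.70, True),
    (0.63, False), (0.67, True), (0.70, True), (0.61, True),
]

n = len(bt4_picks)
mean_conf = sum(p for p, _ in bt4_picks) / n
hit_rate = sum(1 for _, won in bt4_picks if won) / n
# Brier score: lower is better; always guessing 50% scores 0.25.
brier = sum((p - (1.0 if won else 0.0)) ** 2 for p, won in bt4_picks) / n

print(f"mean confidence {mean_conf:.1%}, hit rate {hit_rate:.1%}, Brier {brier:.3f}")
```

The picks won more often (7 of 8) than the stated confidence implied, but with eight games that gap says little either way. The same calculation becomes meaningful once the log holds 100+ bets.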
7. Cross-sample meta-analysis
Patterns that replicate (probably real signal)
- ✓ NBA + v2 strategy = positive ROI — both playoffs (BT3 +59%) and regular season (BT4 +46%)
- ✓ Skipping coin flips improves hit rate: in BT2, 5 of the 7 skipped games would have been losses or coin flips; in BT3, deliberate skips raised the hit rate from 80% to 92%
- ✓ Clear talent gaps predict outcomes — BT3 wins, BT4 wins all came from talent-mismatched picks
- ✓ Line agreement matters — picks that fought the line lost more
Patterns that DON'T replicate (noise or sport-specific)
- ✗ MLB requires different approach — pitcher data missing; v1 strategy failed
- ✗ "Bounce back" factor — 3/3 in tiny samples is encouraging but not enough
- ✗ Confidence calibration: broken in both v1 samples, NBA and MLB (STRONG bets underperformed MEDIUM)
8. Regular season vs playoffs — structural differences
| Factor | Regular season | Playoffs |
| Favorite hit rate | ~67-70% (ML) | ~70-72% (slightly higher) |
| Home court | +3.5 pts (~5% WP) | +4 pts (~6% WP) |
| Per-game variance | HIGHER (B2B, rest, tanking, load mgmt) | LOWER (consistent rotations, max effort) |
| Key edges | Back-to-back · rest · tanking · road-trip fatigue | Series adjustments · home court at high seed · star ISO |
| "Pivotal" framing | N/A | LESS predictable than narrative suggests |
9. Real bettable edges (from research + emerging data)
Regular season edges
- Back-to-back fatigue — team on 2nd night wins ~5-6% less often than usual; books underweight. Fade the tired team if line doesn't fully adjust. Expected: 2-3% ROI.
- Rest advantage (3+ days) — well-rested team beats line ~4-5% more, especially vs B2B opponent. Take well-rested team in mismatched-rest spots. Expected: 3-4% ROI.
- Tanking teams (post-AS, out of playoffs) — lose ~8-10% more than talent suggests. Fade tankers vs contenders in March/April. Expected: 4-6% ROI.
- Pace mismatch (totals) — fast vs slow → UNDER; two fast → OVER. Identify pace differential, bet totals accordingly. Expected: 2-3% ROI.
- Closing Line Value (CLV) — if you bet at -130 and line closes at -150, you beat the market. Track every bet's line vs close. Gold-standard measure of real edge.
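The CLV example above (bet at -130, close at -150) can be turned into a number to log next to every bet. A minimal sketch using raw implied probabilities; it does not strip the vig from the closing line, which a more careful version would do. `implied_prob` and `clv_points` are hypothetical helper names.

```python
def implied_prob(american_odds: int) -> float:
    """Raw implied probability of an American price (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def clv_points(bet_line: int, closing_line: int) -> float:
    """Closing line value in percentage points; positive means you beat the close."""
    return 100.0 * (implied_prob(closing_line) - implied_prob(bet_line))

print(f"{clv_points(-130, -150):+.2f}pp")  # bet taken before the line got more expensive
print(f"{clv_points(-150, -130):+.2f}pp")  # bet taken at a worse price than the close
```

Averaging this value over many bets gives a read on edge that does not depend on whether the individual games happened to win.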
Playoff edges
- Team down 0-2 at home — more dangerous than line suggests. Take +EV home underdog in this spot. Expected: 3-5% ROI.
- Rested higher seed in Round 1 — easy to underestimate fatigue from 7-game series. Take well-rested higher seed in Round 2 opener. Expected: 2-4% ROI.
- Elimination game underdogs — lose by less than line suggests (cover spread). Spread bets on facing-elimination teams. Expected: 2-3% ROI.
Critical: these edges require sample size to confirm. Track 30-50 instances of each before trusting them. Research suggests they exist; YOUR betting hasn't proven them yet.
10. How to know if you found something real
Three tests for real edge vs noise
- Replicates across independent samples — Sample A (NBA playoffs), Sample B (NBA regular season), Sample C (next month). All three: probably real. Only one: noise.
- Has plausible mechanism — WHY would this pattern exist? Selectivity works because vig kills coin flips (math). B2B fatigue works because basketball is physical (biology). Pattern without mechanism = probably noise.
- Doesn't disappear with more data — 20 games: noise. 100 games: patterns emerge, high variance. 500+ games: edges become clear (or disappear). Professional samples are thousands of bets.
Current status of our findings
Strong (replicates + mechanism):
- ✓ Skip discipline (worked in 3 of 4 backtests)
- ✓ Line agreement (worked across NBA samples)
- ✓ Talent gap signal (worked across NBA samples)
Suggestive (small sample but plausible):
- ~ Bounce-back factor (3/3 in tiny samples)
- ~ Confidence calibration broken (consistent across samples)
- ~ MLB needs pitcher data (failed without it)
Unproven (need more data):
- ? Specific situational edges (B2B, rest, tanking)
- ? Closing line value tracking
- ? Sport-specific calibration
Conclusion: We have suggestive evidence for selectivity. We have insufficient data for proven edges. The path forward is tracking 100+ real bets, then reassessing.
The breakthrough finding
v1 strategy: bet 20 games, 80% hit rate, +39% ROI.
v2 strategy: bet 13 games, 92% hit rate, +59% ROI.
Same underlying analysis. Different selection criteria. v2 won by REFUSING to bet coin flips. The actual edge is discipline over volume.
Cross-sport learning
- NBA chalk = profitable (favorites win often enough to overcome vig)
- MLB chalk = unprofitable (favorites win, but not enough to overcome vig)
- Same strategy fails when applied without sport-specific calibration
Confidence calibration lesson
Across both NBA (v1) and MLB (v1), "STRONG confidence" bets did worse than MEDIUM bets. Real and consistent across samples. Possible explanations:
- Confirmation bias when "sure"
- Heavy favorites have terrible payouts
- Market efficient on "obvious" picks
Fix: flat stakes ($5) until calibration is proven.
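A quick consistency check on the BT2 tier numbers (STRONG: 8 bets at $8, -19.4% ROI; MEDIUM: 5 bets at $5, +8.0% ROI), plus what flat $5 stakes would have done on the same bets. This is a sketch under one assumption: every bet in a tier used the same stake, so each tier's per-dollar ROI carries over unchanged to a flat stake.

```python
# Tier figures from the BT2 lessons; ROI is per dollar staked within the tier.
tiers = {
    "STRONG": {"bets": 8, "stake": 8.0, "roi_pct": -19.4},
    "MEDIUM": {"bets": 5, "stake": 5.0, "roi_pct": 8.0},
}

def blended_roi(tiers, flat_stake=None):
    """Stake-weighted ROI across tiers; optionally re-stake every bet at flat_stake."""
    total_staked = 0.0
    total_profit = 0.0
    for t in tiers.values():
        stake = flat_stake if flat_stake is not None else t["stake"]
        staked = t["bets"] * stake
        total_staked += staked
        total_profit += staked * t["roi_pct"] / 100.0
    return total_staked, 100.0 * total_profit / total_staked

tiered_staked, tiered_roi = blended_roi(tiers)
flat_staked, flat_roi = blended_roi(tiers, flat_stake=5.0)
print(f"tiered: ${tiered_staked:.0f} staked, ROI {tiered_roi:+.1f}%")
print(f"flat $5: ${flat_staked:.0f} staked, ROI {flat_roi:+.1f}%")
```

The tiered result reproduces BT2's -11.7%. Flat stakes shrink the loss but do not turn this sample profitable, so flat staking limits the damage from bad calibration rather than creating edge.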
The single most valuable rule we've found
- Skip games where estimated probability is 52-56%
- Skip games where the line requires about a 67% hit rate or more to break even (line of -200 or heavier)
This eliminates roughly the worst 30-40% of bets; in BT3 it removed 7 of 20 (35%) and raised ROI from +39.4% to +59.0% on the same games.
The honest meta-question
Is +59% ROI sustainable, or is this still random variance in 13 bets?
Answer: probably variance.
- 13 bets is still a small sample
- Same selection rules on BT2 MLB data didn't produce as dramatic an improvement
- True validation requires 100+ games across multiple sports
But the directional finding is real: selectivity matters more than analysis quality.
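To put a number on "13 bets is still a small sample", here is a sketch of a 95% Wilson score interval for the hit rate. It covers hit rate only, not ROI, and it treats bets as independent, which is an assumption (several came from the same series). `wilson_interval` is a hypothetical helper name.

```python
import math

def wilson_interval(wins: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

for label, wins, n in (("v1", 16, 20), ("v2", 12, 13)):
    lo, hi = wilson_interval(wins, n)
    print(f"{label}: {wins}/{n} hit rate, 95% interval {lo:.0%} to {hi:.0%}")
```

By this calculation the v2 interval runs from roughly 67% to 99% and the v1 interval from roughly 58% to 92%. The two overlap heavily, which is consistent with "probably variance" for the size of the gap.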
12. Strategy v3 — next iteration
Universal rules
- SKIP probability ranges 50-56%
- SKIP heavy juice (-200 or more) unless clear talent gap
- FLAT $5 stakes (no confidence tiers — calibration unproven)
- TRACK every bet, score outcomes, build factor accuracy data
Sport-specific adjustments
NBA Playoffs
- Bet probability range: 60-75%
- Trust talent gap signals
- Skip "pivotal home" narratives
- Watch for "team down 0-2 at home" as edge
MLB Regular Season
- DON'T BET without daily pitcher data
- Reduce stakes to $3 until pitcher data integrated
- Skip 50%+ of available games
- Track bounce-back factor specifically
Golf Majors
- SameSHOT framework only ($2-5 long shots)
- Course fit > player ranking
- Avoid heavy favorites
What v3 is testing
- Does extreme selectivity (skip 50%+) maintain ROI?
- Does flat stakes outperform tiered stakes long-term?
- Can we find sport-specific edges that beat blind favorite-betting?
Test before trust
I would have bet using Strategy v1 thinking I had edge. Backtesting showed I didn't (vs blind favorites). Without testing, I would have made the same mistake forever.
Honest failure is the path forward
BT2 was a disaster (-11.7%). But identifying WHY led to v2's improvements. If I'd hidden the failure, no progress.
The improvement isn't always in the "model"
v2's improvement came from REJECTING bets, not from better analysis. Sometimes the biggest gain is doing less.
Sample sizes matter more than feels right
Every "edge" we identified in 20 games could be noise. 100 games minimum before real conclusions.
Different domains need different models
What works in NBA doesn't work in MLB. Don't generalize.
The market is mostly right
Sportsbooks employ teams of analysts. Edges are at the margins, not in obvious places.
Discipline > intelligence
Knowing what NOT to bet is more valuable than knowing what to bet.
14. Honest limitations
What these backtests cannot tell us
- Whether v2's +59% ROI is repeatable (13 bets is noise)
- Whether the patterns generalize to different sports/seasons
- Whether my probability estimates are even close to accurate
- Whether the lines I assumed match real sportsbook lines
What they DO tell us
- v1 strategy doesn't beat blind betting reliably
- Selectivity (skipping coin flips) matters
- Sport-specific calibration is essential
- My confidence is mis-calibrated (strong picks do worse)
- MLB needs pitcher data to be predictable
- NBA playoffs favor talent + home court at high seeds
Next steps
- Track every real bet for 30-60 days in /bets/prediction-log/
- Get pitcher data source for MLB
- Build factor accuracy database over time
- Re-backtest after 100 real bets accumulated
15. Honest path to proven method — the real timeline
Phases
Phase 1 · Validate strategy (Months 1-2)
Track 100 real bets using Strategy v2. $5 stake per bet = $500 max exposure. Apply skip discipline rigorously. Score every outcome honestly. Expected: maintain ~55-60% hit rate and a slightly positive ROI. Decision point: continue if profitable, revise if not.
Phase 2 · Identify edges (Months 3-4)
Focus on the 4-5 specific situational edges from research. Track those bets specifically (separate from general bets). Build factor accuracy database. Identify which factors actually predict for YOUR betting. Decision point: increase stakes on proven factors.
Phase 3 · Specialize (Months 5-6)
Lean into 2-3 proven factor edges. Increase stakes to $10-20 on highest-edge bets. Keep $5 on general bets to maintain data. Expected: +3-5% ROI on focused bets = $30-50 profit per month on $1000 bet volume.
Phase 4 · Scale (Months 7+)
If profitable, increase total volume. Maintain discipline (skip rules essential). Track CLV (closing line value) as primary metric. Expected: at the same +3-5% ROI, $150-250/month profit on $5K monthly bet volume.
Realistic outcomes
| Case | Hit rate | ROI | On $1K/month volume | Reality |
| Best (top 5% of bettors) | 53-55% | +3-5% | $30-50 profit/mo | Slow but real |
| Average (most disciplined bettors) | ~52% | 0-1% | $0-10 profit/mo | Entertainment + learning |
| Worst (most bettors) | 47-50% | negative | losses compound | Why most lose |
Our goal: top 10% of disciplined bettors = +1-3% ROI consistent. Not Vegas-level edge, but real edge that compounds.
16. Updated action items
This week
- Place actual bets only at v2 selectivity (skip coin flips)
- $5 flat stakes
- Track in /bets/prediction-log/
- Apply to remaining NBA playoffs + PGA + MLB you're attending
This month
- Accumulate 30-50 tracked bets
- Score outcomes weekly
- Note which factors actually predicted
- Don't change strategy mid-month
Next month
- Review 50-game performance
- Calculate REAL ROI (not theoretical)
- Identify which sports/situations worked
- Adjust strategy based on data, not gut
Quarterly
- Full review of 100+ bets
- Identify proven edges (60%+ hit rate, 30+ samples)
- Increase stakes on proven edges
- Drop strategies that didn't work
This is how real edge gets built. Not quick. Not magic. But real.
17. The honest answer
"How do I know if I'm finding something real?"
Three signals:
- Pattern replicates across independent samples
- Has plausible mechanism (not just coincidence)
- Survives growing sample size (doesn't disappear)
"How do I win money quickly?"
Honest answer: you mostly don't. But you can win money STEADILY.
Bettors who win:
- Track every bet
- Apply discipline ruthlessly
- Build small edges into compound returns
- Treat it as a business, not entertainment
- Have a 5-year mindset, not 5-week
Bettors who lose:
- Bet on gut feel
- Chase losses
- Increase stakes when losing
- Never track outcomes
- Want to win quickly
We've already built the infrastructure to be in the first group. The hard part is execution discipline over months.
Realistic profit expectation
- $25/week bets = $1,300/year exposure
- 3% ROI = $39/year profit
- Modest but real
- Scales with bankroll and proven edge
If you want $1,000+/month profit:
- Need $25-30K monthly bet volume (at roughly 3-4% ROI)
- Need proven edges (50+ samples per edge)
- Need 3-5 years of discipline
- Most people don't have the patience
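The arithmetic behind these expectations is just volume times ROI. A small sketch for re-running it with different inputs; the function names are hypothetical and the ROI values are the assumed ranges from this page, not measured results.

```python
def annual_profit(weekly_stake: float, roi: float) -> float:
    """Expected yearly profit from a fixed weekly bet volume at a given ROI."""
    return weekly_stake * 52 * roi

def monthly_volume_needed(target_monthly_profit: float, roi: float) -> float:
    """Monthly bet volume required to hit a profit target at a given ROI."""
    return target_monthly_profit / roi

print(f"$25/week at 3% ROI: ${annual_profit(25, 0.03):.0f}/year")
for roi in (0.03, 0.04, 0.05):
    volume = monthly_volume_needed(1000, roi)
    print(f"$1,000/month at {roi:.0%} ROI needs ${volume:,.0f} monthly volume")
```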
The framework is built. The next 6 months of tracking will tell us if we're actually finding edges or just feeling smart.
18. Dual-tier backtest review · 48 games re-analyzed
Honest caveat: "Confirmed hits" below = games where parlay legs were clearly met (final score + spread + total). Many "potential wins" weren't verified for player props (no box-score-level prop tracking yet). Real long-run hit rate likely 5-15% (matches research). Variance is huge — could be 0-15 hits over 48 games. See /bets/dual-tier-strategy/ for the full methodology.
Tier 1 performance (from earlier sections)
| Sample | Record | Hit % | ROI |
| NBA Playoffs · v2 (BT3) | 12-1 | 92% | +59.0% |
| NBA Regular · v2 (BT4) | 7-1 | 87.5% | +46.1% |
| MLB · v1 (BT2) | 7-6 | 54% | -11.7% |
| Combined NBA v2 | 19-2 | 90% | ~+54% (stake-weighted, $5 flat) |
Tier 2 longshot backtest (theoretical, 100:1 parlays)
| Metric | Value |
| Total longshots evaluated | 48 |
| Spent ($10 × 48) | $480 |
| Confirmed hits | 5 |
| Confirmed payout (at 100:1) | $5,000 |
| Net | +$4,520 |
| ROI | +942% |
10% hit rate on 100:1 parlays = highly profitable IF that rate is real. BUT 5 hits in 48 samples is statistically thin; need 50-100+ to confirm hit rate. And the parlay-leg verification was incomplete on player props.
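The Tier 2 table can be reproduced, and stress-tested, with a few lines. This sketch uses the table's own accounting ($1,000 back per $10 hit) and a binomial tail to ask how likely 5 or more hits in 48 would be under different true hit rates. The candidate hit rates are hypothetical inputs, and independence between parlays is assumed.

```python
import math

def longshot_summary(n_bets, hits, stake, payout_per_hit):
    """Return (spent, net, roi_pct) using the table's payout accounting."""
    spent = n_bets * stake
    net = hits * payout_per_hit - spent
    return spent, net, 100.0 * net / spent

def prob_at_least(k, n, p):
    """Binomial tail: probability of k or more hits in n independent tries."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

spent, net, roi = longshot_summary(n_bets=48, hits=5, stake=10, payout_per_hit=1000)
print(f"spent ${spent}, net ${net:+}, ROI {roi:+.0f}%")

break_even = 10 / 1000  # hit rate at which payouts equal stakes under this accounting
print(f"break-even hit rate: {break_even:.1%}")

# Hypothetical true hit rates, to see what each would imply for 5+ hits in 48.
for p in (0.01, 0.03, 0.05, 0.10):
    print(f"true rate {p:.0%}: P(5+ hits in 48) = {prob_at_least(5, 48, p):.4f}")
```

If the true hit rate were near break-even, five hits in 48 would be very unlikely. So either the rate really is well above break-even, or the "~100:1" pricing and the incomplete prop verification are flattering the count. More verified samples are what separate the two.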
19. Top 5 longshot wins from backtest
Pattern in all 5 wins: Method 1 (correlated parlays) on blowout setup games. A talent-mismatched favorite (at home in 3 of the 5; on the road in LAL vs OKC G3 and PHI vs NYK G4) with a high-paced offense and a star vs weak matchup. Same shape every time apart from venue.
5 hits · all Method 1 (correlated)
| # | Game | Date | Final | Parlay construction | Odds | $10 → |
| 1 | NYK vs PHI G1 | 5/4 | NYK 137-98 | NYK -15.5 · over 220 · Brunson 30+ | ~100:1 | $1,000 |
| 2 | SAS vs MIN G2 | 5/6 | SAS 133-95 | SAS -10.5 · over 215 · Wemby 30/15 | ~100:1 | $1,000 |
| 3 | LAL vs OKC G3 | 5/9 | OKC 131-108 (road) | OKC -7.5 · over 220 · SGA 35+ | ~100:1 | $1,000 |
| 4 | PHI vs NYK G4 | 5/10 | NYK 144-114 (sweep) | NYK -8 · over 230 · Brunson 35+ | ~100:1 | $1,000 |
| 5 | SAS vs MIN G5 | 5/12 | SAS 126-97 | SAS -7 · over 215 · Wemby 30/15 | ~100:1 | $1,000 |
The trigger conditions for Method 1 longshots
- Clear talent mismatch (better team has 10+ wins more than opponent OR higher seed in playoffs)
- Favorite at home (motivated, crowd factor)
- Higher-paced team (drives the total-over leg)
- Star vs weak matchup (drives the player-prop leg)
When 3-4 of these align: that's the moment to fire a Method 1 longshot. Otherwise skip.
📌 This pattern now has its own validation framework: Blowout Correlation Edge → 4 precise trigger conditions + 3-phase validation plan + KV-backed bet tracking.
📌 Foundation message — updated
We now have 4 backtests across NBA playoffs, NBA regular season, and MLB, plus a dual-tier review. The good news: selectivity (skip coin flips) consistently improved results, and Method 1 correlated parlays on blowout setups produced 5 confirmed 100:1 hits. The honest news: 8-20 game samples are still noise, and the 5 longshot hits could be partly variance and rest on incomplete prop verification. The path to PROVEN method is 3-6 months of tracked discipline. Real edges exist. We've identified candidate edges from research. Now the work is tracking enough bets to know which are real. See /bets/dual-tier-strategy/ for the operational version.