sports-edge · Strategy

The TL;DR

For every starting pitcher tonight, a Monte Carlo simulator runs 16,000 trials of their start, modelling K outcomes PA-by-PA.
The K probability per PA blends six signals: pitcher rolling K% (recency-weighted), opponent's K% vs that handedness, ballpark K-factor, confirmed lineup's per-batter K rate, home-plate umpire bias, and the times-through-order penalty.
The model's predicted distribution is compared to real K-prop lines from DraftKings / FanDuel / Bovada / Pinnacle (via PropLine API).
For each real (book, line, side) combination, we compute the expected value (EV) against the book's price. Any combination with positive EV becomes a candidate bet.
Stake size = quarter-Kelly off the model's win probability vs the book's payout, capped at your max-single-bet setting.
The walk-forward harness (2022-2024 cached Statcast) validates each model variant before it goes live. Current champion: baseline-v3-ump, with a 64.76% hit rate and an MAE of 1.84 Ks.

The Monte Carlo simulator

Cloned from ~/crowd-sim/simulate.py (the Thoosie park sim). For each starter we simulate plate-appearance-by-plate-appearance K outcomes across 4 seeds × 4,000 trials = 16,000 sample games per pitcher.

per-PA K probability =
    sigmoid(
        logit(pitcher.k_rate)
      + 0.5 × (logit(opp_k_vs_hand) − logit(league_k))
      + ln(park_k_factor)
      + ln(weather_k_mult)    # reserved
      + ln(ump_k_mult)
      + 0.05 × opp_top4_out
    ) × tto_mult[pa_index // 9]

When the lineup is confirmed, each PA's probability is blended 50/50 in log-odds with that batter's own K%.
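The formula above can be sketched as a single Python function. This is a minimal illustration, not the code in mlb_game_sim.py: `per_pa_k_prob` and its argument names are hypothetical, and three details are assumptions: the batter blend is applied before the times-through-order multiplier, trips past the fourth reuse the last TTO value, and `opp_top4_out` enters as a plain number.

```python
import math
from typing import Optional

# Times-through-order multipliers from the text: 1st, 2nd, 3rd, 4th trip.
TTO_MULT = (1.00, 0.885, 0.826, 0.703)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def per_pa_k_prob(
    pitcher_k_rate: float,
    opp_k_vs_hand: float,
    league_k: float,
    park_k_factor: float,
    ump_k_mult: float,
    opp_top4_out: float,
    pa_index: int,
    weather_k_mult: float = 1.0,  # reserved: ln(1.0) = 0 leaves the sum unchanged
    batter_k_rate: Optional[float] = None,  # set only when the lineup is confirmed
) -> float:
    """Hypothetical per-PA K probability following the log-odds blend above."""
    x = (
        logit(pitcher_k_rate)
        + 0.5 * (logit(opp_k_vs_hand) - logit(league_k))  # half-weighted opponent lift
        + math.log(park_k_factor)
        + math.log(weather_k_mult)
        + math.log(ump_k_mult)
        + 0.05 * opp_top4_out
    )
    p = sigmoid(x)
    if batter_k_rate is not None:
        # Assumption: 50/50 blend in log-odds, applied before the TTO multiplier.
        p = sigmoid(0.5 * logit(p) + 0.5 * logit(batter_k_rate))
    # pa_index // 9 is the trip through the order; assumption: clamp at the 4th trip.
    trip = min(pa_index // 9, len(TTO_MULT) - 1)
    return p * TTO_MULT[trip]
```

With every input at league average, the function returns the pitcher's own K rate scaled only by the TTO multiplier for that trip.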

The median of the 16,000 trials is our predicted K total; the 10th and 90th percentiles (p10 and p90) bound the 80% predictive interval shown on every Today card. The times-through-order curve (1.00, 0.885, 0.826, 0.703) is calibrated from 2024 Statcast: pitchers really do strike out fewer hitters their third time through.
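The simulation step reduces to one Bernoulli draw per plate appearance, repeated 4 seeds × 4,000 trials, followed by a median and p10/p90 summary. A minimal sketch, assuming a fixed number of batters faced per trial (the text doesn't say how the simulator decides when the starter exits); `simulate_k_totals` and `summarize` are hypothetical names.

```python
import random
import statistics


def simulate_k_totals(per_pa_probs, seeds=(0, 1, 2, 3), trials_per_seed=4_000):
    """Simulated strikeout totals, one Bernoulli draw per plate appearance.

    per_pa_probs[i] is the K probability for the i-th batter faced.
    Assumption: batters faced is fixed at len(per_pa_probs) in every trial.
    """
    totals = []
    for seed in seeds:
        rng = random.Random(seed)  # separate reproducible stream per seed
        for _ in range(trials_per_seed):
            totals.append(sum(rng.random() < p for p in per_pa_probs))
    return totals


def summarize(totals):
    """Return (median, p10, p90); p10..p90 holds roughly 80% of simulated outcomes."""
    deciles = statistics.quantiles(totals, n=10)  # 9 cut points: p10, p20, ..., p90
    return statistics.median(totals), deciles[0], deciles[-1]


if __name__ == "__main__":
    tto = (1.00, 0.885, 0.826, 0.703)
    # Hypothetical starter: 24% base K rate, 24 batters faced.
    probs = [0.24 * tto[min(i // 9, 3)] for i in range(24)]
    median, p10, p90 = summarize(simulate_k_totals(probs))
    print(f"predicted Ks {median}, 80% interval {p10}-{p90}")
```

Seeding each of the four streams separately keeps runs reproducible while still pooling 16,000 trials.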

Features (in order of measured lift)

Pitcher recency-weighted K% (EWMA span 5 blended 60/40 with season-to-date) — captures form changes mid-season. Direct feature in the simulator.
Opponent K% vs handedness — 60-day prior, split L/R. Half-weighted in log-odds so opponent lift is real but not dominant.
Ballpark K-factor — prior season's home-park K rate vs league. e.g. Comerica +0.3 K/9.
Confirmed lineup × per-batter K% — when the day's batting order is posted, simulate against each batter's actual K rate vs the pitcher's handedness, cycling through positions 1-9.
Home-plate umpire bias — prior-season K%/PA when this ump was behind the plate. Mike Estabrook (highest in 2024): +20% vs league. Carlos Torres (lowest): −12%.
Times-through-order curve — empirical K% drop from 1st (23.7%) to 3rd (19.6%) trip through batting order, applied PA-by-PA in the sim.
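A sketch of the recency-weighted K% feature, assuming the standard EWMA smoothing factor α = 2 / (span + 1) (so α = 1/3 for span 5) and assuming the 60 in "60/40" is the weight on recent form. Both helper names are hypothetical.

```python
def ewma(values, span=5):
    """EWMA of per-start K%, ordered oldest -> newest, with alpha = 2 / (span + 1).

    Recursive form; matches the last value of pandas ewm(span=span, adjust=False).mean().
    """
    alpha = 2.0 / (span + 1)
    avg = values[0]
    for v in values[1:]:
        avg = alpha * v + (1.0 - alpha) * avg
    return avg


def recency_weighted_k_rate(start_k_rates, season_k_rate, w_recent=0.6, span=5):
    """Blend recent form with season-to-date K%.

    Assumption: the 60 in "60/40" is the weight on the EWMA, 40 on the season rate.
    """
    return w_recent * ewma(start_k_rates, span) + (1.0 - w_recent) * season_k_rate


if __name__ == "__main__":
    # Hypothetical pitcher heating up over his last four starts.
    print(recency_weighted_k_rate([0.20, 0.22, 0.31, 0.33], season_k_rate=0.24))
```

A pitcher whose recent starts run above his season rate gets pulled upward, but the 40% season anchor limits how far a short hot streak can move the feature.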

Deferred to next iteration: catcher framing (needs a separate framing-runs dataset), pitch-mix matchup (whiff% by pitch type × lineup weakness), weather coupling, and an XGBoost replacement for the sigmoid blend (a 1-2 day project, gated on accumulating more data).

Walk-forward validation

Discipline: every feature at prediction time T uses only data with game_date < T. This is non-negotiable — a prior MLB moneyline attempt died because full-season aggregates leaked future data into "historical" predictions.
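In practice the leakage rule is a strict date filter applied before every aggregate. A minimal sketch with a hypothetical `StartLine` record and `k_rate_as_of` helper; the point is the strict `<`, which also excludes the game being predicted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StartLine:
    """Hypothetical per-start record for one pitcher."""
    game_date: date
    strikeouts: int
    batters_faced: int


def k_rate_as_of(starts, as_of: date) -> Optional[float]:
    """Season K% using only starts strictly before `as_of` (no same-day data)."""
    past = [s for s in starts if s.game_date < as_of]
    batters_faced = sum(s.batters_faced for s in past)
    if batters_faced == 0:
        return None  # no history yet: the caller must fall back to a prior
    return sum(s.strikeouts for s in past) / batters_faced
```

A full-season aggregate is the leaky version of this function: it silently includes every start after `as_of`.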

Methodology: 10,537 starter-games across 2022-2024. For each game, the model predicts the K total using only data prior to first pitch. The "edge" is measured against a naive book proxy (pitcher K% × opp K% lift × park K-factor) — what a vanilla market would price. Real walk-forward against Vegas closing lines is the next milestone (we're now accumulating real PropLine prices daily).

Model progression (overall walk-forward hit %, MAE in Ks)
variant                             hit %    MAE (Ks)   Δ hit
baseline-v0 (original)              61.40%   1.90
baseline-v0-strict-naive            61.46%   1.90       (calibration only)
baseline-v1 (+TTO + recency)        64.21%   1.85       +2.81 pp
baseline-v2 (+per-batter lineup)    64.79%   1.84       +0.58 pp
baseline-v3 (+umpire bias)          64.76%   1.84       −0.03 pp (marginal)
Bonferroni-corrected α = 0.0100 (0.05 / 5) across the 5 non-empty divergence tiers; all material tiers clear it.
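The Bonferroni check can be illustrated with an exact one-sided binomial test per tier. This is an assumption about the procedure, since the text doesn't name the test: each tier's hit count is tested against a 50% null, and a family-wise α of 0.05 is split evenly across non-empty tiers. Function names are hypothetical.

```python
import math


def binom_tail(hits, n, p0=0.5):
    """Exact one-sided p-value P(X >= hits) for X ~ Binomial(n, p0).

    Summed in log space so large tiers (thousands of games) don't overflow.
    """
    log_p, log_q = math.log(p0), math.log1p(-p0)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for k in range(hits, n + 1):
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def bonferroni_tiers(tier_results, family_alpha=0.05, p0=0.5):
    """tier_results: {tier: (hits, games)} -> {tier: (p_value, clears_alpha)}.

    Empty tiers are dropped; alpha is split evenly across the rest
    (0.05 / 5 = 0.0100 with five non-empty tiers).
    """
    non_empty = {t: hn for t, hn in tier_results.items() if hn[1] > 0}
    alpha = family_alpha / len(non_empty)
    results = {}
    for tier, (hits, n) in non_empty.items():
        p_value = binom_tail(hits, n, p0)
        results[tier] = (p_value, p_value < alpha)
    return results
```

Dividing α by the number of tiers is what keeps five simultaneous "this tier beats 50%" claims from producing a false positive by chance.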

Sizing: quarter-Kelly with hard cap

For each candidate bet we compute the empirical hit rate in its divergence tier from walk-forward, then size using:

kelly_full = (p · b − (1 − p)) / b
where p = model win prob, b = decimal_odds − 1
stake_% = min(kelly_full × user_kelly_fraction, user_max_single_bet_pct)
stake_$ = bankroll × stake_%

Default user_kelly_fraction = 0.25 (¼-Kelly), capped at 2% of bankroll per single bet. The 2% cap binds on high-edge picks; on low-edge picks the ¼-Kelly stake is already under the cap and is used as-is. Both settings are adjustable in Settings.
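The sizing formulas translate directly into Python. A minimal sketch (the function name is hypothetical); flooring negative Kelly at zero is an addition, although positive-EV candidates never produce a negative value anyway.

```python
def kelly_stake(p, decimal_odds, bankroll, kelly_fraction=0.25, max_single_bet_pct=0.02):
    """Fractional-Kelly stake with a hard cap.

    p is the win probability, b = decimal_odds - 1 is the net payout per unit.
    Returns (stake_pct, stake_dollars).
    """
    b = decimal_odds - 1.0
    kelly_full = (p * b - (1.0 - p)) / b
    # Floor at 0 (hypothetical safeguard), scale to 1/4-Kelly, then apply the cap.
    stake_pct = min(max(kelly_full, 0.0) * kelly_fraction, max_single_bet_pct)
    return stake_pct, bankroll * stake_pct


if __name__ == "__main__":
    print(kelly_stake(0.55, 1.91, 1_000))  # low edge: 1/4-Kelly is under the cap
    print(kelly_stake(0.60, 2.00, 1_000))  # high edge: capped at 2% ($20)
```

At even money (decimal 2.00), full Kelly is 2p − 1, so the default 2% cap starts binding once ¼ × (2p − 1) > 0.02, i.e. once the win probability exceeds 54%.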

Real book lines via PropLine

For each game with open K-prop markets we query PropLine (free tier, 1,000 requests/day) and pull every book's price at every line. Our model's P(side) is compared against each (book × line × side) combination, and the one with the highest EV becomes the recommended bet.
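Pricing each (book × line × side) off the simulated K totals might look like the sketch below. It rests on stated assumptions: prices arrive as American odds, each offer is a plain `(book, line, side, odds)` tuple, pushes on whole-number lines refund the stake, and all function names are hypothetical rather than the propline.py API.

```python
def american_to_decimal(odds):
    """American odds -> decimal odds, e.g. -150 -> 1.667, +120 -> 2.20."""
    return 1.0 + (odds / 100.0 if odds > 0 else 100.0 / -odds)


def side_ev(k_totals, line, side, american_odds):
    """EV per unit staked on an over/under K prop, priced off simulated totals.

    A push (total == line, possible only on whole-number lines) refunds the stake,
    so it counts as neither a win nor a loss.
    """
    n = len(k_totals)
    if side == "over":
        wins = sum(k > line for k in k_totals)
    else:
        wins = sum(k < line for k in k_totals)
    pushes = sum(k == line for k in k_totals)
    p_win, p_loss = wins / n, (n - wins - pushes) / n
    return p_win * (american_to_decimal(american_odds) - 1.0) - p_loss


def best_offer(k_totals, offers):
    """offers: iterable of (book, line, side, american_odds). Returns (ev, offer)."""
    scored = [(side_ev(k_totals, line, side, odds), (book, line, side, odds))
              for book, line, side, odds in offers]
    return max(scored, key=lambda item: item[0])
```

Working from the raw simulated totals, rather than a fitted distribution, lets every alternate line be priced from the same 16,000 samples.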

Pinnacle's no-vig fair probability (computed from their over/under prices via the proportional de-vig method) is shown as a sharp-market anchor. When the model and Pinnacle agree, conviction is higher; when they disagree, the play is either a real model edge or a model error — we flag both cases for review.
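The proportional de-vig converts each side's American odds to an implied probability, then rescales the pair so it sums to 1. A minimal sketch with hypothetical function names:

```python
def implied_prob(american_odds):
    """Vig-inclusive implied probability from American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)


def devig_proportional(over_odds, under_odds):
    """Proportional de-vig: divide each implied prob by their sum (the overround)."""
    p_over, p_under = implied_prob(over_odds), implied_prob(under_odds)
    overround = p_over + p_under  # exceeds 1 by the book's margin
    return p_over / overround, p_under / overround
```

For example, a -130 / +110 Pinnacle market de-vigs to roughly 54.3% over and 45.7% under.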

★ BET / WATCH / INFO — the recommendation filter

The engine produces a Monte Carlo prediction for every starter on the slate, but only flags a small subset as actual bets. Every other game is shown for context only. Criteria for each tier:

★ BET

All four are required: (1) a real book line is available (we can actually place it); (2) real EV is between +3% and +25%; (3) Pinnacle's no-vig fair probability is within 30pp of our model's (sharp-market agreement); (4) the model diverges from the naive baseline by ≥ 0.5 Ks. Sized via quarter-Kelly.

WATCH

A real book line exists, but at least one of the BET criteria failed. Most common case: EV above the +25% ceiling. Books don't normally leave 50%+ edges sitting around, so an apparent +50% EV almost always means our model is wrong about that pitcher (stale form, missing feature, lineup quirk). Listed so we can monitor and learn; no stake recommended.

INFO

No open book line for this pitcher (market closed, late starter, or PropLine doesn't carry that book). Model prediction shown for context but cannot be bet from this surface.
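The tier rules can be expressed as a small classifier. This is a sketch, not the production filter: the function name and arguments are hypothetical, the EV and divergence bounds are treated as inclusive, and a missing Pinnacle price is assumed to fail the sharp-agreement check (which demotes the pick to WATCH).

```python
def classify_pick(has_book_line, ev, model_p, pinnacle_fair_p, model_k, naive_k):
    """Return "BET", "WATCH", or "INFO" following the tier rules described above.

    ev is a fraction (0.03 == +3%); probabilities are in [0, 1]; K values in strikeouts.
    """
    if not has_book_line:
        return "INFO"  # nothing to bet: prediction shown for context only
    checks = (
        0.03 <= ev <= 0.25,  # EV between +3% and +25%
        # Sharp-market agreement; assumption: a missing Pinnacle price fails.
        pinnacle_fair_p is not None and abs(model_p - pinnacle_fair_p) <= 0.30,
        abs(model_k - naive_k) >= 0.5,  # divergence from the naive baseline, in Ks
    )
    return "BET" if all(checks) else "WATCH"
```

Checking the book line first mirrors the text: INFO means "cannot be bet", while WATCH means "could be bet but failed a filter".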

Today's forecast tiles (tonight P/L, 7-day, 30-day, 90-day projections) include only ★ BET picks. The watch and info tiers don't contribute to the bankroll forecast — they're for visibility and post-game analysis.

Honest caveats — please read before staking

The 64.76% walk-forward hit rate is measured against a naive book proxy, not against real Vegas closing lines. Real edge will likely be smaller — books already use most of our features. We're now accumulating real PropLine prices daily; in ~30 days we'll have enough to measure true edge.
The model's K-rate features for the current season are sourced from the most recent fully cached Statcast season. For 2026, Statcast data still needs to be backfilled as the season accumulates; this is pending.
Some live picks show EVs above +50%; those almost certainly reflect model overconfidence rather than real edges of that size. Treat high-EV outliers with skepticism: books don't usually leave 50%-edge bets on the board.
Variance is real. Even a +5% true EV strategy will go on multi-day losing streaks. Stick to the ¼-Kelly sizing — the math survives streaks; emotion doesn't.
This is not financial advice. You alone decide whether to place any bet.

Code references

~/sports-edge/sports_edge/simulator/mlb_game_sim.py — Monte Carlo engine
~/sports-edge/sports_edge/features/mlb_features.py — pitcher / batter / park / umpire features
~/sports-edge/sports_edge/feeds/propline.py — book line client + de-vig math
~/sports-edge/sports_edge/validation.py — sizing from walk-forward tier hit rates
~/sports-edge/scripts/run_walk_forward.py — chronological-per-prediction harness
~/sports-edge/scripts/refresh_now.py — live orchestrator (schedule → features → sim → book shop → picks/today.json)
~/sports-edge/.claude/plans/hashed-exploring-lampson.md — original implementation plan