Live Calibration

Live scoring begins as 2026 races resolve; no 2026 race outcomes have been recorded yet. Until then, the model freezes its predictions weekly into forecast_snapshots, so each race can be scored as soon as its winner is recorded in race_outcomes.

Reliability plot

Each dot is a 10-percentage-point bucket of forecasts. The x-axis is the model's mean predicted P(D) in that bucket; the y-axis is the empirical D-win rate of those races. A perfectly calibrated model puts every dot on the diagonal. Dot size is the resolved-race count in the bucket. The plot is empty until 2026 races resolve.
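
As an illustration of the bucketing described above, here is a minimal sketch in Python. It assumes each resolved forecast is available as a (predicted P(D), D won) pair; the function name and the output keys are hypothetical and are not taken from the site's code.

```python
from collections import defaultdict

def reliability_buckets(forecasts, width=0.10):
    """Group (p_dem, dem_won) pairs into fixed-width probability buckets.

    forecasts: iterable of (float in [0, 1], bool) pairs (assumed input shape).
    Returns one row per non-empty bucket, in ascending bucket order.
    """
    n_bins = round(1 / width)
    groups = defaultdict(list)
    for p, won in forecasts:
        # A forecast of exactly 1.0 is placed in the top bucket.
        idx = min(int(p * n_bins), n_bins - 1)
        groups[idx].append((p, won))

    rows = []
    for idx in sorted(groups):
        members = groups[idx]
        n = len(members)
        rows.append({
            "bucket": idx,                                        # 0 = [0%, 10%), 9 = [90%, 100%]
            "mean_pred": sum(p for p, _ in members) / n,          # x-coordinate of the dot
            "win_rate": sum(1 for _, w in members if w) / n,      # y-coordinate of the dot
            "n": n,                                               # drives the dot size
        })
    return rows
```

With no resolved races the function returns an empty list, which corresponds to the empty plot.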

Snapshots

The forecast snapshotter runs weekly (Mondays at 13:00 ET) and writes the current projected P(D) for every upcoming 2026 race into forecast_snapshots. So far there are 349 snapshot rows across all races; 0 of them belong to races that have already resolved, so none yet contributes to the live Brier above.
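
To make the scoring step concrete, the sketch below matches snapshot rows to recorded outcomes and computes a Brier score as the mean squared difference between the predicted P(D) and the 0/1 result. The row keys race_id and p_dem are hypothetical stand-ins for whatever columns forecast_snapshots actually has, and averaging over every snapshot row of a resolved race is an assumption about how the live Brier is aggregated.

```python
def live_brier(snapshot_rows, outcomes):
    """Score only the snapshot rows whose race already has a recorded winner.

    snapshot_rows: list of dicts with hypothetical keys "race_id" and "p_dem".
    outcomes: dict mapping race_id -> True if the Democrat won (assumed shape).
    Returns (brier, n_scored); brier is None when nothing can be scored yet.
    """
    scored = [
        (row["p_dem"], outcomes[row["race_id"]])
        for row in snapshot_rows
        if row["race_id"] in outcomes
    ]
    if not scored:
        return None, 0
    total = sum((p - (1.0 if won else 0.0)) ** 2 for p, won in scored)
    return total / len(scored), len(scored)
```

While no outcomes are recorded, the second return value is 0 and the score is undefined, which matches the current state of the page.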

Backtest Brier (2024)

The published model lr-2026-06-10-chal-recal-twospeed reports a 2024 held-out test Brier of 0.1191. This is a backtest number, not a live one. The serving model is fit on [2022,2024] (the two most recent labeled cycles), so it cannot hold out 2024 itself. The reported Brier is instead the genuine out-of-sample score of the identical narrow-winsor15 configuration trained on [2020,2022] and tested on the held-out 2024 cycle (the prior-window policy). The serving window and the skill-measurement window are reported separately and never conflated. The backtest Brier is shown here for context only and is never combined with the live Brier above.
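
The window arithmetic behind the prior-window policy can be sketched as follows. This is an illustration only: the function name, the dictionary keys, and the generalization to an arbitrary window length are assumptions, not the project's code.

```python
def prior_window_split(labeled_cycles, window=2):
    """Derive serving and backtest windows from the labeled election cycles.

    The serving window is the most recent `window` cycles. The backtest uses
    the same window length shifted back by one cycle, so that the latest
    cycle is held out as the test set.
    """
    cycles = sorted(labeled_cycles)
    if len(cycles) < window + 1:
        raise ValueError("need at least window + 1 labeled cycles")
    return {
        "serving_train": cycles[-window:],
        "backtest_train": cycles[-window - 1:-1],
        "backtest_test": cycles[-1],
    }

split = prior_window_split([2020, 2022, 2024])
# split["serving_train"]  == [2022, 2024]
# split["backtest_train"] == [2020, 2022]
# split["backtest_test"]  == 2024
```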