Archive · NHL Goal Predictor
How the models actually did, day by day
The dashboard publishes tonight's predictions. This page publishes every previous night's predictions and what actually happened. The point is honesty: when a model claims "60% chance to score" tonight, the only way to know whether 60% means anything is to check it against thousands of past 60% claims and count how often they came true.
What this archive measures
Every game day at 06:00 UTC, the results pipeline pulls each game's box score from the NHL API, identifies who actually scored, and writes a JSON file to data/results/{date}.json. The file records every scorer that night plus, for each model on the site, that model's top-10 picks and a boolean indicating whether each pick scored.
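As a sketch of what working with one of those files looks like — the loader below just reads the JSON, and the `example` dict shows a *hypothetical* shape consistent with the description above; the exact field names are an assumption, not the published schema:

```python
import json
from pathlib import Path

def load_results(date: str, root: Path = Path("data/results")) -> dict:
    """Load one night's archived results file from the repo."""
    with (root / f"{date}.json").open() as f:
        return json.load(f)

# Hypothetical shape, for illustration only -- the real files carry the same
# information (all scorers + per-model top-10 picks with a scored flag), but
# these exact keys are assumed:
example = {
    "date": "2026-05-02",
    "scorers": ["Player A", "Player B", "Player C"],
    "models": {
        "meta_ensemble": {
            "top10": [
                {"player": "Player A", "prob": 0.45, "scored": True},
                {"player": "Player D", "prob": 0.38, "scored": False},
            ]
        }
    },
}
```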
Two metrics matter most for understanding archive performance:
- Top-10 hit rate. Of the model's ten highest-probability picks for the night, how many actually scored at least one goal? On a typical eight-game NHL slate with ~14 to 22 distinct goal scorers, a random-selection baseline lands roughly 1.5 to 2 hits for any 10 players picked at random. A model performing well comes in around 4 to 6 hits; a model performing poorly clusters with the random baseline.
- Calibration. Whether the model's predicted probabilities match observed frequencies. If the model's top pick averages 0.45 across a hundred nights, those hundred picks should score close to 45% of the time. Calibration is hard to read in a single archive row but easy to spot across many rows: a model whose top picks come out at "55%" but actually score at 30% is overconfident, and the validator catches this.
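Both metrics reduce to a count and a difference. The sketch below assumes picks are dicts with "prob" and "scored" fields (an assumed shape, not the repo's real schema):

```python
def top10_hit_count(picks):
    """Number of top-10 picks that scored at least one goal."""
    return sum(1 for p in picks if p["scored"])

def calibration_gap(picks):
    """Mean predicted probability minus observed hit frequency.
    Positive => overconfident; negative => underconfident."""
    mean_prob = sum(p["prob"] for p in picks) / len(picks)
    hit_freq = sum(1 for p in picks if p["scored"]) / len(picks)
    return mean_prob - hit_freq

# Illustrative picks, not real archive data:
picks = [
    {"prob": 0.55, "scored": True},
    {"prob": 0.50, "scored": False},
    {"prob": 0.45, "scored": True},
    {"prob": 0.40, "scored": False},
]
print(top10_hit_count(picks))            # 2
print(round(calibration_gap(picks), 3))  # -0.025 (slightly underconfident)
```

The sign convention matters: a persistently positive gap is exactly the "says 55%, scores 30%" overconfidence described above.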
The table below loads the most recent ~15 days of archive data straight from the JSON files in this repository. The dates link to the raw result files; click through to see exactly which players each model picked and which of them scored.
Recent results
| Date | Games | Scorers | Meta top-10 | xG top-10 | Market top-10 | Result file |
|---|---|---|---|---|---|---|
| Loading archive… | | | | | | |
How to read a hit-rate number
A few things worth keeping in mind when scanning this archive:
One night is noise. Hockey is a high-variance scoring environment. A 22-shot game can produce zero goals; a 17-shot game can produce six. Any single date's hit rate is dominated by random scoring variance, not model quality. The archive is most useful when you read it as a moving window — averaged over the last 10–15 game days, the patterns become real.
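The moving-window reading can be sketched directly. The window size matches the 10–15 game days suggested above; the nightly hit counts are made up for illustration:

```python
from collections import deque

def trailing_average(values, window=15):
    """Trailing-window mean of nightly top-10 hit counts."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

nightly_hits = [5, 2, 6, 1, 4, 3]  # illustrative, not real archive data
print([round(x, 2) for x in trailing_average(nightly_hits, window=3)])
# [5.0, 3.5, 4.33, 3.0, 3.67, 2.67]
```

Note how the smoothed series damps the single-night swings (6 then 1) that dominate any one row of the table.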
Some nights are easier than others. A four-game NHL slate gives the models fewer chances to land top-10 picks across many matchups; a 12-game slate gives them more. Hit rates tend to cluster higher on big-slate nights for purely combinatorial reasons. The archive shows the games count alongside the hit rate so you can normalize for this.
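One way to use the games count for normalization is to compare a night's hit count against the random baseline for that slate. This is a sketch: the 36-skaters-per-game pool size (two 18-skater rosters) is an assumption, not a number from the pipeline:

```python
def expected_random_hits(n_games, n_scorers, picks=10, skaters_per_game=36):
    """Expected hits if the top-10 picks were drawn uniformly from the slate.
    skaters_per_game=36 (two 18-skater rosters) is an assumption."""
    pool = n_games * skaters_per_game
    return picks * n_scorers / pool

# A 4-hit night means more on a small slate than a big one relative to chance:
print(round(expected_random_hits(4, 18), 2))   # four-game slate baseline
print(round(expected_random_hits(12, 55), 2))  # twelve-game slate baseline
```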
Different models are designed for different conditions. The Market Odds model can only contribute on games where bookmakers have posted prices — typically more than half of nightly games but not all of them. The Lineup TOI model leans on recent ice time data, which is noisier early in a season than late. The Meta Ensemble inherits all of these dependencies. A blank or unusually low cell is often "the model couldn't run cleanly tonight" rather than "the model was wrong."
The honest target is consistent calibration, not maximal hit rate. A model that produces well-calibrated probabilities is more useful for decision-making than a model that occasionally lands huge nights but is overconfident on average. The validator's calibration drift alerts are a louder signal than any individual day's hit rate, which is why the daily commit log will sometimes show calibration tweaks (e.g. raising the hard top-probability ceiling) without any change to the underlying models.
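A hard top-probability ceiling of the kind mentioned above amounts to a one-line clamp. The 0.55 value here is hypothetical, not the validator's actual setting:

```python
def apply_probability_ceiling(probs, ceiling=0.55):
    """Clamp predicted probabilities to a hard ceiling -- a calibration
    tweak, not a model change. The 0.55 ceiling is illustrative."""
    return [min(p, ceiling) for p in probs]

print(apply_probability_ceiling([0.62, 0.48, 0.57]))  # [0.55, 0.48, 0.55]
```

Raising the ceiling (say, from 0.55 to 0.60) loosens the clamp without touching how the underlying models rank players, which is why such commits change calibration but not picks.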
Where the data lives
All result and prediction files are public in the project repository:
- data/results/{date}.json — that night's actual goal scorers plus per-model top-10 picks and whether they scored.
- data/predictions/meta_ensemble/{date}.json — that morning's Meta Ensemble predictions before games started.
- data/predictions/{model_name}/{date}.json — same for every other model on the site.
- data/health_report.json — the validator's nightly summary of recent calibration and hit-rate drift.
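Given those layouts, pairing a morning prediction file with its night's result file is pure path construction:

```python
from pathlib import Path

def archive_paths(date, model="meta_ensemble"):
    """Repo-relative paths for one night's prediction/result pair."""
    return {
        "predictions": Path("data/predictions") / model / f"{date}.json",
        "results": Path("data/results") / f"{date}.json",
    }

paths = archive_paths("2026-05-02")
print(paths["results"].as_posix())  # data/results/2026-05-02.json
```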
For full context on how each model produces its top-10, see the methodology page. For definitions of "hit rate," "calibration," and the other terms used here, see the glossary.
Last updated 2026-05-03. The table above re-fetches on every page load; if "Loading archive…" never resolves, the JSON endpoints are temporarily unreachable and the static text on this page remains the authoritative description.