About the project
Eight models in a Quebec apartment
This project began as a private spreadsheet for the Tim Hortons Hockey Challenge and ended up as a daily-publishing prediction stack with eight models, a stacked meta-ensemble, and a calibration validator that tells me when my own models are lying. This page is the story of how it got there, what runs every morning, and what I have not yet built.
Why this exists
The Tim Hortons Hockey Challenge gives players a list of three groups of five NHL skaters per night and asks you to pick one player from each group whose total goals + assists for the night are higher than those of the players you didn't pick. It is mostly a casual game. It is also, structurally, a forced-choice ranking task — exactly the shape of problem statistics actually gives you traction on.
The first version of this project lived in a Google Sheet. I was averaging a few features by hand, scoring the player pool, and submitting picks. That worked badly enough — and was tedious enough — that I rewrote it as a Python script. The script worked well enough that I added another model. Then the second model disagreed with the first model often enough that I added a third to break ties. Then I noticed the first model was actually better than the third on certain matchups, and the only way to learn which matchups was to keep both models and measure. Roughly that pattern, repeated, is how I ended up with eight.
The whole point of running this many models is that no single approach to this problem dominates. A market-implied probability is great information but is silent on a third of the night's games where the books haven't posted. A pure xG model is sharp on shot quality but blind to whether a guy was a healthy scratch yesterday. A Monte Carlo simulation captures squad-level uncertainty but is mediocre at picking between two depth-line wingers. Each model fills in where another is weak.
The eight models, one paragraph each
Market Odds v1 averages anytime-goal-scorer odds across major US sportsbooks and converts them to probabilities. Strongest single model on most nights; missing entirely on others.
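For a sense of the conversion step, here is a sketch that turns hypothetical anytime-goal-scorer prices into implied probabilities and averages them; whether and how the vig gets stripped is a detail this page doesn't cover, so the sketch keeps it in:

```python
def implied_prob(american_odds: int) -> float:
    """Convert American odds to an implied probability (still includes the book's vig)."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)

# Hypothetical prices for one player across three books:
quotes = [150, 165, 140]
print(round(sum(implied_prob(q) for q in quotes) / len(quotes), 3))  # ~0.398
```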
Weighted Linear v1 is a hand-tuned formula on season averages — the dumb baseline whose job is to embarrass the fancy models when they don't beat it. It often does.
Monte Carlo v2 simulates each game 10,000 times: Poisson-draw the team's goals, then assign each goal to a player weighted by season scoring rate raised to the 1.8 power. The exponent is the trick: it concentrates the simulated goals in the stars, capturing how much more of a team's scoring runs through its best players than raw season averages alone suggest.
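A minimal sketch of that simulation, with a made-up three-player pool and illustrative scoring rates:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_scores(team_goal_rate: float, player_rates: dict[str, float],
             target: str, n_sims: int = 10_000, exponent: float = 1.8) -> float:
    """Draw the team's goal total from a Poisson, then assign each goal to a
    player with probability proportional to season scoring rate ** exponent.
    Returns the simulated probability that `target` scores at least once."""
    names = list(player_rates)
    weights = np.array([player_rates[n] ** exponent for n in names])
    weights /= weights.sum()
    scored = 0
    for _ in range(n_sims):
        goals = rng.poisson(team_goal_rate)
        if goals and target in rng.choice(names, size=goals, p=weights):
            scored += 1
    return scored / n_sims

# A star at 0.55 goals/game vs a depth winger at 0.12 on a 3.1 goals/game team:
rates = {"star": 0.55, "middle": 0.25, "depth": 0.12}
print(p_scores(3.1, rates, "star"))
```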
xG XGBoost v3 is a per-shot expected-goals model trained on play-by-play data and converted to a per-game probability via the player's expected shot count. The conversion is a Poisson assumption: P(≥1 goal) = 1 − e^(−total_xG).
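The conversion itself, with illustrative numbers (the shot count and per-shot xG below are made up, not model output):

```python
import math

expected_shots = 3.2       # projected shot attempts for tonight
mean_xg_per_shot = 0.075   # average model score on this player's shots
total_xg = expected_shots * mean_xg_per_shot
p_goal = 1 - math.exp(-total_xg)   # Poisson: P(>=1 goal) = 1 - e^(-total_xG)
print(round(p_goal, 3))            # ~0.213
```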
Lineup TOI v1 is the same model as xG XGBoost v3, with one extra step that scales expected shots by the ratio of the player's recent ice time to their season average. It catches the cases where everyone else is still ranking a player who got benched yesterday.
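A sketch of that adjustment, with hypothetical numbers for a player whose minutes just dropped (the inputs reuse the illustrative values above):

```python
import math

def toi_adjusted_goal_prob(expected_shots: float, mean_xg_per_shot: float,
                           recent_toi: float, season_toi: float) -> float:
    """Scale the expected shot count by the recent-vs-season ice-time ratio,
    then reuse the same Poisson conversion as the xG model."""
    adjusted_shots = expected_shots * (recent_toi / season_toi)
    return 1 - math.exp(-adjusted_shots * mean_xg_per_shot)

# A player averaging 17 minutes who skated only 11 last night gets marked down:
print(round(toi_adjusted_goal_prob(3.2, 0.075, recent_toi=11.0, season_toi=17.0), 3))  # ~0.144
```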
Neural MLP v1 is a small scikit-learn neural network that re-ranks the Monte Carlo output using player-level aggregate features. It learns where Monte Carlo is consistently too generous or too stingy.
Neural Embed v2 is a PyTorch MLP with learnable 32-dimensional player embeddings on top of shot-level features. It's the only model on the site that can represent "this defenseman's shots go in more often than the geometry alone predicts."
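The shape of the idea, sketched in PyTorch; layer sizes, the feature count, and the player count are illustrative, not the model's actual hyperparameters:

```python
import torch
import torch.nn as nn

class ShotScorer(nn.Module):
    """A learnable per-player embedding concatenated with shot-level features,
    fed through a small MLP that scores each shot."""
    def __init__(self, n_players: int, n_shot_features: int, emb_dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(n_players, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + n_shot_features, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, player_ids: torch.Tensor, shot_features: torch.Tensor) -> torch.Tensor:
        x = torch.cat([self.embed(player_ids), shot_features], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)  # P(this shot is a goal)

model = ShotScorer(n_players=900, n_shot_features=10)
probs = model(torch.tensor([17, 17]), torch.randn(2, 10))
```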
Meta Ensemble v1 stacks four base models (linear, MC, xG, market) plus disagreement, goalie, and matchup features into a LightGBM classifier with isotonic calibration on top. This is the model the site treats as canonical.
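The stacking-plus-isotonic pattern, sketched with toy data; the feature list and hyperparameters here are placeholders, not the site's actual ones:

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
# Columns stand in for [linear, mc, xg, market, disagreement, goalie, matchup]:
X = rng.uniform(size=(2000, 7))
y = (rng.uniform(size=2000) < X[:, 3]).astype(int)   # toy labels for the sketch

clf = LGBMClassifier(n_estimators=200, learning_rate=0.05)
clf.fit(X[:1500], y[:1500])

raw = clf.predict_proba(X[1500:])[:, 1]
# In practice the calibrator would be fitted on held-out predictions;
# here it is fitted and applied on the same slice purely for illustration.
iso = IsotonicRegression(out_of_bounds="clip").fit(raw, y[1500:])
calibrated = iso.predict(raw)
```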
The technical writeup of every model — features, formulas, hyperparameters, failure modes — lives on the methodology page.
What the daily pipeline actually does
Every morning, a chain of GitHub Actions cron jobs runs, scheduled around when the day's matchups stabilize and lineup signals start to firm up. The cron times are in UTC, but the schedule is built around morning Eastern.
The chain pulls the previous night's final boxscores from the NHL boxscore endpoint into data/results/, runs each model over the day's player pool, and finishes with the calibration validator, which writes its report to data/health_report.json.
Each step writes its output as JSON into data/predictions/{model_name}/ and commits to the repo. The dashboard you land on at the predictor page is plain static HTML that fetches those JSON files at load time. There is no backend; there is no database; there is a pipeline of small Python scripts and a folder of JSON files.
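The output contract each step follows looks roughly like this; the function, field names, and example payload are my illustration, not the repo's actual schema:

```python
import json
from datetime import date
from pathlib import Path

def write_predictions(model_name: str, predictions: list[dict]) -> Path:
    """Write one JSON file per model per day; the static dashboard fetches it directly."""
    out_dir = Path("data/predictions") / model_name
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{date.today().isoformat()}.json"
    out_path.write_text(json.dumps(predictions, indent=2))
    return out_path

write_predictions("meta_ensemble_v1", [{"player": "Example Player", "p_point": 0.31}])
```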
What running this for a season has taught me
A few things that were not obvious to me before:
The market is the strongest single feature, but it isn't a ceiling. The Meta Ensemble outperforms the market model on its own often enough to justify the rest of the stack — most of the lift comes from cases where the market hasn't priced a lineup change or a goalie matchup that the other models see clearly.
Calibration drifts. Models trained months ago against a different scoring environment quietly become overconfident — they keep predicting "20% to score" with the same calibration as before, but the league's overall scoring rate has shifted, or the opposing-goalie quality distribution has changed. The validator now flags this. The most recent calibration tweak, raising the hard top-probability ceiling from 0.85 to 0.95, was driven by exactly this kind of drift.
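The check behind that flag is conceptually simple. A sketch of the binned predicted-versus-observed comparison, with an illustrative bin count and minimum sample size rather than the validator's real settings:

```python
import numpy as np

def calibration_gap(pred: np.ndarray, actual: np.ndarray, bins: int = 10) -> list[tuple]:
    """Bucket predictions, then compare mean predicted probability to the
    observed hit rate in each bucket; large gaps indicate drift."""
    edges = np.linspace(0, 1, bins + 1)
    report = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (pred >= lo) & (pred < hi)
        if mask.sum() < 30:   # skip buckets too thin to judge
            continue
        report.append((round(float(lo), 1), round(float(hi), 1),
                       float(pred[mask].mean()), float(actual[mask].mean())))
    return report
```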
API throttling will eat you alive. The early version of this pipeline fetched every player's stats from scratch in every model. With six models running, that meant six separate hits on the same NHL endpoint per player per day. The fix was a per-day shared roster cache — predictable, boring, and the single biggest reliability improvement of the project.
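The cache itself is nothing clever. A sketch of the per-day, per-team version; the endpoint URL and cache layout are illustrative, not the pipeline's actual ones:

```python
import json
from datetime import date
from pathlib import Path

import requests

CACHE_DIR = Path("data/cache/rosters")

def get_roster(team: str) -> dict:
    """Six models calling this on the same day trigger at most one network call per team."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file = CACHE_DIR / f"{date.today().isoformat()}_{team}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    # Illustrative endpoint; any roster source works as long as the cache key is (date, team).
    resp = requests.get(f"https://api-web.nhle.com/v1/roster/{team}/current", timeout=10)
    resp.raise_for_status()
    data = resp.json()
    cache_file.write_text(json.dumps(data))
    return data
```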
Real shots beat synthetic shots by a lot. The xG model originally generated 20 fake shots per player from a position-based profile and scored those. Replacing that with the player's real recent shots from play-by-play data was the biggest predictive lift the site has had — and the simplest technical change. The fanciest model on this site is not the neural network with embeddings; it's the one that finally got to look at real data.
What I haven't built yet
Three deferred upgrades, in roughly the order I expect to tackle them:
- Multi-season training. Every model on the site is retrained from the current season only. Adding a few prior seasons of play-by-play data would help the xG and neural models in particular — more shots, more strength-state coverage, more rare-event examples like 3v3 overtime goals. The bottleneck is that the NHL's API has changed shot-tracking format across seasons and merging cleanly is more work than it sounds.
- Line combinations and pre-game deployment. Knowing in advance who is on each team's top line, who is on PP1, and who is matched against whom would push every model's accuracy up materially. The data exists publicly via beat reporters and projected lines sites; ingesting it reliably is a different problem.
- Neural sequence model. A model that treats a player's shot history as a sequence rather than as an unordered bag could capture form trajectories and matchup-specific patterns that the current architecture flattens away. This is the speculative one — I am not confident the dataset is large enough to support it without overfitting.
None of these are urgent. The pipeline as it stands runs reliably, the meta-ensemble is calibrated, and the daily output is good enough that I find it interesting most nights. The deferred list exists so that when I do come back to expand the project, I am not picking by feel.
Where to read more
- The full source is at github.com/mghnasiri.
- The methodology page goes deep on each model's features, formulas, training, and conversion to per-game probability.
- Tonight's predictions live on the dashboard, refreshed every morning.
- For the privacy and data-handling details, see the privacy policy.
Last updated 2026-05-03. Model schedule and feature lists drift as the project evolves; this page reflects the current pipeline, not its historical state.