About the project

Eight models in a Quebec apartment

This project began as a private spreadsheet for the Tim Hortons Hockey Challenge and ended up as a daily-publishing prediction stack with eight models, a stacked meta-ensemble, and a calibration validator that tells me when my own models are lying. This page is the story of how it got there, what runs every morning, and what I have not yet built.

Why this exists

The Tim Hortons Hockey Challenge presents three groups of five NHL skaters per night and asks you to pick one player from each group, aiming for picks whose total goals + assists for the night beat those of the players you passed over. It is mostly a casual game. It is also, structurally, a forced-choice ranking task — exactly the shape of problem statistics actually gives you traction on.

The first version of this project lived in a Google Sheet. I was averaging a few features by hand, scoring the player pool, and submitting picks. That worked badly enough — and was tedious enough — that I rewrote it as a Python script. The script worked well enough that I added another model. Then the second model disagreed with the first model often enough that I added a third to break ties. Then I noticed the first model was actually better than the third on certain matchups, and the only way to learn which matchups was to keep both models and measure. Roughly that pattern, repeated, is how I ended up with eight.

The whole point of running this many models is that no single approach to this problem dominates. A market-implied probability is great information but is silent on a third of the night's games where the books haven't posted. A pure xG model is sharp on shot quality but blind to whether a guy was a healthy scratch yesterday. A Monte Carlo simulation captures squad-level uncertainty but is mediocre at picking between two depth-line wingers. Each model fills in where another is weak.

The eight models, one paragraph each

Market Odds v1 averages anytime-goal-scorer odds across major US sportsbooks and converts them to probabilities. Strongest single model on most nights; missing entirely on others.
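The odds-to-probability step works roughly like this. A minimal sketch: `american_to_prob` is the standard American-odds conversion, and `consensus_prob` is a hypothetical aggregation helper (the site's actual averaging and de-vigging details aren't spelled out here).

```python
def american_to_prob(odds: int) -> float:
    """Convert American odds to an implied probability (still includes the book's vig)."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)

def consensus_prob(book_odds: list[int]) -> float:
    """Hypothetical aggregation: average the implied probabilities across books."""
    return sum(american_to_prob(o) for o in book_odds) / len(book_odds)

# The same player priced +150 at one book and +140 at another
p = consensus_prob([150, 140])  # ~0.408
```

Averaging probabilities rather than odds keeps one outlier price from dominating the consensus.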

Weighted Linear v1 is a hand-tuned formula on season averages — the dumb baseline whose job is to embarrass the fancy models when they don't beat it. It often does.

Monte Carlo v2 simulates each game 10,000 times: Poisson-draw the team's goals, then assign each goal to a player weighted by season scoring rate raised to the 1.8 power. The exponent is the trick: it captures how disproportionately stars score relative to what their averages alone suggest.
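The core loop is small enough to sketch in full. This is a stdlib-only illustration of the scheme described above, not the site's actual code; the player names, the seed, and the team goal rate are made up.

```python
import math
import random

def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's algorithm for a Poisson-distributed integer with mean lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_game(team_lambda, scoring_rates, n_sims=10_000, exponent=1.8, seed=42):
    """Estimate P(each player scores >= 1 goal) over n_sims simulated games."""
    rng = random.Random(seed)
    players = list(scoring_rates)
    # Season goals-per-game raised to the 1.8 power: stars absorb more than
    # their linear share of the team's goals.
    weights = [scoring_rates[p] ** exponent for p in players]
    scored_in = {p: 0 for p in players}
    for _ in range(n_sims):
        goals = poisson_draw(team_lambda, rng)
        scorers = set(rng.choices(players, weights=weights, k=goals)) if goals else set()
        for p in scorers:
            scored_in[p] += 1
    return {p: scored_in[p] / n_sims for p in players}

probs = simulate_game(3.0, {"star": 0.6, "middle": 0.3, "depth": 0.1})
```

With the exponent at 1.8, a player scoring 0.6 goals per game gets well over twice the per-goal weight of a 0.3 player, which is the whole point.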

xG XGBoost v3 is a per-shot expected-goals model trained on play-by-play data and converted to a per-game probability via the player's expected shot count. The conversion is a Poisson assumption: P(≥1 goal) = 1 − e^(−total_xG).
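The Poisson conversion at the end is one line. A sketch under the stated assumption (per-shot xG values summed into a total, then exponentiated); the example shot values are invented.

```python
import math

def prob_at_least_one_goal(per_shot_xg: list[float]) -> float:
    """Poisson assumption: P(>=1 goal) = 1 - exp(-total_xG)."""
    total_xg = sum(per_shot_xg)
    return 1.0 - math.exp(-total_xg)

# Five expected shots at ~0.06 xG each -> total_xG = 0.30
p = prob_at_least_one_goal([0.06] * 5)  # ~0.259
```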

Lineup TOI v1 is the same model as xG XGBoost v3, with one extra step that scales expected shots by the ratio of the player's recent ice time to their season average. It catches the cases where everyone else is still ranking a player who got benched yesterday.
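The ice-time adjustment can be sketched as a single scaling factor applied before the Poisson conversion. The function name and the numbers are illustrative, not taken from the site's code.

```python
import math

def toi_adjusted_prob(total_xg: float, recent_toi: float, season_toi: float) -> float:
    """Scale expected shot volume (and hence total xG) by the recent/season
    ice-time ratio, then apply the usual Poisson conversion."""
    ratio = recent_toi / season_toi
    return 1.0 - math.exp(-total_xg * ratio)

# A player demoted from 18 to 12 minutes a night: the probability drops,
# even though every season-average feature still looks the same.
p_season = 1.0 - math.exp(-0.30)
p_recent = toi_adjusted_prob(0.30, recent_toi=12.0, season_toi=18.0)
```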

Neural MLP v1 is a small scikit-learn neural network that re-ranks the Monte Carlo output using player-level aggregate features. It learns where Monte Carlo is consistently too generous or too stingy.

Neural Embed v2 is a PyTorch MLP with learnable 32-dimensional player embeddings on top of shot-level features. It's the only model on the site that can represent "this defenseman's shots go in more often than the geometry alone predicts."

Meta Ensemble v1 stacks four base models (linear, MC, xG, market) plus disagreement, goalie, and matchup features into a LightGBM classifier with isotonic calibration on top. This is the model the site treats as canonical.
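The isotonic-calibration layer on top of the classifier is worth a sketch. This is the classic pool-adjacent-violators algorithm in plain Python, standing in for whatever calibration implementation the site actually uses: it fits a nondecreasing map from raw meta-model scores to calibrated probabilities.

```python
def isotonic_fit(scores, outcomes):
    """Pool Adjacent Violators: fit a nondecreasing mapping from raw scores
    to calibrated probabilities. Returns (sorted_scores, fitted_values)."""
    pairs = sorted(zip(scores, outcomes))
    xs = [x for x, _ in pairs]
    # Each block holds [running mean of outcomes, weight].
    merged = []
    for _, y in pairs:
        merged.append([y, 1])
        # Merge backwards while monotonicity is violated.
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            b2, b1 = merged.pop(), merged.pop()
            w = b1[1] + b2[1]
            merged.append([(b1[0] * b1[1] + b2[0] * b2[1]) / w, w])
    fitted = []
    for mean, w in merged:
        fitted.extend([mean] * w)
    return xs, fitted

# Toy data: raw scores vs. did-the-player-score outcomes
xs, fitted = isotonic_fit([0.1, 0.4, 0.2, 0.3], [0, 1, 1, 0])
```

In production you would fit this on a held-out set of meta-model scores and interpolate between the fitted points at prediction time.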

The technical writeup of every model — features, formulas, hyperparameters, failure modes — lives in methodology.

What the daily pipeline actually does

Every morning, a chain of GitHub Actions cron jobs runs, scheduled around when the day's matchups stabilize and lineup signals start to firm up. Times below are UTC; the schedule is timed to morning Eastern.

06:00 · Results pipeline writes yesterday's actual goal scorers to data/results/ from the NHL boxscore endpoint.
14:00 · Monte Carlo v2 runs first; it builds the canonical player pool that downstream models reuse.
14:05 · xG XGBoost v3 + Lineup TOI v1 run from the shared shot pipeline.
14:30 · Market Odds v1 fetches anytime-goal-scorer prices from The Odds API.
15:00 · Weighted Linear v1 runs (legacy schedule, kept independent so it's never gated by the others).
15:20 · Meta Ensemble v1 reads all base predictions and produces the calibrated final ranking.
16:30 · Health Check validates every model's outputs and writes data/health_report.json.
Monthly · Neural v2 trainer retrains the player-embedding model on accumulated shot data.

Each step writes its output as JSON into data/predictions/{model_name}/ and commits to the repo. The dashboard you land on at the predictor page is plain static HTML that fetches those JSON files at load time. There is no backend; there is no database; there is a pipeline of small Python scripts and a folder of JSON files.
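Each step's write looks roughly like this. A minimal sketch of the JSON-to-folder convention described above; the `latest.json` filename is my assumption, not something the site documents.

```python
import json
from pathlib import Path

def write_predictions(model_name: str, predictions: list[dict], root: str = "data") -> Path:
    """Write one model's daily predictions as JSON under the
    data/predictions/{model_name}/ layout. Filename is hypothetical."""
    out_dir = Path(root) / "predictions" / model_name
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "latest.json"
    out_file.write_text(json.dumps(predictions, indent=2))
    return out_file
```

The static dashboard then just `fetch()`es these files; with no backend, committing the JSON to the repo is the whole deployment step.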

What running this for a season has taught me

A few things that were not obvious to me before:

The market is the strongest single feature, but it isn't a ceiling. The Meta Ensemble outperforms the market alone often enough to justify the rest of the stack — most of the lift comes from cases where the market hasn't priced a lineup change or a goalie matchup that the other models see clearly.

Calibration drifts. Models trained months ago against a different scoring environment quietly become overconfident — they keep predicting "20% to score" with the same calibration as before, but the league's overall scoring rate has shifted, or the opposing-goalie quality distribution has changed. The validator now flags this. The most recent calibration tweak, raising the hard top-probability ceiling from 0.85 to 0.95, was driven by exactly this kind of drift.
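The ceiling change mentioned above is just a clamp on the final probabilities. A sketch; the 0.01 floor is my assumption, only the 0.85 → 0.95 ceiling change comes from the text.

```python
def clamp_probability(p: float, floor: float = 0.01, ceiling: float = 0.95) -> float:
    """Clamp a model's output probability into [floor, ceiling].
    The ceiling was raised from 0.85 to 0.95 after the validator
    flagged drift; the floor value here is hypothetical."""
    return max(floor, min(ceiling, p))
```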

API throttling will eat you alive. The early version of this pipeline fetched every player's stats from scratch in every model. With six models running, that meant six separate hits on the same NHL endpoint per player per day. The fix was a per-day shared roster cache — predictable, boring, and the single biggest reliability improvement of the project.
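A per-day file cache like the one described can be sketched in a dozen lines. The function signature and filename scheme are assumptions; the point is that every model funnels through one fetch per team per day.

```python
import json
from datetime import date
from pathlib import Path

def get_roster(team: str, fetch, cache_dir: str = "data/cache") -> dict:
    """Fetch a team roster at most once per day; later calls read the cached
    file. `fetch` is whatever function actually hits the NHL endpoint."""
    cache = Path(cache_dir) / f"{date.today().isoformat()}_{team}.json"
    if cache.exists():
        return json.loads(cache.read_text())
    roster = fetch(team)
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text(json.dumps(roster))
    return roster
```

Because the cache key includes the date, stale rosters expire on their own with no invalidation logic at all.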

Real shots beat synthetic shots by a lot. The xG model originally generated 20 fake shots per player from a position-based profile and scored those. Replacing that with the player's real recent shots from play-by-play data was the biggest predictive lift the site has had — and the simplest technical change. The fanciest model on this site is not the neural network with embeddings; it's the one that finally got to look at real data.

What I haven't built yet

Three upgrades are deferred, in roughly the order I expect to tackle them. None of them are urgent. The pipeline as it stands runs reliably, the meta-ensemble is calibrated, and the daily output is good enough that I find it interesting most nights. The deferred list exists so that when I do come back to expand the project, I am not picking from feel.

Where to read more

Last updated 2026-05-03. Model schedule and feature lists drift as the project evolves; this page reflects the current pipeline, not its historical state.