FAQ · NHL Goal Predictor

Frequently asked questions

What this site is, what it isn't, and the answers to the questions I get most often. If your question isn't here, the methodology page goes deeper on every model.

Important disclaimer

The NHL Goal Predictor is a personal data-science project published for educational and entertainment purposes. It is not betting advice, not a financial product, and not affiliated with the NHL, any team, any sportsbook, or Tim Hortons. Probability estimates are model outputs that may be wrong on any given night.

Sports wagering, where legal, is solely your responsibility. If you or someone you know has a gambling problem, contact the National Council on Problem Gambling: ncpgambling.org or call 1-800-GAMBLER.

The basics

What does this site actually do?

Every morning, eight different statistical and machine-learning models score every NHL skater playing tonight on their probability of scoring at least one goal. The models are described in detail on the methodology page. The dashboard publishes the consensus picks, the per-model rankings, and a daily validator report on how the models did yesterday.

Is this betting advice?

No. A probability estimate is just an estimate. Whether it is "profitable" against a sportsbook line depends on the line, the bookmaker's vig, your bankroll variance, and ultimately whether the estimate is closer to the truth than the line is — none of which this site models.

The site exists because predicting goal scorers is an interesting machine-learning problem in its own right, and because I play the Tim Hortons Hockey Challenge and wanted a defensible way to make my picks. It is not a betting tool, and the predictions are not optimized to be profitable against bookmaker prices.

How accurate is the predictor?

There are two ways to answer this. First, ranking accuracy: the Meta Ensemble's top-10 picks include at least one actual goal scorer on the large majority of game nights, a result I track in the daily health report. Second, calibration: the predicted probabilities are run through isotonic regression so that, for example, players the model predicts at 20% actually score about 20% of the time over a large enough window. Both metrics drift as the league's scoring environment changes over a season, and the validator monitors them.
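For readers who want to see what those two checks look like in practice, here is a minimal sketch in plain Python. It is not the site's validator: the function names, the ten-bin layout, and the input shapes are assumptions made for illustration. The first function groups predictions into probability bins and compares the mean prediction with the observed scoring rate; the second reports whether a night's top-k list contained at least one scorer.

```python
from collections import defaultdict


def reliability_table(probs, outcomes, n_bins=10):
    """Compare predicted probabilities with observed scoring rates, bin by bin.

    probs    -- predicted probabilities in [0, 1]
    outcomes -- 1 if the player scored, 0 otherwise
    Returns rows of (bin_index, count, mean_predicted, observed_rate).
    Hypothetical helper, not the project's actual validator.
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # A probability of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(y for _, y in pairs) / len(pairs)
        rows.append((idx, len(pairs), mean_pred, observed))
    return rows


def top_k_hit(ranked_player_ids, scorers, k=10):
    """True if any of the first k ranked players is in the set of scorers."""
    return any(pid in scorers for pid in ranked_player_ids[:k])
```

A well-calibrated model shows mean_predicted close to observed_rate in every bin that has enough samples; bins with only a handful of players are mostly noise.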

The honest answer is "well above random, well below perfect, and with material variance night-to-night." Don't take a single night's results as evidence either way.

Where does the data come from?

All player stats, rosters, schedules, and per-game results come from the NHL's public Stats API at api-web.nhle.com. Sportsbook implied probabilities come from The Odds API. Per-shot training data for the xG and neural models comes from NHL play-by-play feeds, processed nightly into data/xg_training/shots.csv. No proprietary or paid data sources are used.

Models and methodology

Why eight models? Wouldn't one good one be enough?

Because no single approach to this problem dominates. The market is the strongest single signal, except on nights when the books haven't priced a game. The xG model is sharp on shot quality, except when a player has been scratched. The Monte Carlo simulation captures team-level uncertainty, except that it can't tell two depth-line wingers apart. Each model has a regime where it shines and a regime where it fails. Running them in parallel and stacking the results into a calibrated ensemble is meaningfully more robust than relying on any single model.

Why not just use the sportsbook odds directly?

A reasonable question: Market Odds v1 is genuinely the strongest single model on the site. But sportsbook lines have three weaknesses the other models compensate for. First, the books don't post anytime-scorer prices for every player in every game, so on slow nights the market line covers maybe two-thirds of the slate. Second, prices include a vig (bookmaker margin) that systematically inflates implied probabilities. Third, the books occasionally underprice rookies and call-ups, which is the kind of edge a stats model can find before the market reprices.
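To make the vig point concrete, here is a small sketch of how a quoted price turns into an implied probability and how the margin inflates it. It assumes decimal odds and a market that quotes both a "yes" and a "no" price; anytime-scorer markets often quote only the "yes" side, so the proportional normalization below is an illustrative assumption, not a description of how Market Odds v1 handles the margin. The function names are hypothetical.

```python
def implied_probability(decimal_odds):
    """Raw implied probability of a decimal price (vig still included)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def remove_vig(yes_odds, no_odds):
    """Normalize a two-sided market so the two probabilities sum to 1.

    Returns (fair_yes_probability, margin). Proportional normalization is
    one common convention among several; it is an assumption here.
    """
    p_yes = implied_probability(yes_odds)
    p_no = implied_probability(no_odds)
    total = p_yes + p_no  # exceeds 1.0 whenever the book takes a margin
    return p_yes / total, total - 1.0


raw = implied_probability(3.5)      # about 0.286
fair, margin = remove_vig(3.5, 1.25)  # fair is about 0.263, margin about 0.086
```

In this made-up example the raw implied probability overstates the normalized one by a little over two percentage points, which is the inflation the answer above refers to.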

What's the difference between the consensus pick and the meta-ensemble's top pick?

The meta-ensemble is a single trained model (LightGBM + isotonic calibration) that learns to weight base models adaptively. The consensus pick on the dashboard is the simple average of all available models' probabilities for a given player. The two often agree but not always — the consensus is more robust when one base model produces an outlier, while the meta-ensemble is more robust when the right answer requires actively favoring one base model over the others.
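The consensus calculation is simple enough to sketch in a few lines. The model names in the example dictionary are placeholders, and representing a missing model as None is an assumption about the data shape, not the dashboard's actual code.

```python
def consensus_probability(model_probs):
    """Simple average of the probabilities from whichever models produced one.

    model_probs -- dict mapping a model name to a probability, or to None
                   when that model has no row for the player.
    Returns None if no model produced a probability.
    """
    available = [p for p in model_probs.values() if p is not None]
    if not available:
        return None
    return sum(available) / len(available)


# Hypothetical player on a night when the market has not posted a price.
example = {"market_odds": None, "xg": 0.20, "monte_carlo": 0.30}
consensus = consensus_probability(example)  # averages the two available values
```

Because every available model gets the same weight, one outlier moves the average by only 1/n of its error, which is why the consensus holds up better against a single misbehaving model than a learned weighting might.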

Why does Connor McDavid (or any star) not always rank #1?

Goal probability is not the same as goal-scoring talent. The right question every night is "which player has the best chance to score tonight, given who they're playing, who's in net, and how their lineup deploys them?" A franchise center playing against a tough shutdown defense pair on a road game with a hot opposing goalie can have a lower probability than a depth player on a top line at home against a leaky team. The models are designed to surface exactly this kind of contextual edge.

Operations and reliability

How often is the dashboard updated?

Predictions update once per game day, in the morning (Eastern Time), after the NHL has confirmed schedules and most teams have posted lineup signals. The full schedule is on the project about page. Predictions are not live-updated as the day progresses, so late lineup changes after the morning run won't be reflected until the next day's pipeline.

Why aren't there predictions for every game?

A player or game can show up empty for a few reasons. The NHL API can be slow or throttled in the morning window (the pipeline retries with exponential backoff but eventually gives up). A player may not have enough season data to clear the minimum-games threshold (currently 3 games played). Or the sportsbook may not have posted anytime-scorer prices for that game yet, in which case the Market Odds model produces no row. The meta-ensemble can still produce a prediction without market input, but the prediction will be tagged as having lower confidence.
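The retry behavior mentioned above follows a standard pattern, sketched here with the standard library only. The attempt count, delays, jitter, and the choice of which exceptions to retry are all assumptions for illustration; the pipeline's real settings are not stated on this page.

```python
import random
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    The wait doubles after each failure (1s, 2s, 4s, ...) up to max_delay,
    plus a little random jitter. After the final failed attempt the
    exception is re-raised, which is the "eventually gives up" case.
    ConnectionError and TimeoutError are subclasses of OSError.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing sleep as a parameter keeps the function easy to test without actually waiting, and the cap on the delay stops a long outage from stalling the morning run indefinitely.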

What happens during the playoffs?

The pipeline handles regular-season and playoff game types identically — both are filtered in via NHL gameType codes 2 (regular) and 3 (playoff). What changes structurally in playoffs is that fewer games are played per night, so the slate is sparser, and the Tim Hortons Hockey Challenge typically pauses for the postseason. Model accuracy in playoffs is also slightly different because shooting percentages and scoring environments shift — the validator tracks per-game-type calibration separately.
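As a rough illustration of that filter, the sketch below keeps only games whose gameType is 2 or 3 and groups them by type, so calibration can be tracked separately for each. The dictionary layout of a game record and the function name are assumptions; only the two codes come from the description above.

```python
from collections import defaultdict

# Codes named in the text: 2 = regular season, 3 = playoff.
INCLUDED_GAME_TYPES = {2: "regular", 3: "playoff"}


def split_by_game_type(games):
    """Drop games of any other type and group the rest by label.

    games -- iterable of dicts with a "gameType" key (assumed record shape).
    """
    kept = defaultdict(list)
    for game in games:
        label = INCLUDED_GAME_TYPES.get(game.get("gameType"))
        if label is not None:
            kept[label].append(game)
    return dict(kept)


# Made-up slate: the record with gameType 1 stands in for any excluded type.
slate = [
    {"id": "g1", "gameType": 2},
    {"id": "g2", "gameType": 3},
    {"id": "g3", "gameType": 1},
    {"id": "g4", "gameType": 2},
]
grouped = split_by_game_type(slate)
```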

What about injured players or game-time scratches?

The Lineup TOI v1 model is the project's main defense against scratches: it scales each player's expected shots by the ratio of their recent ice time to their season average, so a player who was scratched yesterday gets pulled toward zero on the shot side. A player ruled out before the morning run won't appear at all. A player ruled out after the morning run will still appear in the predictions until the next day's pipeline; the dashboard shows the timestamp of the most recent run so you can tell how fresh the data is.
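The ice-time scaling can be sketched as follows. This is an interpretation of the description above, not the Lineup TOI v1 source: the window of recent games, counting a scratch as zero minutes, and the function names are all assumptions.

```python
def toi_ratio(recent_toi_minutes, season_avg_toi_minutes):
    """Ratio of recent average ice time to the season average.

    recent_toi_minutes -- ice time in the last few games, with a scratch
                          recorded as 0.0 (assumed convention).
    """
    if not recent_toi_minutes:
        raise ValueError("need at least one recent game")
    if season_avg_toi_minutes <= 0:
        return 0.0
    recent_avg = sum(recent_toi_minutes) / len(recent_toi_minutes)
    return recent_avg / season_avg_toi_minutes


def scaled_expected_shots(season_shots_per_game, recent_toi_minutes,
                          season_avg_toi_minutes):
    """Scale a player's season shot rate by the ice-time ratio."""
    return season_shots_per_game * toi_ratio(recent_toi_minutes,
                                             season_avg_toi_minutes)


# Hypothetical player: 3.0 shots per game, 18 minutes on average,
# two normal games followed by a scratch.
shots = scaled_expected_shots(3.0, [18.0, 18.0, 0.0], 18.0)
```

With a three-game window, a single scratch cuts the expected shots by a third; a window of one game would drop them to zero, which matches the "pulled toward zero" behavior described above. Which window the real model uses is not stated here.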

About the project

Is the source code open?

Yes. The full prediction pipeline, training scripts, validators, and dashboard are public on GitHub at github.com/mghnasiri. You can clone the repo and run the daily pipeline locally — it requires Python 3, the dependencies in requirements.txt, an Odds API key (free tier works), and a few hours of API time to backfill shot data on first run.

Can I use this in a fantasy hockey league?

The model output is a single-game goal probability, which is one input among many in fantasy hockey. It's reasonable as a tiebreaker between two close lineup decisions on a given night. It is not useful for season-long ranking, since a player's nightly probability depends heavily on opponent and matchup, not just inherent talent.

Why does the site use Google AdSense?

Honestly, mostly out of curiosity. Running an AdSense-eligible site is a learning exercise in itself, and traffic on a personal academic site is small enough that ad revenue is essentially zero. The few ad slots in the dashboard are placed where they don't interfere with the actual prediction tables, and the site uses Google Funding Choices for IAB TCF v2-compliant consent in regions that require it (EU, UK, Switzerland).

What's the privacy policy?

The site uses Google Analytics for traffic measurement and Google AdSense for advertising. No personally identifying information is collected by the prediction pipeline itself. Full details are in the privacy policy.

Who built this and why?

Built by Mohammad G. Nasiri, a PhD student in Operations and Decision Systems at Université Laval in Quebec City. The project began as a private tool for the Tim Hortons Hockey Challenge and grew into a full prediction stack as I added models that fixed each previous version's weaknesses. The longer story is on the about page.

Last updated 2026-05-03. Have a question that should be here? The contact link is in the footer.