Frequently asked questions

Q: Is this betting advice?

No. The NHL Goal Predictor publishes probability estimates for educational and entertainment purposes only. It does not recommend bets, account for sportsbook vig, or model bankroll variance. Whether any probability estimate is profitable against a market line depends on factors this project does not estimate.

Q: How accurate is the predictor?

Accuracy varies by model and metric. The Meta Ensemble achieves a top-10 daily hit rate (the model's top-10 picks include at least one actual scorer) substantially above random, with calibration enforced via isotonic regression. The validator publishes per-model accuracy in the daily health report.

Q: Where does the data come from?

Player and game data come from the public NHL Stats API at api-web.nhle.com. Sportsbook implied probabilities come from The Odds API. All per-shot training data comes from NHL play-by-play feeds. No proprietary data sources are used.

Q: Why eight models?

No single approach dominates this problem. Each model fills in where another is weak: market odds embed lineup information but aren't always available; xG models capture shot quality but miss deployment; Monte Carlo handles squad-level uncertainty; the meta-ensemble combines them with calibration. Running them in parallel is more robust than any single model.

Question 1

What does this site actually do?

Answer

Every morning, eight different statistical and machine-learning models score every NHL skater playing tonight on their probability of scoring at least one goal. The models are described in detail on the methodology page. The dashboard publishes the consensus picks, the per-model rankings, and a daily validator report on how the models did yesterday.

Question 2

Is this betting advice?

Answer

No. A probability estimate is just an estimate. Whether it is "profitable" against a sportsbook line depends on the line, the bookmaker's vig, your bankroll variance, and ultimately whether the estimate is closer to the truth than the line is — none of which this site models.

The site exists because predicting goal scorers is an interesting machine-learning problem in its own right, and because I play the Tim Hortons Hockey Challenge and wanted a defensible way to make my picks. It is not a betting tool, and the predictions are not optimized to be profitable against bookmaker prices.

Question 3

How accurate is the predictor?

Answer

Two ways to answer this. First, ranking accuracy: the Meta Ensemble's top-10 picks include at least one actual goal scorer on the large majority of game nights — a result I track in the daily health report. Second, calibration: the predicted probabilities are run through isotonic regression so that, for example, players the model predicts at 20% actually score about 20% of the time over a large enough window. Both metrics drift with seasonal scoring environment changes and are monitored by the validator.
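Both checks can be sketched in a few lines of Python. This is a minimal illustration, not the validator's actual code: the function names, the dict-of-probabilities input shape, and the equal-width binning are all assumptions made for the example.

```python
from collections import defaultdict


def top_k_hit(predictions, scorers, k=10):
    """Return True if any of the k highest-probability players scored.

    predictions: dict of player name -> predicted probability (assumed shape)
    scorers: set of player names who actually scored that night
    """
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    return any(player in scorers for player in ranked[:k])


def calibration_table(probs, outcomes, n_bins=5):
    """Group predictions into equal-width bins and compare the mean
    predicted probability with the observed scoring rate in each bin.

    Returns {bin index: (mean predicted, observed rate, sample count)}.
    """
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = {}
    for idx, rows in sorted(bins.items()):
        mean_pred = sum(p for p, _ in rows) / len(rows)
        observed = sum(y for _, y in rows) / len(rows)
        table[idx] = (round(mean_pred, 3), round(observed, 3), len(rows))
    return table
```

A well-calibrated model shows mean predicted and observed rates that sit close together in every bin once each bin holds enough samples. With only a handful of rows per bin, the gap is mostly noise.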

The honest answer is "well above random, well below perfect, and with material variance night-to-night." Don't take a single night's results as evidence either way.

Question 4

Where does the data come from?

Answer

All player stats, rosters, schedules, and per-game results come from the NHL's public Stats API at api-web.nhle.com. Sportsbook implied probabilities come from The Odds API. Per-shot training data for the xG and neural models comes from NHL play-by-play feeds, processed nightly into data/xg_training/shots.csv. No proprietary or paid data sources are used.

Question 5

Why eight models? Wouldn't one good one be enough?

Answer

Because no single approach to this problem dominates. The market is the strongest single signal — except on nights when the books haven't priced a game. The xG model is sharp on shot quality — except when a player has been scratched. The Monte Carlo simulation captures squad-level uncertainty — except that it can't tell two depth-line wingers apart. Each model has a regime where it shines and a regime where it fails. Running them in parallel and stacking the results into a calibrated ensemble is meaningfully more robust than any single model.

Question 6

Why not just use the sportsbook odds directly?

Answer

A reasonable question. Market Odds v1 is genuinely the strongest single model on the site, but sportsbook lines have three weaknesses the other models compensate for. First, the books don't post anytime-scorer prices for every player in every game, so on slow nights the market line covers maybe two-thirds of the slate. Second, prices include a vig (bookmaker margin) that systematically inflates implied probabilities. Third, the books occasionally underrate rookies and call-ups, which is the kind of edge a stats model can find before the market reprices.
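To make the vig point concrete, here is a small sketch of how decimal odds map to implied probabilities and how the margin shows up. The function names are hypothetical, and the proportional normalization is just one common way to strip the margin from a two-sided market. It is not a description of how Market Odds v1 handles it, and it only applies where the book posts both a "yes" and a "no" price.

```python
def implied_probability(decimal_odds):
    """Raw implied probability from decimal odds (vig still included)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def remove_vig_two_way(odds_yes, odds_no):
    """Proportionally rescale a two-sided market so the two
    probabilities sum to 1. Assumes both sides are priced."""
    p_yes = implied_probability(odds_yes)
    p_no = implied_probability(odds_no)
    overround = p_yes + p_no  # exceeds 1.0 when the book takes a margin
    return p_yes / overround, p_no / overround
```

For example, if both sides of an even proposition are priced at 1.8, each raw implied probability is about 0.556 and the pair sums to about 1.11. Rescaling brings each side back to 0.5.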

Question 7

What's the difference between the consensus pick and the meta-ensemble's top pick?

Answer

The meta-ensemble is a single trained model (LightGBM + isotonic calibration) that learns to weight base models adaptively. The consensus pick on the dashboard is the simple average of all available models' probabilities for a given player. The two often agree but not always — the consensus is more robust when one base model produces an outlier, while the meta-ensemble is more robust when the right answer requires actively favoring one base model over the others.
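The consensus side of that comparison is simple enough to sketch directly. The snippet below assumes each model's output for a player arrives as a dict entry, with None marking a model that produced no row. The function name and the model keys in the usage note are made up for illustration.

```python
def consensus_probability(model_probs):
    """Simple average over the models that produced a value.

    model_probs: dict of model name -> probability, or None if that
    model has no prediction for the player (assumed convention).
    """
    available = [p for p in model_probs.values() if p is not None]
    if not available:
        return None  # no model covered this player
    return sum(available) / len(available)
```

Calling consensus_probability({"market": None, "xg": 0.2, "monte_carlo": 0.4}) averages only the two available values and returns roughly 0.3. A missing market price therefore drops out instead of dragging the average toward zero.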

Question 8

Why does Connor McDavid (or any star) not always rank #1?

Answer

Goal probability is not the same as goal-scoring talent. The right question every night is "which player has the best chance to score tonight, given who they're playing, who's in net, and how their lineup deploys them?" A franchise center playing against a tough shutdown defense pair on a road game with a hot opposing goalie can have a lower probability than a depth player on a top line at home against a leaky team. The models are designed to surface exactly this kind of contextual edge.

Question 9

How often is the dashboard updated?

Answer

Predictions update once per game day, in the morning (Eastern Time), after the NHL has confirmed schedules and most teams have posted lineup signals. The full schedule is on the project about page. Predictions are not live-updated as the day progresses, so late lineup changes after the morning run won't be reflected until the next day's pipeline.

Question 10

Why aren't there predictions for every game?

Answer

A few reasons can lead to a player or game showing up empty: the NHL API can be slow or throttled in the morning window (the pipeline retries with exponential backoff but eventually gives up); a player may not have enough season data to clear the minimum-games threshold (currently 3 games played); the sportsbook may not have posted anytime-scorer prices for that game yet, in which case the Market Odds model produces no row. The meta-ensemble can still produce a prediction without market input, but the prediction will be tagged as having lower confidence.
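The retry behaviour mentioned above follows a standard pattern. Here is a minimal sketch, assuming a caller-supplied fetch function. The attempt count, base delay, and the choice to retry only on OSError (which covers connection and timeout errors) are illustrative guesses, not the pipeline's real settings.

```python
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    The wait doubles after each failure: base_delay, 2x, 4x, and so on.
    Returns None when every attempt fails, so the caller can leave
    that player or game empty.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                return None  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter exists so the function can be tested without real waiting. In normal use it defaults to time.sleep.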

Question 11

What happens during the playoffs?

Answer

The pipeline handles regular-season and playoff game types identically — both are filtered in via NHL gameType codes 2 (regular) and 3 (playoff). What changes structurally in playoffs is that fewer games are played per night, so the slate is sparser, and the Tim Hortons Hockey Challenge typically pauses for the postseason. Model accuracy in playoffs is also slightly different because shooting percentages and scoring environments shift — the validator tracks per-game-type calibration separately.
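As a rough sketch of that filter: the codes 2 and 3 are the ones named above, while the list-of-dicts shape and the helper name are assumptions for the example.

```python
REGULAR_SEASON = 2
PLAYOFF = 3
INCLUDED_GAME_TYPES = {REGULAR_SEASON, PLAYOFF}


def filter_games(games):
    """Keep only regular-season and playoff games.

    games: list of dicts that each carry a "gameType" code (assumed shape).
    Games with any other code, or with no code at all, are dropped.
    """
    return [g for g in games if g.get("gameType") in INCLUDED_GAME_TYPES]
```

Keeping the game type on each row also makes it straightforward to split accuracy and calibration by game type later.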

Question 12

What about injured players or game-time scratches?

Answer

The Lineup TOI v1 model is the project's main defense against scratches: it scales each player's expected shots by the ratio of their recent ice time to their season average, so a player who was scratched yesterday gets pulled toward zero on the shot side. A player ruled out before the morning run won't appear at all. A player ruled out after the morning run will still appear in the predictions until the next day's pipeline; the dashboard shows the timestamp of the most recent run so you can tell how fresh the data is.
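The scaling step can be written out as a one-line formula. This sketch assumes times are in minutes and returns zero when there is no season average to divide by. Whether the real model caps or smooths the ratio is not stated here, so treat the function as an illustration of the idea only.

```python
def toi_adjusted_shots(expected_shots, recent_toi, season_avg_toi):
    """Scale expected shots by the ratio of recent to season-average
    ice time. A scratched player (recent_toi == 0) ends up at zero."""
    if season_avg_toi <= 0:
        return 0.0  # no baseline ice time to scale against
    return expected_shots * (recent_toi / season_avg_toi)
```

A player who averages 18 minutes but has recently played 9 sees an expectation of 3.0 shots cut to 1.5.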

Question 13

Is the source code open?

Answer

Yes. The full prediction pipeline, training scripts, validators, and dashboard are public on GitHub at github.com/mghnasiri. You can clone the repo and run the daily pipeline locally — it requires Python 3, the dependencies in requirements.txt, an Odds API key (free tier works), and a few hours of API time to backfill shot data on first run.

Question 14

Can I use this in a fantasy hockey league?

Answer

The model output is a single-game goal probability, which is one input among many in fantasy hockey. It's reasonable as a tiebreaker between two close lineup decisions on a given night. It is not useful for season-long ranking, since a player's nightly probability depends heavily on opponent and matchup, not just inherent talent.

Question 15

Why does the site use Google AdSense?

Answer

Honestly, mostly out of curiosity. Running an AdSense-eligible site is a learning exercise in itself, and traffic on a personal academic site is small enough that ad revenue is essentially zero. The few ad slots in the dashboard are placed where they don't interfere with the actual prediction tables, and the site uses Google Funding Choices for IAB TCF v2-compliant consent in regions that require it (EU, UK, Switzerland).

Question 16

What's the privacy policy?

Answer

The site uses Google Analytics for traffic measurement and Google AdSense for advertising. No personally identifying information is collected by the prediction pipeline itself. Full details are in the privacy policy.

Question 17

Who built this and why?

Answer

Built by Mohammad G. Nasiri, a PhD student in Operations and Decision Systems at Université Laval in Quebec City. The project began as a private tool for the Tim Hortons Hockey Challenge and grew into a full prediction stack as I added models that fixed each previous version's weaknesses. The longer story is on the about page.
