
Glossary · NHL Goal Predictor

Hockey, machine learning, and probability — defined

Every term used in the methodology and across the dashboard, defined in plain English. Hockey terms first, then machine learning, then probability and evaluation. If a term in the predictor confused you, it should be here.

Sections
Hockey terms · Machine learning · Probability & evaluation · Project-specific

Hockey terms

Anytime goal scorer (prop)
A sportsbook bet that pays if the named player scores at least one goal at any point in the game (regulation, overtime, or shootout). Different from "first goal scorer" or "exact number of goals." This site's Market Odds model uses anytime-scorer prices because they line up cleanly with the binary "did the player score" target.
Empty-net goal (EN)
A goal scored when the opposing team has pulled their goalie, usually for an extra attacker in the final 1–2 minutes when trailing. The xG model has an explicit is_empty_net feature because the conditional probability of any shot going in roughly triples in that state.
Even strength (EV / 5v5)
Both teams have the same number of skaters on the ice, typically five-on-five. Most goals happen at even strength, but per-shot goal probability is meaningfully lower here than on the power play because the defense has full coverage.
Goals against per game (GA/GP)
How many goals a team allows per game on average across the season. Used as the opponent factor in the Monte Carlo and xG models — facing a team that allows 3.5 goals per game (vs the league average ~3.07) bumps the expected scoring environment by roughly 14%.
Goals for per game (GF/GP)
How many goals a team scores per game on average. Used in Monte Carlo as the team-offense factor that multiplies into the team's nightly expected goal total.
High-danger chance
A shot taken from a high-probability area, typically defined as inside the slot at a sharp-enough angle. The xG model approximates this as shot_distance < 30 ft AND shot_angle < 30° via the engineered is_high_danger binary feature.
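Expressed as code, the feature is a simple conjunction of the two thresholds above (a sketch; the actual column names in the pipeline may differ):

```python
def is_high_danger(shot_distance_ft: float, shot_angle_deg: float) -> int:
    # 1 if the shot comes from inside 30 ft at under a 30-degree angle
    return int(shot_distance_ft < 30 and shot_angle_deg < 30)
```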
Hot streak / form
An above-average recent scoring rate. The Monte Carlo model gives a small, bounded boost (max +30%) to players whose last-five-games goals-per-game is meaningfully higher than their season average. The xG model has its own asymmetric form blend (40% recent, 60% season).
Penalty kill (PK)
The defensive special-teams unit on the ice when a team is short-handed. PK goals are rare events with very different shot distributions than even-strength goals; the xG model handles this via the one-hot strength_state feature.
Power play (PP / 5v4 / 5v3)
A team has more skaters than the opponent because of a penalty. Goal-scoring rates are roughly 2–3× higher per shot here than at even strength. The Monte Carlo model includes a small power-play bonus weighted by each player's PP-goals share.
Save percentage (SV%)
Goalie metric: the fraction of shots faced that did not go in. Used as goalie_sv_pct in the Meta Ensemble's feature set, because the identity of the goalie tonight materially shifts every shooter's probability of scoring.
Slot shot
A shot taken from the central area in front of the net (typically inside ~20 feet). The xG model encodes this as a binary is_slot_shot feature; the model learns these convert at much higher rates than perimeter shots.
Strength state (5v5, 5v4, 4v3, 3v3, 6v5...)
The skater-count matchup at the moment of a shot. Common states: 5v5 (even strength), 5v4 (power play), 4v5 (penalty kill), 3v3 (overtime), 6v5 (extra attacker). The xG and Neural v2 models one-hot encode all observed strength states because each has a meaningfully different scoring environment.
Time on ice (TOI)
Total minutes a player skates in a game (or season-average per game). The Lineup TOI v1 model uses recent vs season TOI ratio to detect scratches and lineup demotions that pure scoring history misses.

Machine learning

Embedding
A learnable dense vector (32 dimensions in Neural v2) associated with a discrete entity — here, an NHL player. The embedding lets the network represent each player as a point in a 32-dim space whose coordinates are learned during training, capturing tendencies that aren't expressible from shot-level features alone.
Feature engineering
Manually constructing input variables from raw data. distance × angle, is_slot_shot, and is_high_danger in the xG model are engineered features — they are not raw NHL API outputs, but transformations the modeler chose because they improve fit.
Hyperparameters
Knobs you set before training that aren't learned from the data: tree depth, learning rate, number of estimators, batch size, dropout rate. The xG XGBoost model uses 200 estimators at depth 5 with learning rate 0.1; the Neural v2 model uses Adam at 3e-4 with batch size 256.
Isotonic regression
A non-parametric monotonic function-fitter that maps raw model probabilities to calibrated frequencies. Given pairs of (predicted_prob, actual_outcome), it produces a step-wise function that's the closest non-decreasing fit. The Meta Ensemble uses this to correct miscalibration after the LightGBM model is trained.
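The core idea, the pool-adjacent-violators algorithm, fits in a few lines. This is an illustrative standard-library sketch, not the implementation the Meta Ensemble actually runs (a production pipeline would typically reach for sklearn's IsotonicRegression):

```python
def isotonic_fit(values):
    """Closest non-decreasing fit to `values` (pool-adjacent-violators).

    Input: 0/1 outcomes sorted by the model's predicted probability.
    Output: calibrated step values, one per input.
    """
    blocks = []  # each block: [sum_of_values, count]
    for v in values:
        blocks.append([float(v), 1])
        # merge backwards while a block's mean exceeds its successor's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted
```

The merged-block means are exactly the "stepwise non-decreasing fit" the definition describes.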
LightGBM
A gradient-boosted decision tree library by Microsoft, similar to XGBoost but typically faster on large datasets. The Meta Ensemble uses LightGBM as the stacked classifier. Tree-based methods are well suited to the meta-learning problem because the input features (base-model probabilities, disagreement, goalie quality) interact non-linearly.
MLP / Multi-layer perceptron
A standard feed-forward neural network with one or more hidden layers. Neural MLP v1 uses sklearn's MLPClassifier on aggregate player features; Neural Embed v2 uses a custom PyTorch MLP with player embeddings and residual connections.
One-hot encoding
Representing a categorical variable (like shot_type) as a set of binary columns, one per category, with exactly one set to 1. The XGBoost xG model one-hot encodes 11 shot types and ~15 strength states because tree models can split on these binary indicators directly.
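A minimal sketch of the encoding (the column-name prefix and category list are illustrative, not the model's actual schema):

```python
def one_hot(value, categories):
    # One binary column per category; exactly one is set to 1
    return {f"shot_type_{c}": int(value == c) for c in categories}
```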
Overfitting
When a model learns idiosyncrasies of the training set that don't generalize. Symptoms: training accuracy continues to rise while validation accuracy stalls or drops. Defenses on this site: capped tree depth (5), early stopping (patience 5 epochs), held-out calibration set (last 20% of training data).
Random forest
An ensemble of decision trees, each trained on a random subsample of data and features. Not directly used on this site but mentioned for context — XGBoost and LightGBM are gradient-boosted variants of the same general family.
Residual block
A neural network module that adds its input to its output (the "skip connection"). Neural v2 stacks three of these, each containing LayerNorm → Linear → ReLU → Dropout → Linear. The skip connection makes deep networks trainable by preventing gradient signal from vanishing through too many layers.
Stacked ensemble / meta-learner
A model that takes other models' predictions as input features. The Meta Ensemble is a classic stack: it sees p_linear, p_mc, p_xg, p_market from the four base models and learns a re-weighting that beats any individual model. Stacking works when base models have uncorrelated errors.
StandardScaler
A preprocessing step that subtracts each feature's mean and divides by its standard deviation, so all features end up on the same scale. Important for neural networks (which are scale-sensitive) but irrelevant for tree models (which are not). Neural v1 uses one; xG XGBoost does not.
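What the scaler does to one feature column, in plain Python (a sketch of the transform, not sklearn's implementation, which also stores the fitted mean and scale for reuse at prediction time):

```python
from statistics import mean, pstdev

def standardize(column):
    # z-score each value: subtract the column mean, divide by its std deviation
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]
```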
XGBoost
One of the most widely used gradient-boosted tree libraries. Works by adding decision trees one at a time, each correcting the previous trees' residuals. The xG XGBoost v3 model uses 200 trees at max depth 5 with learning rate 0.1 — typical "sane defaults" for a binary classification target on tabular data.

Probability and evaluation

AUC / ROC curve (area under curve)
A ranking metric: the probability that a randomly chosen positive example is ranked above a randomly chosen negative one. The xG model's training metadata reports CV AUC ≈ 0.93. AUC is invariant to threshold choice and class imbalance, which is why it's the standard for shot-quality models.
Brier score
Mean squared error between predicted probabilities and binary outcomes: (p − y)² averaged over all predictions. Lower is better. Unlike accuracy, Brier rewards confident-and-correct predictions and punishes confident-and-wrong ones — a model that says 90% and is right gains a lot more than one that says 51% and is right.
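The metric is one line to compute (a sketch; variable names are illustrative):

```python
def brier_score(probs, outcomes):
    # Mean squared error between predicted probabilities and 0/1 outcomes
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
```

Note that a confident-and-correct 0.9 prediction scores far better (0.01) than a hedged 0.51 prediction on the same positive outcome (0.2401).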
Calibration
The property that predicted probabilities match actual frequencies. A perfectly calibrated model that predicts 20% for a thousand events should see the event happen ≈ 200 times. The Meta Ensemble explicitly calibrates its output via isotonic regression because raw stacked-model probabilities are typically miscalibrated.
Calibration curve / reliability diagram
A plot of predicted probability buckets (x-axis) against observed frequency in each bucket (y-axis). A perfectly calibrated model traces the diagonal. The validator on this site computes these for every model and flags the ensemble when its curve drifts off the diagonal.
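A minimal stdlib sketch of how such a table is built before plotting (bucket count and return shape here are illustrative, not the validator's actual output format):

```python
def reliability_table(probs, outcomes, n_bins=10):
    # Bucket predictions by predicted probability; report observed frequency per bucket
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append(y)
    return [
        ((i + 0.5) / n_bins, sum(b) / len(b), len(b))  # (bin midpoint, observed freq, count)
        for i, b in enumerate(bins) if b
    ]
```

A calibrated model's observed frequencies track the bin midpoints; drift off that diagonal is what gets flagged.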
Cross-validation (CV)
Splitting training data into folds, training on all-but-one and evaluating on the held-out one, repeated for each fold. Standard practice for hyperparameter selection. Time-aware variants (see time-series split) are necessary when temporal ordering matters — which it does here, because predictions made today wouldn't see future games during deployment.
Devig / removing the vig
Sportsbook prices include a margin (the "vig" or "juice") that makes implied probabilities sum to slightly more than 100%. Devigging is the process of normalizing prices to true probability estimates. The Market Odds model considered several devig schemes and uses raw averaged implied probabilities in production because the ranking is preserved either way.
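For reference, the simplest scheme — proportional (multiplicative) devig — just rescales the implied probabilities so they sum to 1; as noted above, it preserves the ranking, which is why production can skip it:

```python
def devig_proportional(implied_probs):
    # Scale a market's implied probabilities so they sum to exactly 1
    total = sum(implied_probs)
    return [p / total for p in implied_probs]
```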
Hit rate / Top-N accuracy
The fraction of nights on which the actual top-N goal scorers include a player the model also ranked in its top-N. The dashboard's "Yesterday's Top 10" panel is a hit-rate display: of yesterday's top-10 model picks, how many actually scored?
Implied probability
The probability implicit in a sportsbook price. American odds → implied probability conversion: 100 / (odds + 100) for positive odds, abs(odds) / (abs(odds) + 100) for negative odds. A +200 price implies 33.3%; a −150 price implies 60%.
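The conversion as a function, directly from the formulas above:

```python
def implied_probability(american_odds: int) -> float:
    # Convert an American price to its implied probability
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return abs(american_odds) / (abs(american_odds) + 100)
```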
Log loss / cross-entropy
The standard probabilistic loss function for binary classification: −[y·log(p) + (1−y)·log(1−p)] averaged over predictions. Heavily penalizes confident-but-wrong predictions (a 99% prediction that's wrong dominates the loss). Used as the LightGBM training objective in the Meta Ensemble.
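Computable in a few lines; the eps clipping is a standard guard against log(0) on extreme predictions, not anything specific to this project:

```python
import math

def log_loss(probs, outcomes, eps=1e-15):
    # Binary cross-entropy, averaged over predictions
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)
```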
Monte Carlo simulation
Estimating a probability or distribution by running many random simulations and averaging the result. The Monte Carlo v2 model runs 10,000 simulated games per matchup; each simulation is a Poisson draw for total goals followed by weighted random assignment of each goal to a player. The probability output is just the fraction of sims in which the player scored.
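A stripped-down sketch of that loop, using Knuth's Poisson sampler on top of the standard library's random module. The real model's per-player weighting, form boost, and power-play bonus are omitted, and all names here are illustrative:

```python
import math
import random

def poisson_draw(lam, rng):
    # Knuth's algorithm: multiply uniforms until the product drops below e^(-lam)
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def p_player_scores(team_lambda, player_share, n_sims=10_000, seed=0):
    # Fraction of simulated games in which this player gets at least one goal
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        goals = poisson_draw(team_lambda, rng)
        if any(rng.random() < player_share for _ in range(goals)):
            hits += 1
    return hits / n_sims
```

With a team expectation of 3.0 goals and a 15% assignment share, the estimate converges toward 1 − e^(−0.45) ≈ 0.36, the closed-form answer for this simplified setup.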
Poisson distribution
The probability distribution of the count of independent rare events per fixed unit (e.g. goals per game). Parameterized by a single rate λ: P(k events) = λ^k · e^(−λ) / k!. The probability of zero events is e^(−λ), so the probability of at least one is 1 − e^(−λ) — the formula every xG-based model on this site uses to convert expected goals into a per-game probability.
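The expected-goals-to-probability conversion in code:

```python
import math

def p_at_least_one_goal(expected_goals: float) -> float:
    # P(N >= 1) for N ~ Poisson(lambda) is 1 - P(N = 0) = 1 - e^(-lambda)
    return 1 - math.exp(-expected_goals)
```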
Time-series split
A train/test split that respects temporal ordering: earlier dates train, later dates evaluate. Necessary for any model deployed to predict future events from historical data, because random splits leak future information. The Meta Ensemble's last 20% of data (chronologically) is used for calibration, not training.
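The split itself is just slicing chronologically sorted rows (a sketch; the 80/20 fraction matches the calibration setup described above, but the names are illustrative):

```python
def chronological_split(rows, train_frac=0.8):
    # rows must be sorted by date ascending; earlier rows train, later rows hold out
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]
```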

Project-specific terms

Consensus pick
The dashboard's headline output: the player whose average goal probability across all available models is highest within a matchup group. Designed to be more robust to any single model's outlier than the meta-ensemble's top pick alone.
Hard top-prob ceiling
A safety check in the validator: any model output above a fixed threshold (currently 0.95) is flagged as suspect. This catches the failure mode where a stacked model produces an obviously wrong > 95% probability when one base predictor returns extreme values.
Health report
A nightly JSON file at data/health_report.json generated by the validator. Records each model's recent calibration, top-N hit rate, and any flagged anomalies. The dashboard uses it to show a "model status" indicator next to each model's name.
Tim Hortons Hockey Challenge
A fan game published by Tim Hortons (the Canadian coffee chain). Each NHL game day, three groups of five skaters are presented; players pick one per group and earn points based on how many goals + assists their picks accumulate. The dashboard's "Tim Hortons Challenge Picks" panel shows tonight's groups with the meta-ensemble's recommended pick from each.
Tims group rankings
Within each Tim Hortons group of five, the model-internal ranking of those five players by predicted goal probability. The pick is just whoever ranks first within the group, not necessarily the highest-probability player overall.
xG source tag
A field on every xG-based prediction (real_shots, position_prior, or synthetic) recording which data source produced the per-shot xG estimate. Used in A/B analysis to break down hit rate by data quality — predictions tagged real_shots are noticeably more accurate than synthetic ones, which is the entire reason the real-shot pipeline exists.

Last updated 2026-05-03. New terms get added as the project introduces them; if a term in the dashboard or methodology is missing here, it's an oversight.