Personalisation & Recommendation
Per-user MNL · Contextual bandits
Choose what to show to which customer to maximise expected revenue (or CLV): a per-user assortment problem layered on top of a predicted utility model. When utilities are known, the revenue-ordered MNL property (Talluri & van Ryzin 2004) still applies per user. When they must be learned online, the problem becomes a contextual bandit (Li, Chu, Langford & Schapire 2010). At industry scale, Netflix, Amazon and YouTube run this loop millions of times per minute. Handbook reference: Ricci, Rokach & Shapira (eds.), Recommender Systems Handbook.
Why it matters
The most powerful revenue lever in digital retail
Where the decision sits
Every page view, email and push notification is a recommendation decision
Recommendation is a decision made at every session: homepage, search, category, product-detail, cart-add, checkout, post-purchase email, push notification. The retailer picks \(K\) items to show, the customer clicks (or doesn’t) and optionally buys, and the system learns. Two decision models matter: when utilities are known (offline batch-scored from last night’s model), use MNL assortment per user; when utilities must be learned as users arrive (new customer, cold-start item, or A/B-test phase), use a contextual bandit.
Problem & formulation
Per-user MNL + contextual-bandit learning
Sets and parameters
| Symbol | Meaning | Unit |
|---|---|---|
| \(u \in \mathcal{U}\) | User (current visitor) | finite |
| \(i \in \mathcal{N}\) | Candidate item in the catalog | finite |
| \(x_u\) | User context vector (features, history, device) | \(\mathbb{R}^d\) |
| \(z_i\) | Item feature vector | \(\mathbb{R}^d\) |
| \(\theta\) | Learned parameter vector | \(\mathbb{R}^d\) |
| \(\hat u_{ui}\) | Predicted utility (log-odds of click / purchase) | real |
| \(r_i\) | Revenue on purchase of item \(i\) | $ |
| \(K\) | Slate size / recommendation capacity | integer |
Contextual utility model
Utility is learned from interaction data. The simplest form is bilinear:
\(\hat u_{ui} = \theta^{\top} \phi(x_u, z_i)\)
Here \(\phi\) is a joint user-item feature map, often the outer product of \(x_u\) and \(z_i\). In practice, matrix factorisation (\(\hat u_{ui} = p_u^{\top} q_i\)) is the canonical form (Koren, Bell & Volinsky 2009). Modern systems add deep networks on top.
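The sketch below computes both utility forms. It assumes \(\phi(x_u, z_i)\) is the flattened outer product \(x_u z_i^{\top}\), so \(\theta\) has \(\dim x_u \cdot \dim z_i\) entries; the feature values are made up. If \(\theta\) is a flattened identity matrix, the bilinear score reduces to \(x_u^{\top} z_i\). That is the matrix-factorisation score with \(p_u = x_u\) and \(q_i = z_i\).

```python
from typing import List

Vector = List[float]


def outer_feature_map(x_u: Vector, z_i: Vector) -> Vector:
    """Joint feature map phi(x_u, z_i): the outer product x_u z_i^T, flattened row by row."""
    return [xa * zb for xa in x_u for zb in z_i]


def bilinear_utility(theta: Vector, x_u: Vector, z_i: Vector) -> float:
    """Predicted utility u_hat = theta^T phi(x_u, z_i)."""
    phi = outer_feature_map(x_u, z_i)
    if len(theta) != len(phi):
        raise ValueError("theta must have len(x_u) * len(z_i) entries")
    return sum(t * f for t, f in zip(theta, phi))


def mf_utility(p_u: Vector, q_i: Vector) -> float:
    """Matrix-factorisation utility u_hat = p_u^T q_i."""
    return sum(a * b for a, b in zip(p_u, q_i))


if __name__ == "__main__":
    x_u = [1.0, 0.5]              # hypothetical user context
    z_i = [0.2, 1.0]              # hypothetical item features
    theta = [1.0, 0.0, 0.0, 1.0]  # flattened 2x2 identity, so u_hat = x_u . z_i
    print(bilinear_utility(theta, x_u, z_i))  # 0.7 (= 1.0*0.2 + 0.5*1.0)
    print(mf_utility(x_u, z_i))               # same score with p_u = x_u, q_i = z_i
```

A learned \(\theta\) with off-diagonal weights would also capture cross-feature interactions, such as a device feature boosting a category feature.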
Per-user MNL offline optimisation
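Given the utility vector \(\hat u_{u \cdot}\), the retailer picks an assortment \(S_u\) of at most \(K\) items per user. Under MNL, with a no-click / no-purchase option of utility 0, the click/purchase probability of item \(i \in S_u\) is:
\(P_i(S_u) = \frac{e^{\hat u_{ui}}}{1 + \sum_{j \in S_u} e^{\hat u_{uj}}}\)
Expected revenue per session is \(R(S_u) = \sum_{i \in S_u} r_i \cdot P_i(S_u)\). The Talluri & van Ryzin revenue-ordered rule is: sort items by \(r_i\) in decreasing order, then pick the best prefix. This rule is exactly optimal when slate size is unconstrained. Under the cap \(K\) it remains a strong heuristic but is not guaranteed optimal. Diversity and fairness requirements add further constraints.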
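The sketch below shows the per-user computation. It assumes a no-purchase option with utility 0, so \(P_i(S) = e^{\hat u_{ui}} / (1 + \sum_{j \in S} e^{\hat u_{uj}})\). It evaluates every price-sorted prefix of length 1..K and keeps the best one. The prices and utilities in the demo are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mnl_probs(utils: Sequence[float], slate: Sequence[int]) -> Dict[int, float]:
    """P_i(S) = exp(u_i) / (1 + sum_{j in S} exp(u_j)); the outside option has utility 0."""
    weights = {i: math.exp(utils[i]) for i in slate}
    denom = 1.0 + sum(weights.values())
    return {i: w / denom for i, w in weights.items()}


def expected_revenue(utils: Sequence[float], prices: Sequence[float],
                     slate: Sequence[int]) -> float:
    """R(S) = sum_{i in S} r_i * P_i(S)."""
    return sum(prices[i] * p for i, p in mnl_probs(utils, slate).items())


def best_revenue_ordered_slate(utils: Sequence[float], prices: Sequence[float],
                               k: int) -> Tuple[List[int], float]:
    """Evaluate every price-sorted prefix of length 1..k and keep the best one."""
    order = sorted(range(len(prices)), key=lambda i: prices[i], reverse=True)
    best_slate: List[int] = []
    best_rev = 0.0
    for length in range(1, min(k, len(order)) + 1):
        slate = order[:length]
        rev = expected_revenue(utils, prices, slate)
        if rev > best_rev:
            best_slate, best_rev = slate, rev
    return best_slate, best_rev


if __name__ == "__main__":
    prices = [10.0, 6.0, 4.0, 2.0]  # hypothetical r_i
    utils = [0.0, 1.0, 2.0, 0.0]    # hypothetical u_hat_{ui} for one user
    print(best_revenue_ordered_slate(utils, prices, k=4))  # ([0, 1], ~5.576)
```

In the demo, adding item 2 (high utility, low price) pulls choice probability away from the pricier items 0 and 1. Revenue drops from about 5.58 to about 4.61, so the best slate stops at two items even though \(K = 4\).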
Online learning: LinUCB
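When \(\theta\) is unknown, the system must balance exploration and exploitation. The LinUCB rule (Li et al. 2010) picks, at each round, the arm with the highest upper confidence bound:
\(i_t = \arg\max_{i \in \mathcal{N}} \left[ \hat\theta_t^{\top} \phi_{ui} + \alpha \sqrt{\phi_{ui}^{\top} A_t^{-1} \phi_{ui}} \right], \qquad \hat\theta_t = A_t^{-1} b_t\)
The terms are:
- \(\phi_{ui} = \phi(x_u, z_i)\) is the joint feature vector.
- \(A_t = \lambda I + \sum_{s < t} \phi_s \phi_s^{\top}\) is the regularised Gram matrix of the feature vectors played so far.
- \(b_t = \sum_{s < t} y_s \phi_s\) accumulates the observed rewards \(y_s\).
- \(\alpha\) tunes exploration.

Optimism-based linear bandits of this kind have regret \(\tilde O(d \sqrt{T})\). This is sub-linear in \(T\), so the average per-round regret vanishes. It ties the problem back to dynamic pricing with learning.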
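The sketch below implements the LinUCB score with a single shared \(\theta\) over joint features \(\phi\). Li et al. (2010) also describe per-arm ("disjoint") and hybrid variants. It assumes ridge \(\lambda = 1\) and \(\alpha = 0.5\); the three-arm environment, its true \(\theta\) and the noise level are hypothetical. \(A_t^{-1}\) is maintained with a Sherman-Morrison rank-one update instead of being re-inverted each round.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: List[Vector], v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


class LinUCB:
    """Shared-parameter LinUCB over joint features phi (a sketch, not a tuned implementation)."""

    def __init__(self, dim: int, alpha: float = 0.5, lam: float = 1.0) -> None:
        self.alpha = alpha
        # A starts at lam * I, so its inverse starts at I / lam.
        self.a_inv = [[1.0 / lam if r == c else 0.0 for c in range(dim)] for r in range(dim)]
        self.b = [0.0] * dim  # b = sum_s y_s * phi_s

    def score(self, phi: Sequence[float]) -> float:
        theta_hat = matvec(self.a_inv, self.b)     # ridge estimate A^{-1} b
        width = dot(phi, matvec(self.a_inv, phi))  # phi^T A^{-1} phi
        return dot(theta_hat, phi) + self.alpha * math.sqrt(max(width, 0.0))

    def select(self, candidates: Sequence[Sequence[float]]) -> int:
        scores = [self.score(phi) for phi in candidates]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, phi: Sequence[float], reward: float) -> None:
        # Sherman-Morrison: (A + phi phi^T)^{-1} = A^{-1} - (A^{-1} phi)(A^{-1} phi)^T / (1 + phi^T A^{-1} phi)
        a_phi = matvec(self.a_inv, phi)
        denom = 1.0 + dot(phi, a_phi)
        for r in range(len(phi)):
            for c in range(len(phi)):
                self.a_inv[r][c] -= a_phi[r] * a_phi[c] / denom
            self.b[r] += reward * phi[r]


def simulate(rounds: int = 500, seed: int = 0) -> Tuple[List[int], int]:
    """Hypothetical environment: reward = true_theta . phi + Gaussian noise."""
    rng = random.Random(seed)
    true_theta = [1.0, -0.5]
    arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # expected rewards 1.0, -0.5, 0.35
    bandit = LinUCB(dim=2)
    pulls = [0] * len(arms)
    for _ in range(rounds):
        i = bandit.select(arms)
        pulls[i] += 1
        bandit.update(arms[i], dot(true_theta, arms[i]) + rng.gauss(0.0, 0.1))
    return pulls, bandit.select(arms)


if __name__ == "__main__":
    print(simulate())  # (pull counts per arm, arm chosen after learning)
```

Arm 2 looks attractive early because its second feature is unexplored. After a pull or two its confidence bonus shrinks, and the estimate settles below arm 0's.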
Diversity / fairness constraints
Pure revenue maximisation over-concentrates: it shows the same top items to every user, so real systems add diversity and fairness constraints. Submodular-coverage objectives and maximum marginal relevance (MMR) re-ranking (Carbonell & Goldstein 1998) are standard ways to induce diversity without hard constraints.
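The sketch below is a minimal greedy MMR re-ranker. It takes per-user relevance scores as input and uses cosine similarity between item feature vectors \(z_i\) as the redundancy measure; both are our choices, since Carbonell & Goldstein define MMR over a generic similarity. At each step it adds \(\arg\max_i \lambda\,\mathrm{rel}(i) - (1-\lambda)\max_{j \in S}\mathrm{sim}(i,j)\). Here \(\lambda\) is the MMR trade-off weight.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def mmr_rerank(relevance: Sequence[float], item_vecs: Sequence[Sequence[float]],
               k: int, lam: float = 0.7) -> List[int]:
    """Greedy MMR: repeatedly add argmax_i lam*rel(i) - (1-lam)*max_{j in S} sim(i, j)."""
    selected: List[int] = []
    remaining = list(range(len(relevance)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            redundancy = max((cosine(item_vecs[i], item_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    relevance = [1.0, 0.95, 0.6]                  # hypothetical per-user scores
    vecs = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0]]  # items 0 and 1 are near-duplicates
    print(mmr_rerank(relevance, vecs, k=2, lam=0.5))  # [0, 2]
```

Ranking by relevance alone would return the near-duplicates [0, 1]. With \(\lambda = 1\), MMR reduces to exactly that ranking.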
Interactive solver
6 users × 10 items · per-user MNL assortment, offline
Under the hood
The scenario generator creates a low-rank utility matrix. Each user has a latent-factor vector \(p_u \in \mathbb{R}^2\), each item has \(q_i \in \mathbb{R}^2\) and a price \(r_i\), and predicted utility is \(\hat u_{ui} = p_u^{\top} q_i\) (plus noise, scaled by "preference strength"). Per user, the solver sorts items by price, evaluates the MNL revenue of each prefix of length 1..K, and keeps the best (the revenue-ordered optimum). The non-personalised baseline sorts items by average utility across users, picks the top K, and shows that same slate to every user. Personalisation lift % is the relative revenue gain of the personalised slates over this baseline.
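The sketch below mirrors the pipeline just described. The page's exact generator is not given, so several details are assumptions:
- standard-normal latent factors;
- prices uniform on [5, 20];
- Gaussian noise;
- `strength` multiplying \(p_u^{\top} q_i\).

`run_scenario` is a hypothetical name.

```python
import math
import random
from typing import Dict, List, Sequence


def mnl_revenue(utils: Sequence[float], prices: Sequence[float], slate: Sequence[int]) -> float:
    """Expected revenue sum_i r_i * P_i(S) under MNL with a zero-utility outside option."""
    weights = [math.exp(utils[i]) for i in slate]
    return sum(prices[i] * w for i, w in zip(slate, weights)) / (1.0 + sum(weights))


def best_prefix(utils: Sequence[float], prices: Sequence[float], k: int) -> List[int]:
    """Best price-sorted prefix of length 1..k (the revenue-ordered search)."""
    order = sorted(range(len(prices)), key=lambda i: -prices[i])
    prefixes = [order[:n] for n in range(1, min(k, len(order)) + 1)]
    return max(prefixes, key=lambda s: mnl_revenue(utils, prices, s))


def run_scenario(n_users: int = 6, n_items: int = 10, k: int = 4, strength: float = 2.0,
                 noise: float = 0.3, seed: int = 0) -> Dict[str, object]:
    rng = random.Random(seed)
    # Assumed generator: 2-D latent factors, prices uniform on [5, 20].
    p = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(n_users)]
    q = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(n_items)]
    prices = [round(rng.uniform(5.0, 20.0), 2) for _ in range(n_items)]
    utils = [[strength * (pu[0] * qi[0] + pu[1] * qi[1]) + rng.gauss(0.0, noise) for qi in q]
             for pu in p]

    # Personalised: each user gets their own revenue-ordered slate.
    slates = [best_prefix(u, prices, k) for u in utils]
    personalised = sum(mnl_revenue(u, prices, s) for u, s in zip(utils, slates))

    # Non-personalised baseline: top-k by average utility, shown to every user.
    avg_util = [sum(u[i] for u in utils) / n_users for i in range(n_items)]
    shared = sorted(range(n_items), key=lambda i: -avg_util[i])[:k]
    baseline = sum(mnl_revenue(u, prices, shared) for u in utils)

    return {
        "prices": prices,
        "slates": slates,
        "shared_slate": shared,
        "lift_pct": 100.0 * (personalised - baseline) / baseline,
        "coverage": len({i for s in slates for i in s}),  # unique items shown
    }


if __name__ == "__main__":
    result = run_scenario()
    print(f"lift {result['lift_pct']:.1f}%, coverage {result['coverage']} items")
```

The revenue-ordered prefix is a heuristic under the cap \(K\), so the lift is not guaranteed to be positive for every random draw.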
Reading the solution
Three patterns to watch for
- Personalisation lift scales with preference heterogeneity. When all users have similar tastes, personalisation doesn’t help much (everyone wants the top items). Crank up “preference strength” and the lift widens.
- High-price anchoring still holds per user. The revenue-ordered property is per-user: even with personalisation, each user's optimal slate is a prefix of the items sorted by price, with the prefix length set by that user's utilities.
- Coverage widens with personalisation. The unique-items-count KPI rises when users get their own slates — a good proxy for fairness to long-tail items and to producers.
Sensitivity questions
- What if I increase slate size \(K\)? Per-user revenue cannot fall, since the solver can still choose a shorter prefix, but the gains shrink. Each added item cannibalises choice probability from the items already shown (MNL substitution), so a 5th slot typically adds less than the 4th.
- What if prices narrow (price spread → 0.2)? Personalisation lift shrinks: ranking by utility converges to ranking by popularity.
- What if I add a cold-start user (random \(p_u\))? The model wastes slate slots on that user, and bandit exploration beats the offline policy for such users.
Model extensions
Matrix factorisation
Classic collaborative filtering: learn \(p_u, q_i\) from past interactions to build the utility matrix (Koren, Bell & Volinsky 2009; the authors were part of the team that won the Netflix Prize).
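The sketch below uses stochastic gradient descent in the form popularised by Koren, Bell & Volinsky (2009). For each observed pair with signal \(y_{ui}\), it computes \(e = y_{ui} - p_u^{\top} q_i\), then updates \(p_u \leftarrow p_u + \gamma(e\,q_i - \lambda p_u)\) and \(q_i \leftarrow q_i + \gamma(e\,p_u - \lambda q_i)\). The learning rate, regulariser and epoch count are assumptions, and so is the toy data. The demo uses one latent dimension because the toy ground truth is rank 1.

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[int, int, float]  # (user, item, observed signal y_ui)


def predict(p_u: Sequence[float], q_i: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(p_u, q_i))


def train_mf(data: Sequence[Triple], n_users: int, n_items: int, dim: int = 2,
             lr: float = 0.05, reg: float = 0.01, epochs: int = 200,
             seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """SGD on sum (y - p_u.q_i)^2 + reg * (|p_u|^2 + |q_i|^2) over observed pairs."""
    rng = random.Random(seed)
    p = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    q = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, y in order:
            err = y - predict(p[u], q[i])
            for f in range(dim):
                pu, qi = p[u][f], q[i][f]  # use pre-update values in both steps
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def mse(data: Sequence[Triple], p: List[List[float]], q: List[List[float]]) -> float:
    return sum((y - predict(p[u], q[i])) ** 2 for u, i, y in data) / len(data)


if __name__ == "__main__":
    # Hypothetical rank-1 truth y_ui = a_u * c_i, with the pair (2, 2) held out.
    a, c = [1.0, 2.0, -1.0], [1.0, 0.5, -1.0]
    data = [(u, i, a[u] * c[i]) for u in range(3) for i in range(3) if (u, i) != (2, 2)]
    p, q = train_mf(data, n_users=3, n_items=3, dim=1)
    print(f"train MSE {mse(data, p, q):.4f}; held-out prediction {predict(p[2], q[2]):.2f}, truth 1.0")
```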
Deep-learning recommenders
Two-tower neural nets (Covington-Adams-Sargin 2016 YouTube), transformers (BERT4Rec), and multimodal embeddings. Production standard at hyperscalers.
Contextual bandits (LinUCB / Thompson)
Online learning when utilities are unknown: explore slates to reduce uncertainty; LinUCB (Li et al. 2010), Thompson sampling (Agrawal-Goyal 2013).
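The sketch below implements linear Thompson sampling in the spirit of Agrawal & Goyal (2013). It keeps ridge statistics \(A^{-1}\) and \(b\), draws one \(\tilde\theta \sim \mathcal{N}(\hat\theta, v^2 A^{-1})\) per round, and plays the arm with the highest sampled score. The scale \(v = 0.5\) is a hand-picked assumption; the paper's theoretical \(v\) is larger. The toy environment is hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: List[Vector], v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


def cholesky(m: List[Vector]) -> List[Vector]:
    """Lower-triangular L with L L^T = m, for a small symmetric positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


class LinearThompson:
    """Gaussian posterior N(theta_hat, v^2 A^{-1}) on a shared theta over joint features."""

    def __init__(self, dim: int, v: float = 0.5, lam: float = 1.0, seed: int = 0) -> None:
        self.v = v
        self.rng = random.Random(seed)
        self.a_inv = [[1.0 / lam if r == c else 0.0 for c in range(dim)] for r in range(dim)]
        self.b = [0.0] * dim

    def select(self, candidates: Sequence[Sequence[float]]) -> int:
        theta_hat = matvec(self.a_inv, self.b)
        z = [self.rng.gauss(0.0, 1.0) for _ in theta_hat]
        # One posterior draw per round: theta_tilde = theta_hat + v * L z, with L L^T = A^{-1}.
        noise = matvec(cholesky(self.a_inv), z)
        theta_tilde = [t + self.v * e for t, e in zip(theta_hat, noise)]
        scores = [dot(theta_tilde, phi) for phi in candidates]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, phi: Sequence[float], reward: float) -> None:
        # Sherman-Morrison rank-one update of A^{-1}, as in LinUCB.
        a_phi = matvec(self.a_inv, phi)
        denom = 1.0 + dot(phi, a_phi)
        for r in range(len(phi)):
            for c in range(len(phi)):
                self.a_inv[r][c] -= a_phi[r] * a_phi[c] / denom
            self.b[r] += reward * phi[r]


def simulate(rounds: int = 500, seed: int = 0) -> List[int]:
    """Hypothetical environment: reward = true_theta . phi + Gaussian noise."""
    env = random.Random(seed + 1)
    true_theta = [1.0, -0.5]
    arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # expected rewards 1.0, -0.5, 0.35
    agent = LinearThompson(dim=2, seed=seed)
    pulls = [0] * len(arms)
    for _ in range(rounds):
        i = agent.select(arms)
        pulls[i] += 1
        agent.update(arms[i], dot(true_theta, arms[i]) + env.gauss(0.0, 0.1))
    return pulls


if __name__ == "__main__":
    print(simulate())  # pull counts per arm
```

Unlike LinUCB, exploration here comes from randomness in the posterior draw, not from an explicit confidence bonus.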
Diversity via MMR / DPP
Induce slate diversity via Maximum Marginal Relevance (Carbonell-Goldstein 1998) or Determinantal Point Processes (Kulesza-Taskar 2012).
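The sketch below performs greedy MAP selection for a DPP slate. It assumes the quality-times-diversity kernel \(L_{ij} = g_i \cos(z_i, z_j)\, g_j\); Kulesza & Taskar (2012) discuss this decomposition. The quality scores \(g_i\) and the use of cosine similarity on item vectors are our assumptions. Greedy selection approximates the NP-hard MAP problem. Computing determinants directly is fine for slate-sized subsets but not at scale.

```python
import math
from typing import List, Sequence


def det(m: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return result


def dpp_kernel(quality: Sequence[float],
               item_vecs: Sequence[Sequence[float]]) -> List[List[float]]:
    """L_ij = g_i * cos(z_i, z_j) * g_j; assumes every item vector is non-zero."""
    unit = []
    for v in item_vecs:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    n = len(unit)
    return [[quality[i] * sum(x * y for x, y in zip(unit[i], unit[j])) * quality[j]
             for j in range(n)] for i in range(n)]


def greedy_dpp(kernel: List[List[float]], k: int) -> List[int]:
    """Greedy MAP: repeatedly add the item that maximises det(L_S)."""
    selected: List[int] = []
    remaining = list(range(len(kernel)))
    while remaining and len(selected) < k:
        def subset_det(i: int) -> float:
            idx = selected + [i]
            return det([[kernel[r][c] for c in idx] for r in idx])
        best = max(remaining, key=subset_det)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    vecs = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0]]  # items 0 and 1 are near-duplicates
    print(greedy_dpp(dpp_kernel([1.0, 0.95, 0.6], vecs), k=2))  # [0, 2]
```

The determinant of a near-duplicate pair is close to zero, so the DPP penalises redundancy multiplicatively instead of through MMR's additive penalty.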
CLV-aware recommendation
Objective is long-term CLV, not single-session revenue. Reinforcement-learning framing; recent research frontier.
Assortment-under-MNL (aggregate)
Offline non-personalised assortment optimisation is the special case when all users share one utility vector.
Cross-channel personalisation
Consistent recommendations across web / app / email / push / store — joint optimisation with fatigue penalty.
Fairness + producer exposure
Long-tail items and new producers need exposure; add lower-bound constraints or post-hoc re-ranking. Biega-Gummadi-Weikum 2018.
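The sketch below shows one simple form of post-hoc re-ranking. A hypothetical quota `min_tail` guarantees at least that many long-tail items in each top-\(K\) slate by swapping out the lowest-scoring head items. This is an illustrative lower-bound rule, not the amortised-exposure method of Biega, Gummadi & Weikum (2018).

```python
from typing import List, Sequence, Set


def rerank_with_tail_quota(scores: Sequence[float], tail: Set[int],
                           k: int, min_tail: int) -> List[int]:
    """Take the top k by score, then swap the weakest head items for the best unused tail
    items until the slate holds at least min_tail tail items (or no tail items remain)."""
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    slate = ranked[:k]
    n_tail = sum(1 for i in slate if i in tail)
    spare_tail = [i for i in ranked[k:] if i in tail]  # already best-first
    for pos in range(len(slate) - 1, -1, -1):          # weakest slot first
        if n_tail >= min_tail or not spare_tail:
            break
        if slate[pos] not in tail:
            slate[pos] = spare_tail.pop(0)
            n_tail += 1
    return sorted(slate, key=lambda i: -scores[i])


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.3, 0.2]  # hypothetical relevance scores
    tail = {3, 4}                       # hypothetical long-tail items
    print(rerank_with_tail_quota(scores, tail, k=3, min_tail=1))  # [0, 1, 3]
```

The cost of the quota is the score gap between each evicted head item and its tail replacement. Monitoring that gap is one way to tune `min_tail`.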
Back to the retail domain
Personalisation sits in the Promotion × Operational cell — millions of micro-decisions per minute, powering the 4th P with algorithms.