Personalisation & Recommendation

Per-user MNL · Contextual bandits

Choose what to show which customer to maximise expected revenue (or CLV) — a per-user assortment problem layered on top of a predicted utility model. When utilities are known, the revenue-ordered MNL property (Talluri & van Ryzin 2004) still applies per user. When they must be learned online, the problem becomes a contextual bandit (Li, Chu, Langford & Schapire 2010). Industry scale: Netflix, Amazon, YouTube run this loop millions of times per minute. Handbook reference: Ricci, Rokach & Shapira (eds.) Recommender Systems Handbook.

Why it matters

The digital-retail revenue lever with the highest leverage

~35%
Share of Amazon’s sales attributed to personalised recommendations — the single most-cited scale reference in recommender literature.
Source: McKinsey retail analytics (2013 estimate, later confirmed in industry filings).
~75%
Share of Netflix viewing hours driven by recommendations — the canonical success case for CF-style personalisation.
Source: Gomez-Uribe & Hunt (2016), ACM TMIS.
+5–15%
Typical e-commerce revenue lift documented when replacing popularity-sort with MNL-aware personalised ranking on the homepage.
Source: Industry papers; Gallino & Moreno analyses.
\(\tilde O(d\sqrt{T})\)
Regret bound for linear contextual-bandit algorithms such as LinUCB after \(T\) rounds with a \(d\)-dimensional feature vector. Cumulative regret grows sub-linearly, so the average per-round cost of learning online shrinks toward zero.
Source: Li, Chu, Langford & Schapire (2010), WWW.

Where the decision sits

Each page-view, each email, each push-notification — a recommendation decision

Recommendation is a decision made at every session: homepage, search, category, product-detail, cart-add, checkout, post-purchase email, push notification. The retailer picks \(K\) items to show, the customer clicks (or doesn’t) and optionally buys, and the system learns. Two decision models matter: when utilities are known (offline batch-scored from last night’s model), use MNL assortment per user; when utilities must be learned as users arrive (new customer, cold-start item, or A/B-test phase), use a contextual bandit.

Context: user + session
Rank / select \(K\): recommendation
Click / buy / skip: customer response
Update model: nightly or online

Problem & formulation

Per-user MNL + contextual-bandit learning

Decision model: Per-user subset selection
Demand model: Contextual MNL / LinUCB
Complexity: O(N log N) per user (offline)
Reference: Talluri & van Ryzin 2004; Li et al. 2010

Sets and parameters

Symbol | Meaning | Unit
\(u \in \mathcal{U}\) | User (current visitor) | finite
\(i \in \mathcal{N}\) | Candidate item in the catalog | finite
\(x_u\) | User context vector (features, history, device) | \(\mathbb{R}^d\)
\(z_i\) | Item feature vector | \(\mathbb{R}^d\)
\(\theta\) | Learned parameter vector, same dimension as \(\phi(x_u, z_i)\) | \(\mathbb{R}^{d^2}\) for the outer-product \(\phi\)
\(\hat u_{ui}\) | Predicted utility (log-odds of click / purchase) | real
\(r_i\) | Revenue on purchase of item \(i\) | $
\(K\) | Slate size / recommendation capacity | integer

Contextual utility model

Utility is learned from interaction data. Simplest form is bilinear:

$$\hat u_{ui} \;=\; \theta^{\top} \phi(x_u, z_i) \;=\; \theta^{\top} (x_u \otimes z_i)$$

\(\phi\) is a joint user-item feature map, often the flattened outer product shown above. Matrix factorisation, \(\hat u_{ui} = p_u^{\top} q_i\) with learned latent vectors \(p_u\) and \(q_i\), is the canonical collaborative-filtering form (Koren, Bell & Volinsky 2009); production systems typically replace or augment it with deep networks.
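
As a minimal sketch of the bilinear form, the snippet below computes \(\theta^{\top}(x_u \otimes z_i)\) with plain Python lists and checks that it reduces to a dot product when \(\theta\) is a flattened identity matrix. The function names and the toy vectors are illustrative assumptions, and the sketch takes \(\theta\) to have one entry per entry of the outer product.

```python
def outer_features(x_u, z_i):
    """Flattened outer product x_u (x) z_i in row-major order."""
    return [xa * zb for xa in x_u for zb in z_i]


def bilinear_utility(theta, x_u, z_i):
    """Predicted utility theta^T phi(x_u, z_i) with phi the flattened outer product."""
    phi = outer_features(x_u, z_i)
    if len(theta) != len(phi):
        raise ValueError("theta must have len(x_u) * len(z_i) entries")
    return sum(t * f for t, f in zip(theta, phi))


def dot_utility(p_u, q_i):
    """Matrix-factorisation style utility p_u^T q_i."""
    return sum(p * q for p, q in zip(p_u, q_i))


# Toy vectors (hypothetical values, chosen only for illustration).
x_u = [1.0, 0.5]
z_i = [0.2, -1.0]

# A 2x2 identity weight matrix, flattened row-major: only matching
# coordinates of x_u and z_i interact, so the result equals x_u . z_i.
theta = [1.0, 0.0, 0.0, 1.0]

u_bilinear = bilinear_utility(theta, x_u, z_i)
u_dot = dot_utility(x_u, z_i)
print(u_bilinear, u_dot)
```

With a non-identity \(\theta\) the same code scores cross-feature interactions, for example a device feature of the user against a category feature of the item.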

Per-user MNL offline optimisation

Given the utility vector \(\hat u_{u \cdot}\), the retailer picks an assortment \(S_u\) of at most \(K\) items for each user. Under MNL, the probability that user \(u\) clicks or purchases item \(i \in S_u\) is:

$$P_i(S_u) \;=\; \frac{e^{\hat u_{ui}}}{1 + \sum_{j \in S_u} e^{\hat u_{uj}}}$$

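Expected revenue per session is \(R(S_u) = \sum_{i \in S_u} r_i \cdot P_i(S_u)\); the 1 in the denominator of \(P_i\) is the weight of the no-purchase option. The Talluri & van Ryzin revenue-ordered result says that, without a size limit, an optimal assortment is a prefix of the items sorted by \(r_i\) in descending order, so only \(N\) candidate sets need to be evaluated. With a hard cap of \(K\) items, searching only prefixes of length at most \(K\) is fast but not guaranteed to be optimal. Diversity and fairness requirements add further constraints.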
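
The prefix search can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the page's solver: the function names are hypothetical, utilities and revenues are plain lists indexed by item, and the cap `max_size` simply limits how long a prefix may be.

```python
import math


def mnl_probs(utilities, slate):
    """P_i(S) = exp(u_i) / (1 + sum_{j in S} exp(u_j)) for each i in the slate."""
    weights = {i: math.exp(utilities[i]) for i in slate}
    denom = 1.0 + sum(weights.values())  # the 1.0 is the no-purchase option
    return {i: w / denom for i, w in weights.items()}


def expected_revenue(utilities, revenues, slate):
    """R(S) = sum_{i in S} r_i * P_i(S)."""
    probs = mnl_probs(utilities, slate)
    return sum(revenues[i] * p for i, p in probs.items())


def best_revenue_ordered_prefix(utilities, revenues, max_size):
    """Sort by revenue (descending) and return the best prefix of length <= max_size."""
    order = sorted(range(len(revenues)), key=lambda i: revenues[i], reverse=True)
    best_slate, best_rev = [], 0.0
    num, den = 0.0, 1.0  # running sums of r_i * exp(u_i) and 1 + exp(u_i)
    for k, i in enumerate(order[:max_size], start=1):
        w = math.exp(utilities[i])
        num += revenues[i] * w
        den += w
        if num / den > best_rev:
            best_slate, best_rev = order[:k], num / den
    return best_slate, best_rev


# Toy instance: equal utilities, so only prices differ.
utilities = [0.0, 0.0, 0.0]
revenues = [10.0, 8.0, 1.0]
slate, rev = best_revenue_ordered_prefix(utilities, revenues, max_size=3)
print(slate, rev)  # prefix revenues are 5.0, 6.0, 4.75, so the best slate is [0, 1]
```

The running sums make each prefix evaluation constant time, so the sort dominates and the cost per user is O(N log N).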

Online learning: LinUCB

When \(\theta\) is unknown, balance exploration and exploitation. The LinUCB rule (Li et al. 2010) picks at each round the arm with the highest upper confidence bound:

$$i_t \;=\; \arg\max_i \;\Bigl[\, \hat{\theta}_t^{\top} \phi_{u_t, i} \;+\; \alpha \sqrt{\phi_{u_t, i}^{\top} A_t^{-1} \phi_{u_t, i}} \,\Bigr]$$

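\(A_t = \lambda I + \sum_{s \leq t} \phi_s \phi_s^{\top}\) is the regularised Gram matrix of the feature vectors played so far, \(\hat\theta_t\) is the ridge-regression estimate built from the same data, and \(\alpha\) tunes exploration. The standard bound for this family of algorithms is regret \(\tilde O(d \sqrt{T})\), where \(d\) here denotes the dimension of the feature vector \(\phi\). That is sub-linear in \(T\), meaning per-round regret vanishes. The same explore-exploit trade-off appears in dynamic pricing with learning.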
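
A compact sketch of the selection rule follows, assuming a single shared parameter vector and a ridge estimate \(\hat\theta = A^{-1} b\), where \(b\) accumulates reward-weighted features. The class name, the Sherman-Morrison update of \(A^{-1}\), and the noise-free two-arm toy problem are choices made for this illustration rather than details from Li et al. (2010).

```python
import math


class LinUCB:
    """Shared-parameter LinUCB sketch that maintains A^{-1} incrementally."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha
        # Start from (lam * I)^{-1}; features are added one rank-1 term at a time.
        self.A_inv = [[(1.0 / lam if r == c else 0.0) for c in range(dim)]
                      for r in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * phi

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def theta_hat(self):
        return self._matvec(self.A_inv, self.b)

    def ucb(self, phi):
        """Point estimate plus alpha * sqrt(phi^T A^{-1} phi)."""
        mean = sum(t * f for t, f in zip(self.theta_hat(), phi))
        a_phi = self._matvec(self.A_inv, phi)
        width = math.sqrt(max(0.0, sum(f * a for f, a in zip(phi, a_phi))))
        return mean + self.alpha * width

    def select(self, arm_features):
        scores = [self.ucb(phi) for phi in arm_features]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, phi, reward):
        # Sherman-Morrison: (A + phi phi^T)^{-1}
        #   = A^{-1} - (A^{-1} phi)(A^{-1} phi)^T / (1 + phi^T A^{-1} phi)
        a_phi = self._matvec(self.A_inv, phi)
        denom = 1.0 + sum(f * a for f, a in zip(phi, a_phi))
        n = len(phi)
        for r in range(n):
            for c in range(n):
                self.A_inv[r][c] -= a_phi[r] * a_phi[c] / denom
        for k in range(n):
            self.b[k] += reward * phi[k]


# Toy problem (hypothetical): two one-hot arms with noise-free linear rewards.
true_theta = [0.2, 0.8]
arms = [[1.0, 0.0], [0.0, 1.0]]
model = LinUCB(dim=2, alpha=1.0, lam=1.0)
picks = []
for _ in range(50):
    arm = model.select(arms)
    reward = sum(t * f for t, f in zip(true_theta, arms[arm]))
    model.update(arms[arm], reward)
    picks.append(arm)
print(picks.count(0), picks.count(1))
```

In this toy run the weaker arm is tried once and the policy then concentrates on the stronger arm; with noisy rewards a larger \(\alpha\) would keep revisiting the weaker arm for longer.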

Diversity / fairness constraints

Pure revenue maximisation over-concentrates exposure: the same few top items are shown to every user. Real systems add constraints such as:

$$\sum_{i \in S_u} \mathbb{1}\{i \in \text{category } c\} \;\leq\; C_c \qquad \forall c \qquad \text{(category cap)}$$

$$\sum_{u, i \in S_u} \mathbb{1}\{\text{producer}(i) = p\} \;\geq\; \underline F_p \qquad \forall p \qquad \text{(producer fairness)}$$

Diversity can also be induced without hard constraints, through submodular-coverage objectives or greedy re-ranking with maximum marginal relevance (Carbonell & Goldstein 1998).
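
A greedy maximum-marginal-relevance re-ranker can be sketched as follows. The cosine similarity on item vectors, the `trade_off` weight, and the toy data are assumptions made for this example; a real system would plug in its own relevance scores and similarity measure.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def mmr_slate(relevance, item_vectors, k, trade_off=0.7):
    """Greedily pick k items maximising
    trade_off * relevance - (1 - trade_off) * max similarity to items already picked."""
    selected = []
    candidates = set(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(item_vectors[i], item_vectors[j]) for j in selected),
                default=0.0,
            )
            return trade_off * relevance[i] - (1.0 - trade_off) * redundancy

        best = max(sorted(candidates), key=score)  # ties go to the lowest index
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy catalogue: items 0 and 1 are near-duplicates, item 2 is different.
relevance = [0.9, 0.85, 0.5]
item_vectors = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0]]

diverse = mmr_slate(relevance, item_vectors, k=2, trade_off=0.5)
greedy = mmr_slate(relevance, item_vectors, k=2, trade_off=1.0)
print(diverse, greedy)
```

Setting `trade_off=1.0` recovers plain relevance ranking, which makes the parameter a convenient dial for A/B-testing how much diversity a slate should carry.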

Interactive solver

6 users × 10 items · per-user MNL assortment, offline

Personalised assortment solver
Revenue-ordered per user · compare with non-personalised baseline
Solution quality: ★★★ Exact (MNL revenue-ordered)

Controls:
  • Variation in user utilities
  • Item-price variation

Reported KPIs:
  • Revenue (personalised, $/user)
  • Revenue (non-personalised, $/user)
  • Personalisation lift
  • Unique items across users
  • Avg price in personalised slates
  • Avg P(click something)

Legend:
  • Recommended to user (high utility, chosen for slate)
  • Not recommended (low utility or prefix skipped)
  • High predicted utility

Under the hood

The scenario generator creates a low-rank utility matrix: each user has a latent-factor vector \(p_u \in \mathbb{R}^2\), each item has a latent vector \(q_i \in \mathbb{R}^2\) and a price \(r_i\), and predicted utility is \(\hat u_{ui} = p_u^{\top} q_i\) plus noise, scaled by “preference strength”. For each user, the solver sorts items by price in descending order, evaluates the MNL revenue of every prefix of length 1 to \(K\), and keeps the best one (the revenue-ordered optimum). The non-personalised baseline sorts items by average utility across users, takes the top \(K\), and shows that same slate to every user. Personalisation lift (%) is the relative revenue gain of the personalised slates over this baseline.
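
The pipeline described above can be approximated in a short script. This is a sketch under assumptions, not the page's actual solver: the Gaussian latent factors, the uniform price range, the noise level, and the choice to let `strength` multiply the latent term are all hypothetical.

```python
import math
import random


def mnl_revenue(u_row, prices, slate):
    """Expected MNL revenue of a slate for one user (no-purchase weight 1)."""
    den = 1.0 + sum(math.exp(u_row[i]) for i in slate)
    return sum(prices[i] * math.exp(u_row[i]) for i in slate) / den


def best_price_prefix(u_row, prices, k):
    """Best prefix of length 1..k of the items sorted by price, descending."""
    order = sorted(range(len(prices)), key=lambda i: prices[i], reverse=True)
    prefixes = [order[:m] for m in range(1, k + 1)]
    return max(prefixes, key=lambda s: mnl_revenue(u_row, prices, s))


def simulate(n_users=6, n_items=10, k=4, strength=1.0, seed=0):
    rng = random.Random(seed)
    # Rank-2 latent factors for users and items (assumed standard normal).
    p = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(n_users)]
    q = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(n_items)]
    prices = [rng.uniform(5.0, 50.0) for _ in range(n_items)]
    # Utility = strength * p_u . q_i + small noise (assumed noise sd 0.1).
    u = [[strength * (pu[0] * qi[0] + pu[1] * qi[1]) + rng.gauss(0, 0.1)
          for qi in q] for pu in p]

    slates = [best_price_prefix(u[a], prices, k) for a in range(n_users)]

    # Baseline: one shared slate, the top-k items by average utility.
    avg_u = [sum(u[a][i] for a in range(n_users)) / n_users for i in range(n_items)]
    shared = sorted(range(n_items), key=lambda i: avg_u[i], reverse=True)[:k]

    rev_pers = sum(mnl_revenue(u[a], prices, slates[a]) for a in range(n_users)) / n_users
    rev_base = sum(mnl_revenue(u[a], prices, shared) for a in range(n_users)) / n_users
    return {
        "rev_personalised": rev_pers,
        "rev_baseline": rev_base,
        "lift_pct": 100.0 * (rev_pers - rev_base) / rev_base,
        "slates": slates,
        "shared_slate": shared,
    }


result = simulate()
print(round(result["rev_personalised"], 2),
      round(result["rev_baseline"], 2),
      round(result["lift_pct"], 1))
```

Re-running `simulate` with different `strength` and `seed` values is a quick way to explore how the lift responds to preference heterogeneity in this simplified setting.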

Reading the solution

Three patterns to watch for

  • Personalisation lift scales with preference heterogeneity. When all users have similar tastes, personalisation doesn’t help much (everyone wants the top items). Crank up “preference strength” and the lift widens.
  • High-price anchoring still holds per user. The revenue-ordered property is per-user: even with personalisation, the optimal slate for each user is a prefix of items sorted by price, filtered by what utility each user has for them.
  • Coverage widens with personalisation. The unique-items-count KPI rises when users get their own slates — a good proxy for fairness to long-tail items and to producers.

Sensitivity questions

  • What if I increase slate size \(K\)? Per-user revenue rises, but with diminishing returns: a 5th item adds less than the 4th because, under MNL, each new item cannibalises clicks from those already shown.
  • What if prices narrow (price spread → 0.2)? Personalisation lift shrinks: ranking by utility converges to ranking by popularity.
  • What if I add a cold-start user (random \(p_u\))? The offline model has no reliable utilities for that user and wastes slate slots; bandit exploration beats the offline policy for such users.
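
The slate-size question can be checked with one line of algebra. Writing \(w_i = e^{\hat u_{ui}}\) and \(W_S = \sum_{j \in S} w_j\) (shorthand introduced only for this note), adding an item \(k \notin S\) to a slate \(S\) changes expected revenue by:

$$R(S \cup \{k\}) - R(S) \;=\; \frac{w_k \,\bigl(r_k - R(S)\bigr)}{1 + W_S + w_k}$$

A new item therefore helps only if its revenue \(r_k\) exceeds the current expected revenue \(R(S)\), and the gain is divided by a denominator that grows with every item already on the slate. For example, with \(w_i = 1\) for all items and revenues 10 and 8, \(R\) moves from \(10/2 = 5\) to \(18/3 = 6\), a gain of \(1 \cdot (8 - 5)/3 = 1\).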

Model extensions

Matrix factorisation

Classic collaborative filtering: learn \(p_u, q_i\) from past interactions to build the utility matrix. The canonical reference is Koren, Bell & Volinsky (2009), written in the context of the Netflix Prize.

Deep-learning recommenders

Two-tower neural nets (Covington-Adams-Sargin 2016 YouTube), transformers (BERT4Rec), and multimodal embeddings. Production standard at hyperscalers.

Contextual bandits (LinUCB / Thompson)

Online learning when utilities are unknown: explore slates to reduce uncertainty; LinUCB (Li et al. 2010), Thompson sampling (Agrawal-Goyal 2013).

Diversity via MMR / DPP

Induce slate diversity via Maximum Marginal Relevance (Carbonell-Goldstein 1998) or Determinantal Point Processes (Kulesza-Taskar 2012).

CLV-aware recommendation

Objective is long-term CLV, not single-session revenue. Reinforcement-learning framing; recent research frontier.

Assortment-under-MNL (aggregate)

Offline non-personalised assortment optimisation is the special case when all users share one utility vector.

Cross-channel personalisation

Consistent recommendations across web / app / email / push / store — joint optimisation with fatigue penalty.

Fairness + producer exposure

Long-tail items and new producers need exposure; add lower-bound constraints or post-hoc re-ranking. Biega-Gummadi-Weikum 2018.

Key references

Ricci, F., Rokach, L. & Shapira, B. (eds.) (2011, 3rd ed. 2022).
Recommender Systems Handbook.
Springer. doi:10.1007/978-1-0716-2197-4 (The field’s standard handbook.)
Koren, Y., Bell, R. & Volinsky, C. (2009).
Matrix factorization techniques for recommender systems.
IEEE Computer 42(8): 30–37. doi:10.1109/MC.2009.263 (Netflix-prize canonical paper.)
Li, L., Chu, W., Langford, J. & Schapire, R. E. (2010).
A contextual-bandit approach to personalized news article recommendation.
WWW 2010. doi:10.1145/1772690.1772758 (LinUCB.)
Talluri, K. T. & van Ryzin, G. J. (2004).
Revenue management under a general discrete choice model.
Management Science 50(1): 15–33. doi:10.1287/mnsc.1030.0147
Agrawal, S. & Goyal, N. (2013).
Thompson sampling for contextual bandits with linear payoffs.
ICML 2013.
Carbonell, J. & Goldstein, J. (1998).
The use of MMR, diversity-based reranking for reordering documents and producing summaries.
SIGIR 1998.
Kulesza, A. & Taskar, B. (2012).
Determinantal point processes for machine learning.
Foundations and Trends in Machine Learning 5(2-3). doi:10.1561/2200000044
Gomez-Uribe, C. A. & Hunt, N. (2016).
The Netflix recommender system: Algorithms, business value, and innovation.
ACM Transactions on Management Information Systems 6(4): 1–19. doi:10.1145/2843948
Covington, P., Adams, J. & Sargin, E. (2016).
Deep neural networks for YouTube recommendations.

Back to the retail domain

Personalisation sits in the Promotion × Operational cell — millions of micro-decisions per minute, powering the 4th P with algorithms.

Educational solver · low-rank synthetic utilities, offline MNL · production systems layer learning (bandits, transformers, fairness) on top.