Skip to main content

Retail Store Location

p-Median · MCLP · Huff gravity

Choose which of \(m\) candidate sites to open as retail stores so the chain reaches its customers best. The classical formulations are p-Median (minimise weighted distance / drive-time) and Maximal Covering (MCLP; maximise demand within a threshold). The retail-specific layer is Huff's gravity model of store attraction, which turns distance into a share of each customer's spend, and the cannibalisation question of how a new store eats into the chain's existing market. Foundational papers: Hakimi (1964); Church & ReVelle (1974); Huff (1964).

Why it matters

Location drives lifetime revenue; once opened, a store stays

~20 yr
Typical retail-store lease length — location errors are amortised over two decades of operations.
Source: ICSC retail-industry averages.
~5 min
Median drive-time a grocery customer will travel. Beyond this threshold, share falls off — the classical coverage threshold in MCLP.
Source: Huff (1964); Nielsen shopper panels.
10–30%
Documented cannibalisation rate when a new store is opened near an existing one — the “chain-shadow” effect.
Source: Drèze & Ghosh (2005), Journal of Retailing.
NP-hard
Kariv & Hakimi (1979) proof; greedy + swap heuristics scale to thousands of candidate sites with near-optimal gaps.
Source: Cornuéjols, Fisher & Nemhauser (1977).

Where the decision sits

Chain network strategy · 5-20 year horizon · irreversible

Retail store location is a strategic decision: it fixes the chain's physical footprint for years and drives every downstream operation (assortment, labour, replenishment). The retail framing differs from generic facility location in three ways: (i) trade area — catchment defined by drive time, competitors, and demographics, not pure Euclidean distance; (ii) Huff attraction — demand at a customer is split across stores in proportion to a gravity score, not allocated entirely to the nearest; (iii) cannibalisation — a new store steals from the chain itself, so incremental revenue matters more than total. Pure warehouse / DC siting lives under logistics facility location; the generic p-Median family sits under Location & Covering.

Trade areademand zones
Store locationchoose \(p\) of \(m\)
Formatsize, layout
Operationslabour, assortment

Problem & formulation

Three closely-related MIPs — pick the objective that matches the retailer’s economics

OR family
Facility Location (discrete)
Complexity
NP-hard
Solver realism
★★★ Greedy-add + 1-opt swap
Reference
Hakimi (1964); Church & ReVelle (1974); Huff (1964)

Sets and parameters

SymbolMeaningUnit
\(i \in \mathcal{J}\)Candidate site (\(|\mathcal{J}| = m\))finite
\(j \in \mathcal{I}\)Demand zone / customer cluster (\(|\mathcal{I}| = n\))finite
\(w_j\)Demand weight at zone \(j\) (population × per-capita spend)$ / yr
\(d_{ij}\)Drive-time distance from site \(i\) to zone \(j\)min or km
\(D\)Coverage thresholdmin or km
\(p\)Number of stores to openinteger

Decision variables

SymbolMeaningDomain
\(y_i\)1 if site \(i\) opened as a storebinary
\(x_{ij}\)1 if zone \(j\) assigned to store \(i\)binary

p-Median: minimise weighted drive-time

$$\min \; \sum_{i \in \mathcal{J}} \sum_{j \in \mathcal{I}} w_j \, d_{ij} \, x_{ij}$$ $$\text{s.t.} \quad \sum_{i} x_{ij} = 1 \;\;\forall j, \quad x_{ij} \leq y_i \;\;\forall i, j, \quad \sum_{i} y_i = p, \quad x_{ij}, y_i \in \{0, 1\}$$

Each zone is assigned to one open store; exactly \(p\) stores are opened. Hakimi (1964) proved optima lie at vertices on a graph; Kariv & Hakimi (1979) proved NP-hardness.

Maximal Covering (MCLP): maximise covered demand

$$\max \; \sum_{j \in \mathcal{I}} w_j \, z_j \qquad \text{s.t.} \quad z_j \leq \sum_{i : d_{ij} \leq D} y_i \;\;\forall j, \quad \sum_{i} y_i = p, \quad y_i, z_j \in \{0, 1\}$$

\(z_j = 1\) iff zone \(j\) is within threshold \(D\) of at least one open store. MCLP maximises reach rather than travel cost — natural when the retailer cares about coverage. Church & ReVelle (1974).

Huff gravity attraction

Huff (1964) shifts from “nearest-store wins” to probabilistic share: zone \(j\)'s probability of shopping at store \(i\) depends on store attractiveness \(A_i\) (size, brand, parking) and distance:

$$P_{ij} \;=\; \frac{A_i \, d_{ij}^{-\gamma}}{\sum_{k: y_k = 1} A_k \, d_{kj}^{-\gamma}}$$

\(\gamma\) is the distance-decay parameter (typically 1-3). Expected sales at store \(i\) is \(\sum_j w_j P_{ij}\). Cannibalisation emerges endogenously: opening a new store near an existing one steals share in proportion to its own attractiveness. Joint optimisation over \(y\) is nonlinear; practice approximates with p-Median and audits cannibalisation afterwards.

Real-world → OR mapping

On the real-estate committeeIn the model
Candidate property (retail park, mall, high-street, greenfield)\(i \in \mathcal{J}\)
Population cluster / census tract with demographic profile\(j \in \mathcal{I}\)
Trade-area demand (households × spend / week)\(w_j\)
Drive-time from routing engine\(d_{ij}\)
“5-minute rule” coverage threshold\(D\)
Capex budget in stores\(p\)
Open / don’t open this site\(y_i\)
Store size / brand / fascia\(A_i\)
Forecast store sales\(\sum_j w_j P_{ij}\)

Interactive solver

p-Median via greedy-add + 1-opt swap local search

Retail store siting
Scenario geometry · choose \(p\) stores to minimise weighted drive-time
★★★ Greedy-add + 1-opt
Demand geography
Properties to consider
Census clusters
Network size
Reproducible geometry
Total weighted drive-time
Avg drive-time per demand unit
Demand within 5-min threshold
Worst zone drive-time
1-opt swaps applied
Lift vs centre-of-mass baseline
Open store (with 5-min ring) Candidate (not opened) Demand zone (size = \(w_j\)) Zone → nearest-store assignment

Under the hood

The scenario generator places \(n\) demand zones (weights drawn from a skewed Pareto-like distribution) and \(m\) candidate sites on a 100×100 grid using a scenario-specific clustering structure. Drive-times \(d_{ij}\) are Euclidean distances (a proxy; real practice plugs in an OSRM or Google-Maps response). The solver runs greedy-add: start with no stores, at each step open the site whose marginal reduction in \(\sum_j w_j \min_i d_{ij}\) is largest, stop when \(p\) stores are open. Then 1-opt swap: for each open-site / closed-site pair, check if swapping reduces the objective; repeat until no improving swap exists. Complexity \(\mathcal{O}(p \cdot m \cdot n + p (m - p) n)\). Optimality gap vs. MIP is typically under 2% on generated scenarios.

Reading the solution

Three patterns to watch for

  • Store clustering around dense zones. As \(p\) grows, stores go into the highest-weight clusters first, then spread outward. The greedy trajectory is itself useful: it tells you which site is the best first store and which is the best next, informing expansion sequencing.
  • Coverage vs. distance trade-off. p-Median minimises average distance; MCLP maximises covered demand. On highly skewed geographies they can give different answers — always solve both and compare.
  • The Huff post-audit. Once sites are chosen, run Huff with projected store attractiveness and check: how much does the new store cannibalise existing stores? What is the incremental revenue, not the absolute?

Sensitivity questions the model answers

  • What if we can only open 3 stores (capex constraint)? — lower \(p\); marginal sites drop off.
  • What if an online competitor shifts demand from physical? — scale down \(w_j\) and rerun.
  • What if drive-time threshold is 8 min (suburban) instead of 5 min (urban)? — run MCLP with \(D = 8\); covered fraction changes sharply.

Model extensions

Huff gravity + cannibalisation

Replace nearest-store assignment with probabilistic share. Objective becomes nonlinear; solved via greedy + local search on store attractiveness and distance.

Capacitated p-Median

Each store has a capacity \(C_i\). Demand assignment respects capacity; solved via MIP.

Competitive location

Competitor stores are exogenous points; demand split between own and competitor via Huff. Objective: capture share.

Multi-format chain

Open a mix of large and small-format stores (e.g., Express vs. Full). Different cost, capacity, attractiveness per format.

Dynamic network planning

Open/close decisions over a multi-year horizon. Stores “age in” over years; closures trigger severance and brand cost.

Generic facility location

Warehouse / DC siting uses the generic formulation without retail-specific Huff or cannibalisation.

Depot / DC location →
Healthcare facility analogue

Ambulance / clinic placement uses MCLP with life-years as the covered-demand metric.

Ambulance placement →
OR family reference

Dig into exact MIP formulations and Lagrangian relaxation for p-Median.

Location & covering family →

Key references

Hakimi, S. L. (1964).
Optimum locations of switching centers and the absolute centers and medians of a graph.
Operations Research 12(3): 450–459. doi:10.1287/opre.12.3.450 (The p-Median origin.)
Church, R. L. & ReVelle, C. (1974).
The maximal covering location problem.
Papers of the Regional Science Association 32(1): 101–118. (MCLP.)
Huff, D. L. (1964).
Defining and estimating a trading area.
Journal of Marketing 28(3): 34–38. doi:10.1177/002224296402800307
Kariv, O. & Hakimi, S. L. (1979).
An algorithmic approach to network location problems, II: The p-medians.
SIAM Journal on Applied Mathematics 37(3): 539–560. doi:10.1137/0137041
Cornuéjols, G., Fisher, M. L. & Nemhauser, G. L. (1977).
Location of bank accounts to optimize float.
Management Science 23(8): 789–810. doi:10.1287/mnsc.23.8.789
Ghosh, A. & Craig, C. S. (1983).
Formulating retail location strategy in a changing environment.
Journal of Marketing 47(3): 56–68.
Drèze, X. & Ghosh, A. (2005).
Cannibalization and retail productivity.
Journal of Retailing 81(4): 301–318.
Daskin, M. S. (2013, 2nd ed.).
Network and Discrete Location: Models, Algorithms, and Applications.
Wiley.

Back to the retail domain

Retail store location sits in the Place × Strategic cell of the 4P decision matrix — the multi-year decision every downstream operation rests on.

Open Retail Landing
Educational solver · Euclidean distance as drive-time proxy and synthetic scenarios · always validate against real drive-time matrices from a routing engine.