
Dynamic Defense Reallocation

Markov Decision Process · Approximate Dynamic Programming

Reallocate interceptors in real time as alien attack waves arrive. Each wave brings new threats while weapons deplete their ammunition. The optimal policy requires solving a Bellman equation over an exponentially large state space — this demo uses a myopic one-step lookahead heuristic instead.

Wave Defense

Alien attacks arrive in 4 waves, each bringing 2–3 new threats. You command 3 weapons with limited ammunition. After each assignment, engagement outcomes are stochastic — a shot with p=0.8 kill probability still misses 20% of the time. Between waves, you see your remaining ammo and surviving threats, then must decide how to allocate for the next wave. This sequential decision-making under uncertainty is a Markov Decision Process (MDP).
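The stochastic engagement described above can be sketched as a tiny reward model. The kill probability and threat value below match the illustrative numbers in the text (p = 0.8, 100 points); everything else is an assumption for demonstration, not the demo's actual code.

```python
import random

# Illustrative parameters from the text (p = 0.8 kill probability,
# 100-point threat); the simulation loop itself is a sketch.
P_KILL = 0.8
THREAT_VALUE = 100.0

def engage(rng: random.Random) -> float:
    """One stochastic engagement: reward is earned only on a hit."""
    return THREAT_VALUE if rng.random() < P_KILL else 0.0

def expected_reward() -> float:
    """Analytic expected immediate reward: E[R] = p_kill * value."""
    return P_KILL * THREAT_VALUE

# A Monte Carlo estimate converges to the analytic expectation.
rng = random.Random(42)
estimate = sum(engage(rng) for _ in range(100_000)) / 100_000
```

Even with p = 0.8, any single engagement can miss; only the expectation (here 80 points) is predictable, which is exactly why the problem is an MDP rather than a deterministic assignment.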
| Defense Domain | OR Element | Symbol | Example |
|---|---|---|---|
| Current battle state | State | s = (ammo, threats, time) | ammo (3,2,4), active threats, wave 2 |
| Assignment this turn | Action | a = {(i,j)} | Laser-1 → Scout-3 |
| Engagement outcomes | Transition | P(s′\|s,a) | Hit with prob 0.8 |
| Threat value destroyed | Reward | R(s,a) | 100 points |
| Future value | Value function | V(s) | Expected total reward |
| Time horizon | Discount factor | γ = 0.9 | Future rewards discounted |
Bellman Equation (Bertsekas 2012):

    V*(s) = max_{a ∈ A(s)} { R(s,a) + γ · Σ_{s′} P(s′|s,a) · V*(s′) }

    // State space: |S| = Π(Wᵢ+1) × 2ⁿ × T   (n targets, each alive/dead)
    // For 3 weapons with 4 ammo each, 8 targets, 4 waves:
    // |S| = 5³ × 2⁸ × 4 = 128,000 states
    // This is SMALL. Real problems have millions.

Curse of Dimensionality:

    // State space grows exponentially with m (weapons) + n (targets)
    // Exact DP infeasible for m > ~5, n > ~10

Myopic Policy (what this demo implements):

    a* = argmax_a R(s,a)   // maximize IMMEDIATE reward only
    // Ignores future waves entirely
    // Fast but suboptimal: may waste ammo on low-value targets
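The myopic policy a* = argmax_a R(s,a) can be sketched by brute force: enumerate every joint weapon-to-target assignment and score its immediate expected reward. The kill probabilities and threat values below are invented for illustration; the demo's actual numbers may differ.

```python
from itertools import product

# Hypothetical data (3 weapons, 2 threats) for illustration only.
P = [[0.8, 0.5], [0.6, 0.7], [0.9, 0.4]]   # P[i][j]: weapon i kills threat j
V = [100.0, 60.0]                           # threat values

def immediate_reward(assignment, p, v):
    """R(s,a): expected value destroyed, v_j * P(at least one shot hits j)."""
    total = 0.0
    for j, vj in enumerate(v):
        survive = 1.0
        for i, target in enumerate(assignment):
            if target == j:
                survive *= 1.0 - p[i][j]
        total += vj * (1.0 - survive)
    return total

def myopic_policy(p, v):
    """a* = argmax_a R(s,a): brute-force search over joint assignments.
    Only n^m candidates here, but this blows up fast — hence the curse
    of dimensionality for exact methods."""
    best = max(product(range(len(v)), repeat=len(p)),
               key=lambda a: immediate_reward(a, p, v))
    return best, immediate_reward(best, p, v)

assignment, reward = myopic_policy(P, V)
```

Note the failure mode the text warns about: this policy spends ammo to maximize this wave's reward and never reserves shots for future, possibly higher-value, waves.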

★☆☆ Educational Demo

This is a simplified simulation demonstrating the sequential decision structure of an MDP. The “Myopic AI” uses one-step lookahead only — it does NOT solve the full Bellman equation. A true ADP solver would use value function approximation (e.g., linear basis functions or neural networks) to estimate V*(s), which is far beyond the scope of a browser demo. See Bertsekas (2012) and Powell (2011) for full treatments.
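To make the contrast concrete, here is a minimal sketch of what an ADP-style lookahead with a linear value-function approximation V̂(s) = θ·φ(s) might look like. The features, weights, and certainty-equivalent transition (using the expected next state instead of the full sum over s′) are all assumptions for illustration — this is not the demo's code, nor a full solver.

```python
# Sketch of one-step lookahead with a linear value approximation.
# Features, weights, and the toy next-state model are illustrative
# assumptions, not the demo's implementation.

GAMMA = 0.9  # discount factor from the table above

def features(ammo_left: int, threats_left: int) -> list:
    """phi(s): a tiny hand-picked feature vector over the post-wave state."""
    return [1.0, float(ammo_left), float(threats_left)]

# Hypothetical weights theta, as if learned by fitted value iteration.
THETA = [10.0, 15.0, -30.0]

def v_hat(ammo_left: int, threats_left: int) -> float:
    """V_hat(s) = theta . phi(s)."""
    return sum(t * f for t, f in zip(THETA, features(ammo_left, threats_left)))

def lookahead_value(immediate: float, ammo_left: int,
                    expected_threats_left: float) -> float:
    """Q(s,a) ~ R(s,a) + gamma * V_hat(E[s']): a crude certainty-equivalent
    stand-in for the full expectation over next states."""
    return immediate + GAMMA * v_hat(ammo_left, round(expected_threats_left))

# Firing now (reward 80, 2 ammo left, ~1.2 threats expected to survive)
# versus holding fire (reward 0, 3 ammo left, 2 threats survive):
fire = lookahead_value(80.0, 2, 1.2)
hold = lookahead_value(0.0, 3, 2.0)
```

Unlike the myopic rule, the learned term V̂(s′) lets the policy trade immediate reward against the value of conserving ammo — the part of the Bellman equation the demo deliberately drops.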

Wave Simulator

4-Wave Defense Simulation

3 weapons (ammo: 3, 3, 4). Threats arrive in waves. Compare your manual decisions against the myopic AI.

References
Published Bertsekas, D.P. (2012). Dynamic Programming and Optimal Control, Vol. II, 4th ed. Athena Scientific. — Comprehensive treatment of exact and approximate DP; Bellman equation, rollout policies, and value function approximation.
Published Powell, W.B. (2011). Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. Wiley. — Practical ADP methods for large-scale sequential decision problems.

Preparing for First Contact

If the aliens arrive, we suspect you will not be visiting a GitHub Pages site. We do recommend the Hungarian algorithm. It works on any planet.

👽🛸⚠️

Educational Fiction Disclaimer

This is a fictional educational scenario.

  • All “alien invasion” content exists purely to teach OR concepts
  • All data and parameters are entirely fictional
  • No actual military applications are intended or endorsed
  • The author advocates for peace and opposes militarization