
Dynamic Defense Reallocation

Markov Decision Process · Approximate Dynamic Programming

Reallocate interceptors in real time as alien attack waves arrive. Each wave brings new threats while weapons deplete their ammunition. The optimal policy requires solving a Bellman equation over an exponentially large state space — this demo uses a myopic one-step lookahead heuristic instead.

Wave Defense

Alien attacks arrive in 4 waves, each bringing 2–3 new threats. You command 3 weapons with limited ammunition. After each assignment, engagement outcomes are stochastic — a shot with p=0.8 kill probability still misses 20% of the time. Between waves, you see your remaining ammo and surviving threats, then must decide how to allocate for the next wave. This sequential decision-making under uncertainty is a Markov Decision Process (MDP).
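The stochastic engagement described above can be sketched as a tiny reward model. The kill probability and threat value below match the illustrative numbers in the text (p = 0.8, 100 points); everything else is an assumption for demonstration, not the demo's actual code.

```python
import random

# Illustrative parameters from the text (p = 0.8 kill probability,
# 100-point threat); the simulation loop itself is a sketch.
P_KILL = 0.8
THREAT_VALUE = 100.0

def engage(rng: random.Random) -> float:
    """One stochastic engagement: reward is earned only on a hit."""
    return THREAT_VALUE if rng.random() < P_KILL else 0.0

def expected_reward() -> float:
    """Analytic expected immediate reward: E[R] = p_kill * value."""
    return P_KILL * THREAT_VALUE

# A Monte Carlo estimate converges to the analytic expectation.
rng = random.Random(42)
estimate = sum(engage(rng) for _ in range(100_000)) / 100_000
```

Even with p = 0.8, any single engagement can miss; only the expectation (here 80 points) is predictable, which is exactly why the problem is an MDP rather than a deterministic assignment.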
| Defense Domain | OR Element | Symbol | Example |
|---|---|---|---|
| Current battle state | State | s = (ammo, threats, time) | ammo (3,2,4), active threats, wave 2 |
| Assignment this turn | Action | a = {(i,j)} | Laser-1 → Scout-3 |
| Engagement outcomes | Transition | P(s′\|s,a) | Hit with prob 0.8 |
| Threat value destroyed | Reward | R(s,a) | 100 points |
| Future value | Value function | V(s) | Expected total reward |
| Time horizon | Discount factor | γ = 0.9 | Future rewards discounted |
Bellman Equation (Bertsekas 2012):

    V*(s) = max_{a ∈ A(s)} { R(s,a) + γ · Σ_{s′} P(s′|s,a) · V*(s′) }

    // State space: |S| = Π(Wᵢ+1) × 2ⁿ × T   (n targets, each alive/dead)
    // For 3 weapons with 4 ammo each, 8 targets, 4 waves:
    // |S| = 5³ × 2⁸ × 4 = 128,000 states
    // This is SMALL. Real problems have millions.

Curse of Dimensionality:

    // State space grows exponentially with m (weapons) + n (targets)
    // Exact DP infeasible for m > ~5, n > ~10

Myopic Policy (what this demo implements):

    a* = argmax_a R(s,a)   // maximize IMMEDIATE reward only
    // Ignores future waves entirely
    // Fast but suboptimal: may waste ammo on low-value targets
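The myopic policy a* = argmax_a R(s,a) can be sketched by brute force: enumerate every joint weapon-to-target assignment and score its immediate expected reward. The kill probabilities and threat values below are invented for illustration; the demo's actual numbers may differ.

```python
from itertools import product

# Hypothetical data (3 weapons, 2 threats) for illustration only.
P = [[0.8, 0.5], [0.6, 0.7], [0.9, 0.4]]   # P[i][j]: weapon i kills threat j
V = [100.0, 60.0]                           # threat values

def immediate_reward(assignment, p, v):
    """R(s,a): expected value destroyed, v_j * P(at least one shot hits j)."""
    total = 0.0
    for j, vj in enumerate(v):
        survive = 1.0
        for i, target in enumerate(assignment):
            if target == j:
                survive *= 1.0 - p[i][j]
        total += vj * (1.0 - survive)
    return total

def myopic_policy(p, v):
    """a* = argmax_a R(s,a): brute-force search over joint assignments.
    Only n^m candidates here, but this blows up fast — hence the curse
    of dimensionality for exact methods."""
    best = max(product(range(len(v)), repeat=len(p)),
               key=lambda a: immediate_reward(a, p, v))
    return best, immediate_reward(best, p, v)

assignment, reward = myopic_policy(P, V)
```

Note the failure mode the text warns about: this policy spends ammo to maximize this wave's reward and never reserves shots for future, possibly higher-value, waves.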

★☆☆ Educational Demo

This is a simplified simulation demonstrating the sequential decision structure of an MDP. The “Myopic AI” uses one-step lookahead only — it does NOT solve the full Bellman equation. A true ADP solver would use value function approximation (e.g., linear basis functions or neural networks) to estimate V*(s), which is far beyond the scope of a browser demo. See Bertsekas (2012) and Powell (2011) for full treatments.
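To make the contrast concrete, here is a minimal sketch of what an ADP-style lookahead with a linear value-function approximation V̂(s) = θ·φ(s) might look like. The features, weights, and certainty-equivalent transition (using the expected next state instead of the full sum over s′) are all assumptions for illustration — this is not the demo's code, nor a full solver.

```python
# Sketch of one-step lookahead with a linear value approximation.
# Features, weights, and the toy next-state model are illustrative
# assumptions, not the demo's implementation.

GAMMA = 0.9  # discount factor from the table above

def features(ammo_left: int, threats_left: int) -> list:
    """phi(s): a tiny hand-picked feature vector over the post-wave state."""
    return [1.0, float(ammo_left), float(threats_left)]

# Hypothetical weights theta, as if learned by fitted value iteration.
THETA = [10.0, 15.0, -30.0]

def v_hat(ammo_left: int, threats_left: int) -> float:
    """V_hat(s) = theta . phi(s)."""
    return sum(t * f for t, f in zip(THETA, features(ammo_left, threats_left)))

def lookahead_value(immediate: float, ammo_left: int,
                    expected_threats_left: float) -> float:
    """Q(s,a) ~ R(s,a) + gamma * V_hat(E[s']): a crude certainty-equivalent
    stand-in for the full expectation over next states."""
    return immediate + GAMMA * v_hat(ammo_left, round(expected_threats_left))

# Firing now (reward 80, 2 ammo left, ~1.2 threats expected to survive)
# versus holding fire (reward 0, 3 ammo left, 2 threats survive):
fire = lookahead_value(80.0, 2, 1.2)
hold = lookahead_value(0.0, 3, 2.0)
```

Unlike the myopic rule, the learned term V̂(s′) lets the policy trade immediate reward against the value of conserving ammo — the part of the Bellman equation the demo deliberately drops.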

Wave Simulator

4-Wave Defense Simulation

3 weapons (ammo: 3, 3, 4). Threats arrive in waves. Compare your manual decisions against the myopic AI.

References
Published Bertsekas, D.P. (2012). Dynamic Programming and Optimal Control, Vol. II, 4th ed. Athena Scientific. — Comprehensive treatment of exact and approximate DP; Bellman equation, rollout policies, and value function approximation.
Published Powell, W.B. (2011). Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. Wiley. — Practical ADP methods for large-scale sequential decision problems.

Preparing for First Contact

If the aliens arrive, we suspect you will not be visiting a GitHub Pages site. We do recommend the Hungarian algorithm. It works on any planet.

👽🛸⚠️

Educational Fiction Disclaimer

This is a fictional educational scenario.

  • All “alien invasion” content exists purely to teach OR concepts
  • All data and parameters are entirely fictional
  • No actual military applications are intended or endorsed
  • The author advocates for peace and opposes militarization