Decomposable Markov Decision Processes: A Fluid Optimization Approach

Published Online: https://doi.org/10.1287/opre.2016.1531

References

  • Adelman D (2007) Dynamic bid prices in revenue management. Oper. Res. 55(4):647–661.
  • Adelman D, Mersereau AJ (2008) Relaxations of weakly coupled stochastic dynamic programs. Oper. Res. 56(3):712–727.
  • Anderson EJ, Nash P (1987) Linear Programming in Infinite-Dimensional Spaces (John Wiley & Sons, Chichester, UK).
  • Bellman R (1957) Dynamic Programming (Princeton University Press, Princeton, NJ).
  • Bellman R (1961) Adaptive Control Processes: A Guided Tour, Vol. 4 (Princeton University Press, Princeton, NJ).
  • Bertsekas DP (1995) Dynamic Programming and Optimal Control, Vol. 1 (Athena Scientific, Belmont, MA).
  • Bertsekas DP, Tsitsiklis JN (1996) Neuro-Dynamic Programming (Athena Scientific, Belmont, MA).
  • Bertsimas D (1995) The achievable region method in the optimal control of queueing systems; formulations, bounds and policies. Queueing Systems: Theory Appl. 21(3–4):337–389.
  • Bertsimas D, Niño-Mora J (1996) Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math. Oper. Res. 21(2):257–306.
  • Bertsimas D, Niño-Mora J (2000) Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1):80–90.
  • Bertsimas D, Brown DB, Caramanis C (2011) Theory and applications of robust optimization. SIAM Rev. 53(3):464–501.
  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations Trends Machine Learn. 3(1):1–122.
  • Coffman EG, Mitrani I (1980) A characterization of waiting time performance realizable by single-server queues. Oper. Res. 28(3):810–821.
  • de Farias DP, Van Roy B (2003) The linear programming approach to approximate dynamic programming. Oper. Res. 51(6):850–865.
  • de Farias DP, Van Roy B (2004) On constraint sampling in the linear programming approach to approximate dynamic programming. Math. Oper. Res. 29(3):462–478.
  • Federgruen A, Groenevelt H (1988) Characterization and optimization of achievable performance in general queueing systems. Oper. Res. 36(5):733–741.
  • Ghate A, Smith RL (2013) A linear programming approach to nonstationary infinite-horizon Markov decision processes. Oper. Res. 61(2):413–425.
  • Goldfarb D, Ma S (2012) Fast multiple-splitting algorithms for convex optimization. SIAM J. Optim. 22(2):533–556.
  • Hawkins JT (2003) A Lagrangian decomposition approach to weakly coupled dynamic optimization problems and its applications. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
  • Heyman DP, Sobel MJ (1984) Stochastic Models in Operations Research. Vol. 2, Stochastic Optimization (McGraw-Hill, New York).
  • Howard RA (1971) Dynamic Probabilistic Systems, Volume II: Semi-Markov and Decision Processes (Dover, Mineola, NY).
  • Lee I, Epelman MA, Romeijn HE, Smith RL (2013) A linear programming approach to constrained nonstationary infinite-horizon Markov decision processes. Technical Report 13-01, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI.
  • Powell WB (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley-Interscience, Hoboken, NJ).
  • Puterman ML (1994) Markov Decision Processes: Discrete Dynamic Stochastic Programming (John Wiley & Sons, Chichester, UK).
  • Romeijn HE, Smith RL, Bean JC (1992) Duality in infinite dimensional linear programming. Math. Programming 53(1–3):79–97.
  • Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.
  • Shanthikumar JG, Yao DD (1992) Multiclass queueing systems: polymatroidal structure and optimal scheduling control. Oper. Res. 40(3):S293–S299.
  • Van Roy B (2002) Neuro-dynamic programming: Overview and recent trends. Handbook of Markov Decision Processes (Springer, New York), 431–459.