A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies

Published Online:https://doi.org/10.1287/moor.2014.0704

References

  • Altman E (1999) Constrained Markov Decision Processes (Chapman & Hall/CRC, Boca Raton, FL).Google Scholar
  • Arapostathis A, Borkar VS, Fernández-Gaucherand E, Ghosh MK, Marcus SI (1993) Discrete-time controlled Markov processes with average cost criterion: A survey. SIAM J. Control Optim. 31:282–344.CrossrefGoogle Scholar
  • Bertsekas DP (1972) Infinite time reachability of state space regions by using feedback control. IEEE Trans. Automatic Control AC-17:604–613.CrossrefGoogle Scholar
  • Bertsekas DP (1977) Monotone mappings with application in dynamic programming. SIAM J. Control Optim. 15:438–464.CrossrefGoogle Scholar
  • Bertsekas DP (2012) Dynamic Programming and Optimal Control, 4th ed., Vol. II (Athena Scientific, Belmont, MA).Google Scholar
  • Bertsekas DP (2013) Abstract Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
  • Bertsekas DP, Shreve SE (1978) Stochastic Optimal Control: The Discrete Time Case (Academic Press, New York).Google Scholar
  • Bertsekas DP, Tsitsiklis JN (1996) Neuro-Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
  • Bertsekas DP, Yu H (2010) Distributed asynchronous policy iteration in dynamic programming. Proc. 48th Allerton Conf. Comm., Control Comput. (IEEE, Piscataway, NJ), 1368–1375.CrossrefGoogle Scholar
  • Bertsekas DP, Yu H (2012) Q-learning and enhanced policy iteration in discounted dynamic programming. Math. Oper. Res. 37(1):66–94.LinkGoogle Scholar
  • Blackwell D (1964) Memoryless strategies in finite stage dynamic programming. Ann. Math. Statist. 35:863–865.CrossrefGoogle Scholar
  • Blackwell D (1965a) Discounted dynamic programming. Ann. Math. Statist. 36:226–235.CrossrefGoogle Scholar
  • Blackwell D (1965b) Positive dynamic programming. Proc. 5th Berkeley Sympos. Math. Satist. Probab. (University of California Press, Berkeley, CA), 415–418.Google Scholar
  • Blackwell D (1968) A Borel set not containing a graph. Ann. Math. Statist. 39:1345–1347.CrossrefGoogle Scholar
  • Blackwell D (1978) Borel-programmable functions. Ann. Probab. 6:321–324.CrossrefGoogle Scholar
  • Blackwell D, Ryll-Nardzewski C (1963) Non-existence of everywhere proper conditional distributions. Ann. Math. Statist. 34:223–225.CrossrefGoogle Scholar
  • Blackwell D, Freedman D, Orkin M (1974) The optimal reward operator in dynamic programming. Ann. Probab. 2:926–941.CrossrefGoogle Scholar
  • Chen RR, Meyn S (1999) Value iteration and optimization of multiclass queueing networks. Queueing Systems 32:65–97.CrossrefGoogle Scholar
  • Dudley RM (2002) Real Analysis and Probability (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Dynkin EB, Yushkevich AA (1979) Controlled Markov Processes (Springer, New York).CrossrefGoogle Scholar
  • Feinberg EA (2002) Total reward criteria. Feinberg EA, Shwartz A, eds. Handbook of Markov Decision Processes (Springer, New York), 155–189.CrossrefGoogle Scholar
  • Feinberg EA, Shwartz A, eds. (2002) Handbook of Markov Decision Processes (Springer, New York).CrossrefGoogle Scholar
  • Feinberg EA, Kasyanov PO, Zadoianchuk NV (2012) Average cost Markov decision processes with weakly continuous transition probabilities. Math. Oper. Res. 37(4):591–607.LinkGoogle Scholar
  • Freedman D (1974) The optimal reward operator in special classes of dynamic programming problems. Ann. Probability 2:942–949.CrossrefGoogle Scholar
  • Furukawa N (1972) Markovian decision processes with compact action spaces. Ann. Math. Statist. 43:1612–1622.CrossrefGoogle Scholar
  • Hartley R (1980) A simple proof of Whittle’s bridging condition in dynamic programming. J. Appl. Prob. 17:1114–1116.CrossrefGoogle Scholar
  • Hernández-Lerma O, Lasserre JB (1996) Discrete-Time Markov Control Processes: Basic Optimality Criteria (Springer, New York).CrossrefGoogle Scholar
  • Hernández-Lerma O, Lasserre JB (1999) Further Topics on Discrete-Time Markov Control Processes (Springer, New York).CrossrefGoogle Scholar
  • Hinderer K (1970) Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter (Springer, New York).CrossrefGoogle Scholar
  • Kreps DM, Porteus EL (1977) On the optimality of structured policies in countable stage decision processes. II: Positive and negative problems. SIAM J. Appl. Math. 32:457–466.CrossrefGoogle Scholar
  • Kuratowski K (1966) Topology I (Academic Press, New York).Google Scholar
  • Maitra A (1968) Discounted dynamic programming on compact metric spaces. Sankhyā: Indian J. Statist., Ser. A 30:211–216.Google Scholar
  • Maitra A, Sudderth W (1992) The optimal reward operator in negative dynamic programming. Math. Oper. Res. 17(4):921–931.LinkGoogle Scholar
  • Meyn S (2008) Control Techniques for Complex Networks (Cambridge University Press, Cambridge, UK).Google Scholar
  • Meyn S, Tweedie RL (2009) Markov Chains and Stochastic Stability, 2nd ed. (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Miller BL, Veinott AF (1969) Discrete dynamic programming with a small interest rate. Ann. Math. Statist. 40:366–370.CrossrefGoogle Scholar
  • Neveu J (1975) Discrete-Parameter Martingales (North-Holland, Amsterdam).Google Scholar
  • Parthasarathy KR (1967) Probability Measures on Metric Spaces (Academic Press, New York).CrossrefGoogle Scholar
  • Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, New York).CrossrefGoogle Scholar
  • Rudin W (1976) Principles of Mathematical Analysis, 3rd ed. (McGraw-Hill, New York).Google Scholar
  • Schäl M (1975) Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Z. Wahrscheinlichkeitstheorie verw. Gebiete 32:179–196.CrossrefGoogle Scholar
  • Shreve SE (1978) Probability measures and the C-set of Selivanovskij. Pacific J. Math. 79:189–196.CrossrefGoogle Scholar
  • Shreve SE (1979) Resolution of measurability problems in discrete-time stochastic control. Stochastic Control Theory and Stochastic Differential Systems (Springer, Berlin), 580–587.CrossrefGoogle Scholar
  • Shreve SE (1981) Borel-approachable functions. Fundamenta Mathematicae 112:17–24.Google Scholar
  • Shreve SE, Bertsekas DP (1978) Alternative theoretical frameworks for finite horizon discrete-time stochastic optimal control. SIAM J. Control Optim. 16:953–977.CrossrefGoogle Scholar
  • Shreve SE, Bertsekas DP (1979) Universally measurable policies in dynamic programming. Math. Oper. Res. 4(1):15–30.LinkGoogle Scholar
  • Srivastava SM (1998) A Course on Borel Sets (Springer, New York).CrossrefGoogle Scholar
  • Strauch RE (1966) Negative dynamic programming. Ann. Math. Statist. 37:871–890.CrossrefGoogle Scholar
  • Sutton RS, Barto AG (1998) Reinforcement Learning (MIT Press, Cambridge).Google Scholar
  • van der Wal J (1981) Stochastic Dynamic Programming (Mathematical Centre, Amsterdam).Google Scholar
  • Veinott AF (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Statist. 37: 1284–1294.CrossrefGoogle Scholar
  • Veinott AF (1969) On discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40:1635–1660.CrossrefGoogle Scholar
  • Whittle P (1979) A simple condition for regularity in negative programming. J. Appl. Prob. 16:305–318.CrossrefGoogle Scholar
  • Whittle P (1980) Stability and characterisation conditions in negative programming. J. Appl. Prob. 17:635–645.CrossrefGoogle Scholar
  • Yu H (2014) On convergence of value iteration for a class of total cost Markov decision processes. Unpublished report, http://arxiv.org/abs/1411.1459.Google Scholar
  • Yu H, Bertsekas DP (2013) Q-learning and policy iteration algorithms for stochastic shortest path problems. Ann. Oper. Res. 208: 95–132.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.