A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies
Published Online:1 Apr 2015https://doi.org/10.1287/moor.2014.0704
References
- (1999) Constrained Markov Decision Processes (Chapman & Hall/CRC, Boca Raton, FL).Google Scholar
- (1993) Discrete-time controlled Markov processes with average cost criterion: A survey. SIAM J. Control Optim. 31:282–344.Crossref, Google Scholar
- (1972) Infinite time reachability of state space regions by using feedback control. IEEE Trans. Automatic Control AC-17:604–613.Crossref, Google Scholar
- (1977) Monotone mappings with application in dynamic programming. SIAM J. Control Optim. 15:438–464.Crossref, Google Scholar
- (2012) Dynamic Programming and Optimal Control, 4th ed., Vol. II (Athena Scientific, Belmont, MA).Google Scholar
- (2013) Abstract Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
- (1978) Stochastic Optimal Control: The Discrete Time Case (Academic Press, New York).Google Scholar
- (1996) Neuro-Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
- (2010) Distributed asynchronous policy iteration in dynamic programming. Proc. 48th Allerton Conf. Comm., Control Comput. (IEEE, Piscataway, NJ), 1368–1375.Crossref, Google Scholar
- (2012) Q-learning and enhanced policy iteration in discounted dynamic programming. Math. Oper. Res. 37(1):66–94.Link, Google Scholar
- (1964) Memoryless strategies in finite stage dynamic programming. Ann. Math. Statist. 35:863–865.Crossref, Google Scholar
- (1965a) Discounted dynamic programming. Ann. Math. Statist. 36:226–235.Crossref, Google Scholar
- (1965b) Positive dynamic programming. Proc. 5th Berkeley Sympos. Math. Satist. Probab. (University of California Press, Berkeley, CA), 415–418.Google Scholar
- (1968) A Borel set not containing a graph. Ann. Math. Statist. 39:1345–1347.Crossref, Google Scholar
- (1978) Borel-programmable functions. Ann. Probab. 6:321–324.Crossref, Google Scholar
- (1963) Non-existence of everywhere proper conditional distributions. Ann. Math. Statist. 34:223–225.Crossref, Google Scholar
- (1974) The optimal reward operator in dynamic programming. Ann. Probab. 2:926–941.Crossref, Google Scholar
- (1999) Value iteration and optimization of multiclass queueing networks. Queueing Systems 32:65–97.Crossref, Google Scholar
- (2002) Real Analysis and Probability (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (1979) Controlled Markov Processes (Springer, New York).Crossref, Google Scholar
- (2002) Total reward criteria. Feinberg EA, Shwartz A, eds. Handbook of Markov Decision Processes (Springer, New York), 155–189.Crossref, Google Scholar
- , eds. (2002) Handbook of Markov Decision Processes (Springer, New York).Crossref, Google Scholar
- (2012) Average cost Markov decision processes with weakly continuous transition probabilities. Math. Oper. Res. 37(4):591–607.Link, Google Scholar
- (1974) The optimal reward operator in special classes of dynamic programming problems. Ann. Probability 2:942–949.Crossref, Google Scholar
- (1972) Markovian decision processes with compact action spaces. Ann. Math. Statist. 43:1612–1622.Crossref, Google Scholar
- (1980) A simple proof of Whittle’s bridging condition in dynamic programming. J. Appl. Prob. 17:1114–1116.Crossref, Google Scholar
- (1996) Discrete-Time Markov Control Processes: Basic Optimality Criteria (Springer, New York).Crossref, Google Scholar
- (1999) Further Topics on Discrete-Time Markov Control Processes (Springer, New York).Crossref, Google Scholar
- (1970) Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter (Springer, New York).Crossref, Google Scholar
- (1977) On the optimality of structured policies in countable stage decision processes. II: Positive and negative problems. SIAM J. Appl. Math. 32:457–466.Crossref, Google Scholar
- (1966) Topology I (Academic Press, New York).Google Scholar
- (1968) Discounted dynamic programming on compact metric spaces. Sankhyā: Indian J. Statist., Ser. A 30:211–216.Google Scholar
- (1992) The optimal reward operator in negative dynamic programming. Math. Oper. Res. 17(4):921–931.Link, Google Scholar
- (2008) Control Techniques for Complex Networks (Cambridge University Press, Cambridge, UK).Google Scholar
- (2009) Markov Chains and Stochastic Stability, 2nd ed. (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (1969) Discrete dynamic programming with a small interest rate. Ann. Math. Statist. 40:366–370.Crossref, Google Scholar
- (1975) Discrete-Parameter Martingales (North-Holland, Amsterdam).Google Scholar
- (1967) Probability Measures on Metric Spaces (Academic Press, New York).Crossref, Google Scholar
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, New York).Crossref, Google Scholar
- (1976) Principles of Mathematical Analysis, 3rd ed. (McGraw-Hill, New York).Google Scholar
- (1975) Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Z. Wahrscheinlichkeitstheorie verw. Gebiete 32:179–196.Crossref, Google Scholar
- (1978) Probability measures and the C-set of Selivanovskij. Pacific J. Math. 79:189–196.Crossref, Google Scholar
- (1979) Resolution of measurability problems in discrete-time stochastic control. Stochastic Control Theory and Stochastic Differential Systems (Springer, Berlin), 580–587.Crossref, Google Scholar
- (1981) Borel-approachable functions. Fundamenta Mathematicae 112:17–24.Google Scholar
- (1978) Alternative theoretical frameworks for finite horizon discrete-time stochastic optimal control. SIAM J. Control Optim. 16:953–977.Crossref, Google Scholar
- (1979) Universally measurable policies in dynamic programming. Math. Oper. Res. 4(1):15–30.Link, Google Scholar
- (1998) A Course on Borel Sets (Springer, New York).Crossref, Google Scholar
- (1966) Negative dynamic programming. Ann. Math. Statist. 37:871–890.Crossref, Google Scholar
- (1998) Reinforcement Learning (MIT Press, Cambridge).Google Scholar
- (1981) Stochastic Dynamic Programming (Mathematical Centre, Amsterdam).Google Scholar
- (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Statist. 37: 1284–1294.Crossref, Google Scholar
- (1969) On discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40:1635–1660.Crossref, Google Scholar
- (1979) A simple condition for regularity in negative programming. J. Appl. Prob. 16:305–318.Crossref, Google Scholar
- (1980) Stability and characterisation conditions in negative programming. J. Appl. Prob. 17:635–645.Crossref, Google Scholar
- (2014) On convergence of value iteration for a class of total cost Markov decision processes. Unpublished report, http://arxiv.org/abs/1411.1459.Google Scholar
- (2013) Q-learning and policy iteration algorithms for stochastic shortest path problems. Ann. Oper. Res. 208: 95–132.Crossref, Google Scholar

