On the Asymptotic Optimality of Finite Approximations to Markov Decision Processes with Borel Spaces

Naci Saldi
Corresponding Author
Naci Saldi
[email protected]
Coordinated Science Laboratory, University of Illinois, Urbana, Illinois 61801, USA
Search for more papers by this author
,
Serdar Yüksel
Serdar Yüksel
[email protected]
Department of Mathematics and Statistics, Queen’s University, Kingston, Ontario, Canada, K7L 3N6
Search for more papers by this author
,
Tamás Linder
Tamás Linder
[email protected]
Department of Mathematics and Statistics, Queen’s University, Kingston, Ontario, Canada, K7L 3N6
Search for more papers by this author

Naci Saldi

Corresponding Author

Naci Saldi

[email protected]

Coordinated Science Laboratory, University of Illinois, Urbana, Illinois 61801, USA

Search for more papers by this author

Serdar Yüksel

[email protected]

Department of Mathematics and Statistics, Queen’s University, Kingston, Ontario, Canada, K7L 3N6

Search for more papers by this author

Tamás Linder

[email protected]

Department of Mathematics and Statistics, Queen’s University, Kingston, Ontario, Canada, K7L 3N6

Search for more papers by this author

Published Online:15 Mar 2017https://doi.org/10.1287/moor.2016.0832

References

Aliprantis CD, Border KC (2006) Infinite Dimensional Analysis (Springer, Berlin).Google Scholar
Bartoszynski R (1961) A characterization of the weak convergence of measures. Ann. Math. Statist. 32(2):561–576.Crossref, Google Scholar
Bertsekas DP (1975) Convergence of discretization procedures in dynamic programming. IEEE Trans. Autom. Control 20(3):415–419.Crossref, Google Scholar
Bertsekas DP (1995) Dynamic Programming and Optimal Control, Vol. II (Athena Scientific, Belmont, MA).Google Scholar
Bertsekas DP, Shreve SE (1978) Stochastic Optimal Control: The Discrete Time Case (Academic Press, New York).Google Scholar
Bertsekas DP, Tsitsiklis JN (1996) Neuro-Dynammic Programming (Athena Scientific, Belmont, MA).Google Scholar
Blackwell D, Freedman D, Orkin M (1974) The optimal reward operator in dynamic programming. Ann. Probab. 2(2):926–941.Crossref, Google Scholar
Borkar V (2002) Convex analytic methods in Markov decision processes. Feinberg E, Shwartz A, eds. Handbook of Markov Decision Processes (Kluwer Academic Publishers, Dordrecht, Netherlands), 347–375.Crossref, Google Scholar
Cavazos-Cadena R (1986) Finite-state approximations for denumerable state discounted Markov decision processes. Appl. Math. Optim. 14(1):1–26.Crossref, Google Scholar
Chang HS, Fu MC, Hu J, Marcus SI (2007) A survey of some simulation-based methods in Markov decision processes. Comm. Inform. System 7(1):59–92.Crossref, Google Scholar
Chow C-S, Tsitsiklis JN (1991) An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Trans. Automatic Control 36(8):898–914.Crossref, Google Scholar
Cover TM, Thomas JA (2006) Elements of Information Theory, 2nd ed. (John Wiley & Sons, Hoboken, NJ).Google Scholar
Dufour F, Prieto-Rumeau T (2012) Approximation of Markov decision processes with general state space. J. Math. Anal. Appl. 388(2):1254–1267.Crossref, Google Scholar
Dufour F, Prieto-Rumeau T (2013) Finite linear programming approximations of constrained discounted Markov decision processes. SIAM J. Control Optim. 51(2):1298–1324.Crossref, Google Scholar
Dufour F, Prieto-Rumeau T (2015) Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities. Stochastics 87(2):273–307.Crossref, Google Scholar
Feinberg EA, Kasyanov PO, Zadioanchuk NV (2012) Average cost Markov decision processes with weakly continuous transition probabilities. Math. Oper. Res. 37(4):591–607.Link, Google Scholar
Fox BL (1971) Finite-state approximations to denumerable state dynamic programs. J. Math. Anal. Appl. 34(3):665–670.Crossref, Google Scholar
Gordienko E, Hernandez-Lerma O (1995) Average cost Markov control processes with weighted norms: Existence of canonical policies. Appl. Math. 23(2):199–218.Google Scholar
Gray GM, Neuhoff DL (1998) Quantization. IEEE Trans. Inf. Theory 44(6):2325–2383.Crossref, Google Scholar
Hernández-Lerma O (1989) Adaptive Markov Control Processes (Springer, New York).Crossref, Google Scholar
Hernández-Lerma O, Lasserre JB (1996) Discrete-Time Markov Control Processes: Basic Optimality Criteria (Springer, New York).Crossref, Google Scholar
Hernández-Lerma O, Lasserre JB (1999) Further Topics on Discrete-Time Markov Control Processes (Springer, New York).Crossref, Google Scholar
Hernández-Lerma O, Lasserre JB (2003) Markov Chains and Invariant Probabilities (Birkhäuser, Basel, Switzerland).Crossref, Google Scholar
Hinderer K (2005) Lipschitz continuity of value functions in Markovian desision processes. Math. Methods Oper. Res. 62(1):3–22.Crossref, Google Scholar
Jain R, Varaiya PP (2006) Simulation-based uniform value function estimates of Markov decision processes. SIAM J. Control Optim. 45(5):1633–1656.Crossref, Google Scholar
Jaśkiewicz A, Nowak AS (2006) On the optimality equation for average cost Markov control processes with Feller transition probabilities. J. Math. Anal. Appl. 316(2):495–509.Crossref, Google Scholar
Kuratowski K (1966) Topology, Vol. I (Academic Press, New York).Google Scholar
Langen HJ (1981) Convergence of dynamic programming models. Math. Oper. Res. 6(4):493–512.Link, Google Scholar
Meyn SP, Tweedie RL (1993) Markov Chains and Stochastic Stability (Springer, New York).Crossref, Google Scholar
Ortner R (2007) Pseudometrics for state aggregation in average reward Markov decision processes. Algorithmic Learning Theory (Springer, Berlin), 373–387.Crossref, Google Scholar
Puterman ML (2005) Markov Decision Processes (John Wiley & Sons, Hoboken, NJ).Google Scholar
Ren Z, Krogh BH (2002) State aggregation in Markov decision processes. Proc. 41st IEEE Conf. Decision Control (IEEE, Piscataway, NJ), 3819–3824.Google Scholar
Saldi N, Linder T, Yüksel S (2015) Asymptotic optimality and rates of convergence of quantized stationary policies in stochastic control. IEEE Trans. Autom. Control 60(2):553–558.Crossref, Google Scholar
Saldi N, Yüksel S, Linder T (2016) Near optimality of quantized policies in stochastic control under weak continuity conditions. J. Math. Anal. Appl. 435(1):321–337.Crossref, Google Scholar
Serfozo R (1982) Convergence of Lebesgue integrals with varying measures. Sankhya Ser. A 44(3):380–402.Google Scholar
Shreve SE, Bertsekas DP (1979) Universally measurable policies in dynamic programming. Math. Oper. Res. 4(1):15–30.Link, Google Scholar
Van Roy B (2006) Performance loss bounds for approximate value iteration with state aggregation. Math. Oper. Res. 31(2):234–244.Link, Google Scholar
Vega-Amaya O (2003) The average cost optimality equation: A fixed point approach. Bol. Soc. Mat. Mexicana 9(3):185–195.Google Scholar
Villani C (2009) Optimal Transport: Old and New (Springer, Berlin).Crossref, Google Scholar
White DJ (1980) Finite-state approximations for denumerable state infinite horizon discounted Markov decision processes. J. Math. Anal. Appl. 74(1):292–295.Crossref, Google Scholar
White DJ (1982) Finite-state approximations for denumerable state infinite horizon discounted Markov decision processes with unbounded rewards. J. Math. Anal. Appl. 86(1):292–306.Crossref, Google Scholar
Whitt W (1978) Approximations of dynamic programs I. Math. Oper. Res. 3(3):231–243.Link, Google Scholar
Whitt W (1979) Approximations of dynamic programs II. Math. Oper. Res. 4(2):179–185.Link, Google Scholar
Yamada Y, Tazaki S, Gray RM (1980) Asymptotic performance of block quantizers with difference distortion measures. IEEE Trans. Inf. Theory 26(1):6–14.Crossref, Google Scholar
Yu H, Bertsekas DP (2004) Discretized approximations for POMDP with average cost. The 20th Conf. UAI (AUAI Press, Arlington, VA).Google Scholar
Yüksel S, Başar T (2013) Stochastic Networked Control Systems: Stabilization and Optimization Under Information Constraints (Birkhäuser, Boston).Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 42, Issue 4

November 2017

Pages 897-1312

Article Information

Metrics

Information

Received:July 20, 2015
Accepted:October 04, 2016
Published Online:March 15, 2017

Cite as

Naci Saldi, Serdar Yüksel, Tamás Linder (2017) On the Asymptotic Optimality of Finite Approximations to Markov Decision Processes with Borel Spaces. Mathematics of Operations Research 42(4):945-978.

https://doi.org/10.1287/moor.2016.0832

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

On the Asymptotic Optimality of Finite Approximations to Markov Decision Processes with Borel Spaces

References

Volume 42, Issue 4

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News