On the Asymptotic Optimality of Finite Approximations to Markov Decision Processes with Borel Spaces

Published Online:https://doi.org/10.1287/moor.2016.0832

References

  • Aliprantis CD, Border KC (2006) Infinite Dimensional Analysis (Springer, Berlin).Google Scholar
  • Bartoszynski R (1961) A characterization of the weak convergence of measures. Ann. Math. Statist. 32(2):561–576.CrossrefGoogle Scholar
  • Bertsekas DP (1975) Convergence of discretization procedures in dynamic programming. IEEE Trans. Autom. Control 20(3):415–419.CrossrefGoogle Scholar
  • Bertsekas DP (1995) Dynamic Programming and Optimal Control, Vol. II (Athena Scientific, Belmont, MA).Google Scholar
  • Bertsekas DP, Shreve SE (1978) Stochastic Optimal Control: The Discrete Time Case (Academic Press, New York).Google Scholar
  • Bertsekas DP, Tsitsiklis JN (1996) Neuro-Dynammic Programming (Athena Scientific, Belmont, MA).Google Scholar
  • Blackwell D, Freedman D, Orkin M (1974) The optimal reward operator in dynamic programming. Ann. Probab. 2(2):926–941.CrossrefGoogle Scholar
  • Borkar V (2002) Convex analytic methods in Markov decision processes. Feinberg E, Shwartz A, eds. Handbook of Markov Decision Processes (Kluwer Academic Publishers, Dordrecht, Netherlands), 347–375.CrossrefGoogle Scholar
  • Cavazos-Cadena R (1986) Finite-state approximations for denumerable state discounted Markov decision processes. Appl. Math. Optim. 14(1):1–26.CrossrefGoogle Scholar
  • Chang HS, Fu MC, Hu J, Marcus SI (2007) A survey of some simulation-based methods in Markov decision processes. Comm. Inform. System 7(1):59–92.CrossrefGoogle Scholar
  • Chow C-S, Tsitsiklis JN (1991) An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Trans. Automatic Control 36(8):898–914.CrossrefGoogle Scholar
  • Cover TM, Thomas JA (2006) Elements of Information Theory, 2nd ed. (John Wiley & Sons, Hoboken, NJ).Google Scholar
  • Dufour F, Prieto-Rumeau T (2012) Approximation of Markov decision processes with general state space. J. Math. Anal. Appl. 388(2):1254–1267.CrossrefGoogle Scholar
  • Dufour F, Prieto-Rumeau T (2013) Finite linear programming approximations of constrained discounted Markov decision processes. SIAM J. Control Optim. 51(2):1298–1324.CrossrefGoogle Scholar
  • Dufour F, Prieto-Rumeau T (2015) Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities. Stochastics 87(2):273–307.CrossrefGoogle Scholar
  • Feinberg EA, Kasyanov PO, Zadioanchuk NV (2012) Average cost Markov decision processes with weakly continuous transition probabilities. Math. Oper. Res. 37(4):591–607.LinkGoogle Scholar
  • Fox BL (1971) Finite-state approximations to denumerable state dynamic programs. J. Math. Anal. Appl. 34(3):665–670.CrossrefGoogle Scholar
  • Gordienko E, Hernandez-Lerma O (1995) Average cost Markov control processes with weighted norms: Existence of canonical policies. Appl. Math. 23(2):199–218.Google Scholar
  • Gray GM, Neuhoff DL (1998) Quantization. IEEE Trans. Inf. Theory 44(6):2325–2383.CrossrefGoogle Scholar
  • Hernández-Lerma O (1989) Adaptive Markov Control Processes (Springer, New York).CrossrefGoogle Scholar
  • Hernández-Lerma O, Lasserre JB (1996) Discrete-Time Markov Control Processes: Basic Optimality Criteria (Springer, New York).CrossrefGoogle Scholar
  • Hernández-Lerma O, Lasserre JB (1999) Further Topics on Discrete-Time Markov Control Processes (Springer, New York).CrossrefGoogle Scholar
  • Hernández-Lerma O, Lasserre JB (2003) Markov Chains and Invariant Probabilities (Birkhäuser, Basel, Switzerland).CrossrefGoogle Scholar
  • Hinderer K (2005) Lipschitz continuity of value functions in Markovian desision processes. Math. Methods Oper. Res. 62(1):3–22.CrossrefGoogle Scholar
  • Jain R, Varaiya PP (2006) Simulation-based uniform value function estimates of Markov decision processes. SIAM J. Control Optim. 45(5):1633–1656.CrossrefGoogle Scholar
  • Jaśkiewicz A, Nowak AS (2006) On the optimality equation for average cost Markov control processes with Feller transition probabilities. J. Math. Anal. Appl. 316(2):495–509.CrossrefGoogle Scholar
  • Kuratowski K (1966) Topology, Vol. I (Academic Press, New York).Google Scholar
  • Langen HJ (1981) Convergence of dynamic programming models. Math. Oper. Res. 6(4):493–512.LinkGoogle Scholar
  • Meyn SP, Tweedie RL (1993) Markov Chains and Stochastic Stability (Springer, New York).CrossrefGoogle Scholar
  • Ortner R (2007) Pseudometrics for state aggregation in average reward Markov decision processes. Algorithmic Learning Theory (Springer, Berlin), 373–387.CrossrefGoogle Scholar
  • Puterman ML (2005) Markov Decision Processes (John Wiley & Sons, Hoboken, NJ).Google Scholar
  • Ren Z, Krogh BH (2002) State aggregation in Markov decision processes. Proc. 41st IEEE Conf. Decision Control (IEEE, Piscataway, NJ), 3819–3824.Google Scholar
  • Saldi N, Linder T, Yüksel S (2015) Asymptotic optimality and rates of convergence of quantized stationary policies in stochastic control. IEEE Trans. Autom. Control 60(2):553–558.CrossrefGoogle Scholar
  • Saldi N, Yüksel S, Linder T (2016) Near optimality of quantized policies in stochastic control under weak continuity conditions. J. Math. Anal. Appl. 435(1):321–337.CrossrefGoogle Scholar
  • Serfozo R (1982) Convergence of Lebesgue integrals with varying measures. Sankhya Ser. A 44(3):380–402.Google Scholar
  • Shreve SE, Bertsekas DP (1979) Universally measurable policies in dynamic programming. Math. Oper. Res. 4(1):15–30.LinkGoogle Scholar
  • Van Roy B (2006) Performance loss bounds for approximate value iteration with state aggregation. Math. Oper. Res. 31(2):234–244.LinkGoogle Scholar
  • Vega-Amaya O (2003) The average cost optimality equation: A fixed point approach. Bol. Soc. Mat. Mexicana 9(3):185–195.Google Scholar
  • Villani C (2009) Optimal Transport: Old and New (Springer, Berlin).CrossrefGoogle Scholar
  • White DJ (1980) Finite-state approximations for denumerable state infinite horizon discounted Markov decision processes. J. Math. Anal. Appl. 74(1):292–295.CrossrefGoogle Scholar
  • White DJ (1982) Finite-state approximations for denumerable state infinite horizon discounted Markov decision processes with unbounded rewards. J. Math. Anal. Appl. 86(1):292–306.CrossrefGoogle Scholar
  • Whitt W (1978) Approximations of dynamic programs I. Math. Oper. Res. 3(3):231–243.LinkGoogle Scholar
  • Whitt W (1979) Approximations of dynamic programs II. Math. Oper. Res. 4(2):179–185.LinkGoogle Scholar
  • Yamada Y, Tazaki S, Gray RM (1980) Asymptotic performance of block quantizers with difference distortion measures. IEEE Trans. Inf. Theory 26(1):6–14.CrossrefGoogle Scholar
  • Yu H, Bertsekas DP (2004) Discretized approximations for POMDP with average cost. The 20th Conf. UAI (AUAI Press, Arlington, VA).Google Scholar
  • Yüksel S, Başar T (2013) Stochastic Networked Control Systems: Stabilization and Optimization Under Information Constraints (Birkhäuser, Boston).CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.