Finite-Memory Strategies in POMDPs with Long-Run Average Objectives

Published Online:https://doi.org/10.1287/moor.2020.1116

References

  • [1] Arapostathis A, Borkar V, Fernández-Gaucherand E, Ghosh M, Marcus S (1993) Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31(2):282–344.CrossrefGoogle Scholar
  • [2] Baier C, Größer M, Bertrand N (2012) Probabilistic ω-automata. J. ACM 59(1):1–52.CrossrefGoogle Scholar
  • [3] Bellman R (1957) A Markovian decision process. J. Math. Mech. 6(5):679–684.Google Scholar
  • [4] Blackwell D (1962) Discrete dynamic programming. Ann. Math. Statist. 33(2):719–726.CrossrefGoogle Scholar
  • [5] Bonet B, Geffner H (2009) Solving POMDPs: RTDP-bel vs. point-based algorithms. Proc. 21st Internat. Joint Conf. Artificial Intelligence (Morgan Kaufmann, San Francisco), 1641–1646.Google Scholar
  • [6] Borkar V (2000) Average cost dynamic programming equations for controlled Markov chains with partial observations. SIAM J. Control Optim. 39(3):673–681.CrossrefGoogle Scholar
  • [7] Bukharaev RG (1980) Probabilistic automata. J. Math. Sci. 13:359–386.CrossrefGoogle Scholar
  • [8] Cerný P, Chatterjee K, Henzinger TA, Radhakrishna A, Singh R (2011) Quantitative synthesis for concurrent programs. Gopalakrishnan G, Qadeer S, eds. Proc. Internat. Conf. Comput. Aided Verification, Lecture Notes in Computer Science, vol. 6806 (Springer, Berlin), 243–259.Google Scholar
  • [9] Chatterjee K (2007) Concurrent games with tail objectives. Theoret. Comput. Sci. 388(1–3):181–198.CrossrefGoogle Scholar
  • [10] Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [11] Feinberg E (1996) On measurability and representation of strategic measures in Markov decision processes. Ferguson TS, Shapley LS, MacQueen JB, eds. Statistics, Probability and Game Theory: Papers in Honor of David Blackwell (Institute of Mathematical Statistics), 29–43.CrossrefGoogle Scholar
  • [12] Filar J, Vrieze K (1997) Competitive Markov Decision Processes (Springer, New York).Google Scholar
  • [13] Hansen KA, Ibsen-Jensen R, Neyman A (2018) Absorbing games with a clock and two bits of memory. Working paper, University of Glasgow, Scotland, UK.Google Scholar
  • [14] Hansen KA, Ibsen-Jensen R, Neyman A (2018) The big match with a clock and a bit of memory. Proc. 2018 ACM Conf. Econom. Comput. (ACM, New York), 149–150.Google Scholar
  • [15] Hernández-Lerma O, Lasserre JB (2003) Markov Chains and Invariant Probabilities (Birkhäuser, Basel, Switzerland).Google Scholar
  • [16] Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: A survey. J. Artificial Intelligence Res. 4:237–285.CrossrefGoogle Scholar
  • [17] Madani O, Hanks S, Condon A (2003) On the undecidability of probabilistic planning and related stochastic optimization problems. Artificial Intelligence 147(1–2):5–34.CrossrefGoogle Scholar
  • [18] Neyman A, Sorin S (2010) Repeated games with public uncertain duration process. Internat. J. Game Theory. 39(1–2):29–52.CrossrefGoogle Scholar
  • [19] Paz A (1971) Introduction to Probabilistic Automata. Computer Science and Applied Mathematics (Academic Press, Cambridge, MA).Google Scholar
  • [20] Rabin M (1963) Probabilistic automata. Inform. Control 6(3):230–245.CrossrefGoogle Scholar
  • [21] Renault J (2011) Uniform value in dynamic programming. J. Eur. Math. Soc. 13(2):309–330.CrossrefGoogle Scholar
  • [22] Renault J, Venel X (2016) Long-term values in Markov decision processes and repeated games, and a new distance for probability spaces. Math. Oper. Res. 42(2):349–376.LinkGoogle Scholar
  • [23] Rosenberg D, Solan E, Vieille N (2002) Blackwell optimality in Markov decision processes with partial observation. Ann. Statist. 30(4):1178–1193.CrossrefGoogle Scholar
  • [24] Shapley L (1953) Stochastic games. Proc. Natl. Acad. Sci. USA. 39(10):1095–1100.CrossrefGoogle Scholar
  • [25] Solan E (2003) Continuity of the value of competitive Markov decision processes. J. Theoret. Probab. 16(4):831–845.CrossrefGoogle Scholar
  • [26] Solan E, Vieille N (2010) Computing uniformly optimal strategies in two-player stochastic games. Econom. Theory 42(1):237–253.CrossrefGoogle Scholar
  • [27] Venel X, Ziliotto B (2016) Strong uniform value in gambling houses and partially observable Markov decision processes. SIAM J. Control Optim. 54(4):1983–2008.CrossrefGoogle Scholar
  • [28] Venel X, Ziliotto B (2021) History-dependent evaluations in POMDPs. SIAM J. Control Optim. Forthcoming.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.