Finite-Memory Strategies in POMDPs with Long-Run Average Objectives
Published Online:6 Apr 2021https://doi.org/10.1287/moor.2020.1116
References
- [1] (1993) Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31(2):282–344.Crossref, Google Scholar
- [2] (2012) Probabilistic ω-automata. J. ACM 59(1):1–52.Crossref, Google Scholar
- [3] (1957) A Markovian decision process. J. Math. Mech. 6(5):679–684.Google Scholar
- [4] (1962) Discrete dynamic programming. Ann. Math. Statist. 33(2):719–726.Crossref, Google Scholar
- [5] (2009) Solving POMDPs: RTDP-bel vs. point-based algorithms. Proc. 21st Internat. Joint Conf. Artificial Intelligence (Morgan Kaufmann, San Francisco), 1641–1646.Google Scholar
- [6] (2000) Average cost dynamic programming equations for controlled Markov chains with partial observations. SIAM J. Control Optim. 39(3):673–681.Crossref, Google Scholar
- [7] (1980) Probabilistic automata. J. Math. Sci. 13:359–386.Crossref, Google Scholar
- [8] (2011) Quantitative synthesis for concurrent programs. Gopalakrishnan G, Qadeer S, eds. Proc. Internat. Conf. Comput. Aided Verification, Lecture Notes in Computer Science, vol. 6806 (Springer, Berlin), 243–259.Google Scholar
- [9] (2007) Concurrent games with tail objectives. Theoret. Comput. Sci. 388(1–3):181–198.Crossref, Google Scholar
- [10] (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [11] (1996) On measurability and representation of strategic measures in Markov decision processes. Ferguson TS, Shapley LS, MacQueen JB, eds. Statistics, Probability and Game Theory: Papers in Honor of David Blackwell (Institute of Mathematical Statistics), 29–43.Crossref, Google Scholar
- [12] (1997) Competitive Markov Decision Processes (Springer, New York).Google Scholar
- [13] A (2018) Absorbing games with a clock and two bits of memory. Working paper, University of Glasgow, Scotland, UK.Google Scholar
- [14] (2018) The big match with a clock and a bit of memory. Proc. 2018 ACM Conf. Econom. Comput. (ACM, New York), 149–150.Google Scholar
- [15] (2003) Markov Chains and Invariant Probabilities (Birkhäuser, Basel, Switzerland).Google Scholar
- [16] (1996) Reinforcement learning: A survey. J. Artificial Intelligence Res. 4:237–285.Crossref, Google Scholar
- [17] (2003) On the undecidability of probabilistic planning and related stochastic optimization problems. Artificial Intelligence 147(1–2):5–34.Crossref, Google Scholar
- [18] (2010) Repeated games with public uncertain duration process. Internat. J. Game Theory. 39(1–2):29–52.Crossref, Google Scholar
- [19] (1971) Introduction to Probabilistic Automata. Computer Science and Applied Mathematics (Academic Press, Cambridge, MA).Google Scholar
- [20] (1963) Probabilistic automata. Inform. Control 6(3):230–245.Crossref, Google Scholar
- [21] (2011) Uniform value in dynamic programming. J. Eur. Math. Soc. 13(2):309–330.Crossref, Google Scholar
- [22] (2016) Long-term values in Markov decision processes and repeated games, and a new distance for probability spaces. Math. Oper. Res. 42(2):349–376.Link, Google Scholar
- [23] (2002) Blackwell optimality in Markov decision processes with partial observation. Ann. Statist. 30(4):1178–1193.Crossref, Google Scholar
- [24] (1953) Stochastic games. Proc. Natl. Acad. Sci. USA. 39(10):1095–1100.Crossref, Google Scholar
- [25] (2003) Continuity of the value of competitive Markov decision processes. J. Theoret. Probab. 16(4):831–845.Crossref, Google Scholar
- [26] (2010) Computing uniformly optimal strategies in two-player stochastic games. Econom. Theory 42(1):237–253.Crossref, Google Scholar
- [27] (2016) Strong uniform value in gambling houses and partially observable Markov decision processes. SIAM J. Control Optim. 54(4):1983–2008.Crossref, Google Scholar
- [28] (2021) History-dependent evaluations in POMDPs. SIAM J. Control Optim. Forthcoming.Google Scholar

