Finite-Memory Strategies in POMDPs with Long-Run Average Objectives

Krishnendu Chatterjee
Krishnendu Chatterjee
[email protected]
https://orcid.org/0000-0002-4561-241X
Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria
Search for more papers by this author
,
Raimundo Saona
Raimundo Saona
[email protected]
https://orcid.org/0000-0001-5103-038X
Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria
Search for more papers by this author
,
Bruno Ziliotto
Corresponding Author
Bruno Ziliotto
[email protected]
http://orcid.org/0000-0002-4448-1411
Centre de Recherche en Mathématiques de la Décision, Centre National de la Recherche Scientifique, Université Paris Dauphine, Université PSL, 75016 Paris, France
Search for more papers by this author

Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria

Search for more papers by this author

Raimundo Saona

[email protected]

https://orcid.org/0000-0001-5103-038X

Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria

Search for more papers by this author

Bruno Ziliotto

Corresponding Author

Bruno Ziliotto

[email protected]

http://orcid.org/0000-0002-4448-1411

Centre de Recherche en Mathématiques de la Décision, Centre National de la Recherche Scientifique, Université Paris Dauphine, Université PSL, 75016 Paris, France

Search for more papers by this author

Published Online:6 Apr 2021https://doi.org/10.1287/moor.2020.1116

References

[1] Arapostathis A, Borkar V, Fernández-Gaucherand E, Ghosh M, Marcus S (1993) Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31(2):282–344.Crossref, Google Scholar
[2] Baier C, Größer M, Bertrand N (2012) Probabilistic ω-automata. J. ACM 59(1):1–52.Crossref, Google Scholar
[3] Bellman R (1957) A Markovian decision process. J. Math. Mech. 6(5):679–684.Google Scholar
[4] Blackwell D (1962) Discrete dynamic programming. Ann. Math. Statist. 33(2):719–726.Crossref, Google Scholar
[5] Bonet B, Geffner H (2009) Solving POMDPs: RTDP-bel vs. point-based algorithms. Proc. 21st Internat. Joint Conf. Artificial Intelligence (Morgan Kaufmann, San Francisco), 1641–1646.Google Scholar
[6] Borkar V (2000) Average cost dynamic programming equations for controlled Markov chains with partial observations. SIAM J. Control Optim. 39(3):673–681.Crossref, Google Scholar
[7] Bukharaev RG (1980) Probabilistic automata. J. Math. Sci. 13:359–386.Crossref, Google Scholar
[8] Cerný P, Chatterjee K, Henzinger TA, Radhakrishna A, Singh R (2011) Quantitative synthesis for concurrent programs. Gopalakrishnan G, Qadeer S, eds. Proc. Internat. Conf. Comput. Aided Verification, Lecture Notes in Computer Science, vol. 6806 (Springer, Berlin), 243–259.Google Scholar
[9] Chatterjee K (2007) Concurrent games with tail objectives. Theoret. Comput. Sci. 388(1–3):181–198.Crossref, Google Scholar
[10] Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[11] Feinberg E (1996) On measurability and representation of strategic measures in Markov decision processes. Ferguson TS, Shapley LS, MacQueen JB, eds. Statistics, Probability and Game Theory: Papers in Honor of David Blackwell (Institute of Mathematical Statistics), 29–43.Crossref, Google Scholar
[12] Filar J, Vrieze K (1997) Competitive Markov Decision Processes (Springer, New York).Google Scholar
[13] Hansen KA, Ibsen-Jensen R, Neyman A (2018) Absorbing games with a clock and two bits of memory. Working paper, University of Glasgow, Scotland, UK.Google Scholar
[14] Hansen KA, Ibsen-Jensen R, Neyman A (2018) The big match with a clock and a bit of memory. Proc. 2018 ACM Conf. Econom. Comput. (ACM, New York), 149–150.Google Scholar
[15] Hernández-Lerma O, Lasserre JB (2003) Markov Chains and Invariant Probabilities (Birkhäuser, Basel, Switzerland).Google Scholar
[16] Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: A survey. J. Artificial Intelligence Res. 4:237–285.Crossref, Google Scholar
[17] Madani O, Hanks S, Condon A (2003) On the undecidability of probabilistic planning and related stochastic optimization problems. Artificial Intelligence 147(1–2):5–34.Crossref, Google Scholar
[18] Neyman A, Sorin S (2010) Repeated games with public uncertain duration process. Internat. J. Game Theory. 39(1–2):29–52.Crossref, Google Scholar
[19] Paz A (1971) Introduction to Probabilistic Automata. Computer Science and Applied Mathematics (Academic Press, Cambridge, MA).Google Scholar
[20] Rabin M (1963) Probabilistic automata. Inform. Control 6(3):230–245.Crossref, Google Scholar
[21] Renault J (2011) Uniform value in dynamic programming. J. Eur. Math. Soc. 13(2):309–330.Crossref, Google Scholar
[22] Renault J, Venel X (2016) Long-term values in Markov decision processes and repeated games, and a new distance for probability spaces. Math. Oper. Res. 42(2):349–376.Link, Google Scholar
[23] Rosenberg D, Solan E, Vieille N (2002) Blackwell optimality in Markov decision processes with partial observation. Ann. Statist. 30(4):1178–1193.Crossref, Google Scholar
[24] Shapley L (1953) Stochastic games. Proc. Natl. Acad. Sci. USA. 39(10):1095–1100.Crossref, Google Scholar
[25] Solan E (2003) Continuity of the value of competitive Markov decision processes. J. Theoret. Probab. 16(4):831–845.Crossref, Google Scholar
[26] Solan E, Vieille N (2010) Computing uniformly optimal strategies in two-player stochastic games. Econom. Theory 42(1):237–253.Crossref, Google Scholar
[27] Venel X, Ziliotto B (2016) Strong uniform value in gambling houses and partially observable Markov decision processes. SIAM J. Control Optim. 54(4):1983–2008.Crossref, Google Scholar
[28] Venel X, Ziliotto B (2021) History-dependent evaluations in POMDPs. SIAM J. Control Optim. Forthcoming.Google Scholar

cover image Mathematics of Operations Research

Volume 47, Issue 1

February 2022

Pages 1-846, C2

Article Information

Metrics

Information

Received:February 14, 2020
Accepted:August 08, 2020
Published Online:April 06, 2021

Cite as

Krishnendu Chatterjee, Raimundo Saona, Bruno Ziliotto (2021) Finite-Memory Strategies in POMDPs with Long-Run Average Objectives. Mathematics of Operations Research 47(1):100-119.

https://doi.org/10.1287/moor.2020.1116

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Finite-Memory Strategies in POMDPs with Long-Run Average Objectives

References

Volume 47, Issue 1

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News