Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces

Jérôme Renault
[email protected]
Toulouse School of Economics, Université Toulouse 1 Capitole, Toulouse, France

Xavier Venel (Corresponding Author)
[email protected]
Paris School of Economics; and Université Paris 1 Panthéon-Sorbonne, Paris, France

Published Online: 28 Nov 2016
https://doi.org/10.1287/moor.2016.0814

References

Arapostathis A, Borkar VS, Fernández-Gaucherand E, Ghosh MK, Marcus SI (1993) Discrete-time controlled Markov processes with average cost criterion: A survey. SIAM J. Control Optim. 31(2):282–344.
Ash RB (1972) Real Analysis and Probability (Academic Press, New York).
Åström KJ (1965) Optimal control of Markov processes with incomplete state information. J. Math. Anal. Appl. 10(1):174–205.
Aubin JP (1977) Applied Abstract Analysis (John Wiley & Sons, New York).
Aumann RJ, Maschler M, Stearns RE (1995) Repeated Games with Incomplete Information (MIT Press, Cambridge, MA).
Bellman R (1957) A Markovian decision process. Technical Report P-1066, RAND Corporation, Santa Monica, CA.
Bewley T, Kohlberg E (1976) The asymptotic theory of stochastic games. Math. Oper. Res. 1(3):197–208.
Birkhoff GD (1931) Proof of the ergodic theorem. Proc. Natl. Acad. Sci. USA 17(12):656–660.
Blackwell D (1962) Discrete dynamic programming. Ann. Math. Statist. 33(2):719–726.
Borkar VS (2000) Average cost dynamic programming equations for controlled Markov chains with partial observations. SIAM J. Control Optim. 39(3):673–681.
Borkar VS (2007) Dynamic programming for ergodic control of Markov chains under partial observations: A correction. SIAM J. Control Optim. 45(6):2299–2304.
Bressaud X, Quas A (2014) Asymmetric warfare. Preprint arXiv:1403.1385.
Buckdahn R, Goreac D, Quincampoix M (2014) Existence of asymptotic values for nonexpansive stochastic control systems. Appl. Math. Optim. 70(1):1–28.
Choquet G (1956) Existence et unicité des représentations intégrales au moyen des points extrémaux dans les cônes convexes. Séminaire Bourbaki 4:33–47.
Denardo EV, Fox BL (1968) Multichain Markov renewal programs. SIAM J. Appl. Math. 16(3):468–487.
Dubins LE, Savage LJ (1965) How to Gamble If You Must: Inequalities for Stochastic Processes (McGraw-Hill, New York).
Dudley RM (2002) Real Analysis and Probability, Cambridge Studies in Advanced Mathematics, Vol. 74 (Cambridge University Press, Cambridge, UK).
Harsanyi JC (1967) Games with incomplete information played by “Bayesian” players, I–III: Part I. The basic model. Management Sci. 14(3):159–182.
Hordijk A, Kallenberg LCM (1979) Linear programming and Markov decision chains. Management Sci. 25(4):352–362.
Hörner J, Rosenberg D, Solan E, Vieille N (2010) On a Markov game with one-sided information. Oper. Res. 58(4):1107–1115.
Lehrer E, Sorin S (1992) A uniform Tauberian theorem in dynamic programming. Math. Oper. Res. 17(2):303–307.
Maitra A, Sudderth W (1996) Discrete Gambling and Stochastic Games (Springer, Berlin).
McShane EJ (1934) Extension of range of functions. Bull. Amer. Math. Soc. 40(12):837–842.
Mertens J-F (1987) Repeated games. Proc. Internat. Congress Mathematicians, Berkeley, California, USA, 1986 (American Mathematical Society, Providence, RI), 1528–1577.
Mertens J-F, Neyman A (1981) Stochastic games. Internat. J. Game Theory 10(2):53–66.
Mertens J-F, Zamir S (1985) Formulation of Bayesian analysis for games with incomplete information. Internat. J. Game Theory 14(1):1–29.
Mertens J-F, Sorin S, Zamir S (2005) Repeated games (Cambridge University Press, Cambridge, UK).
Neyman A (2008) Existence of optimal strategies in Markov games with incomplete information. Internat. J. Game Theory 37(4):581–596.
Quincampoix M, Renault J (2011) On the existence of a limit value in some nonexpansive optimal control problems. SIAM J. Control Optim. 49(5):2118–2132.
Renault J (2006) The value of Markov chain games with lack of information on one side. Math. Oper. Res. 31(3):490–512.
Renault J (2011) Uniform value in dynamic programming. J. Eur. Math. Soc. 13(2):309–330.
Renault J (2014) General limit value in dynamic programming. J. Dynam. Games 1(3):471–484.
Renault J (2012) The value of repeated games with an informed controller. Math. Oper. Res. 37(1):154–179.
Rhenius D (1974) Incomplete information in Markovian decision models. Ann. Statist. 2(6):1327–1334.
Rosenberg D, Solan E, Vieille N (2002) Blackwell optimality in Markov decision processes with partial observation. Ann. Statist. 30(4):1178–1193.
Rosenberg D, Solan E, Vieille N (2004) Stochastic games with a single controller and incomplete information. SIAM J. Control Optim. 43(1):86–110.
Runggaldier WJ, Stettner Ł (1991) On the construction of nearly optimal strategies for a general problem of control of partially observed diffusions. Stochastics 37(1–2):15–47.
Santambrogio F (2012) Personal communication, September 2012.
Sawaragi Y, Yoshikawa T (1970) Discrete-time Markovian decision processes with incomplete state observation. Ann. Math. Statist. 41(1):78–86.
Shapley LS (1953) Stochastic games. Proc. Natl. Acad. Sci. USA 39(10):1095–1100.
Sorin S (2002) A First Course on Zero-Sum Repeated Games, Mathematiques and Applications, Vol. 37 (Springer, New York).
Villani C (2003) Topics in Optimal Transportation, Graduate Studies in Mathematics, Vol. 58 (American Mathematical Society, Providence, RI).
von Neumann J (1932) Proof of the quasi-ergodic hypothesis. Proc. Natl. Acad. Sci. USA 18(1):70–82.

Mathematics of Operations Research

Volume 42, Issue 2

May 2017

Pages 277-575

Article Information

Received: February 25, 2015
Published Online: November 28, 2016

Cite as

Jérôme Renault, Xavier Venel (2016) Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces. Mathematics of Operations Research 42(2):349-376.

https://doi.org/10.1287/moor.2016.0814
