Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces

Published online: https://doi.org/10.1287/moor.2016.0814

References

  • Arapostathis A, Borkar VS, Fernández-Gaucherand E, Ghosh MK, Marcus SI (1993) Discrete-time controlled Markov processes with average cost criterion: A survey. SIAM J. Control Optim. 31(2):282–344.
  • Ash RB (1972) Real Analysis and Probability (Academic Press, New York).
  • Åström KJ (1965) Optimal control of Markov processes with incomplete state information. J. Math. Anal. Appl. 10(1):174–205.
  • Aubin JP (1977) Applied Abstract Analysis (John Wiley & Sons, New York).
  • Aumann RJ, Maschler M, Stearns RE (1995) Repeated Games with Incomplete Information (MIT Press, Cambridge, MA).
  • Bellman R (1957) A Markovian decision process. Technical Report P-1066, RAND Corporation, Santa Monica, CA.
  • Bewley T, Kohlberg E (1976) The asymptotic theory of stochastic games. Math. Oper. Res. 1(3):197–208.
  • Birkhoff GD (1931) Proof of the ergodic theorem. Proc. Natl. Acad. Sci. USA 17(12):656–660.
  • Blackwell D (1962) Discrete dynamic programming. Ann. Math. Statist. 33(2):719–726.
  • Borkar VS (2000) Average cost dynamic programming equations for controlled Markov chains with partial observations. SIAM J. Control Optim. 39(3):673–681.
  • Borkar VS (2007) Dynamic programming for ergodic control of Markov chains under partial observations: A correction. SIAM J. Control Optim. 45(6):2299–2304.
  • Bressaud X, Quas A (2014) Asymmetric warfare. Preprint arXiv:1403.1385.
  • Buckdahn R, Goreac D, Quincampoix M (2014) Existence of asymptotic values for nonexpansive stochastic control systems. Appl. Math. Optim. 70(1):1–28.
  • Choquet G (1956) Existence et unicité des représentations intégrales au moyen des points extrémaux dans les cônes convexes. Séminaire Bourbaki 4:33–47.
  • Denardo EV, Fox BL (1968) Multichain Markov renewal programs. SIAM J. Appl. Math. 16(3):468–487.
  • Dubins LE, Savage LJ (1965) How to Gamble If You Must: Inequalities for Stochastic Processes (McGraw-Hill, New York).
  • Dudley RM (2002) Real Analysis and Probability, Cambridge Studies in Advanced Mathematics, Vol. 74 (Cambridge University Press, Cambridge, UK).
  • Harsanyi JC (1967) Games with incomplete information played by “Bayesian” players, I–III: Part I. The basic model. Management Sci. 14(3):159–182.
  • Hordijk A, Kallenberg LCM (1979) Linear programming and Markov decision chains. Management Sci. 25(4):352–362.
  • Hörner J, Rosenberg D, Solan E, Vieille N (2010) On a Markov game with one-sided information. Oper. Res. 58(4):1107–1115.
  • Lehrer E, Sorin S (1992) A uniform Tauberian theorem in dynamic programming. Math. Oper. Res. 17(2):303–307.
  • Maitra A, Sudderth W (1996) Discrete Gambling and Stochastic Games (Springer, Berlin).
  • McShane EJ (1934) Extension of range of functions. Bull. Amer. Math. Soc. 40(12):837–842.
  • Mertens J-F (1987) Repeated games. Proc. Internat. Congress Mathematicians, Berkeley, California, USA, 1986 (American Mathematical Society, Providence, RI), 1528–1577.
  • Mertens J-F, Neyman A (1981) Stochastic games. Internat. J. Game Theory 10(2):53–66.
  • Mertens J-F, Zamir S (1985) Formulation of Bayesian analysis for games with incomplete information. Internat. J. Game Theory 14(1):1–29.
  • Mertens J-F, Sorin S, Zamir S (2005) Repeated Games (Cambridge University Press, Cambridge, UK).
  • Neyman A (2008) Existence of optimal strategies in Markov games with incomplete information. Internat. J. Game Theory 37(4):581–596.
  • Quincampoix M, Renault J (2011) On the existence of a limit value in some nonexpansive optimal control problems. SIAM J. Control Optim. 49(5):2118–2132.
  • Renault J (2006) The value of Markov chain games with lack of information on one side. Math. Oper. Res. 31(3):490–512.
  • Renault J (2011) Uniform value in dynamic programming. J. Eur. Math. Soc. 13(2):309–330.
  • Renault J (2012) The value of repeated games with an informed controller. Math. Oper. Res. 37(1):154–179.
  • Renault J (2014) General limit value in dynamic programming. J. Dynam. Games 1(3):471–484.
  • Rhenius D (1974) Incomplete information in Markovian decision models. Ann. Statist. 2(6):1327–1334.
  • Rosenberg D, Solan E, Vieille N (2002) Blackwell optimality in Markov decision processes with partial observation. Ann. Statist. 30(4):1178–1193.
  • Rosenberg D, Solan E, Vieille N (2004) Stochastic games with a single controller and incomplete information. SIAM J. Control Optim. 43(1):86–110.
  • Runggaldier WJ, Stettner Ł (1991) On the construction of nearly optimal strategies for a general problem of control of partially observed diffusions. Stochastics 37(1–2):15–47.
  • Santambrogio F (2012) Personal communication, September 2012.
  • Sawaragi Y, Yoshikawa T (1970) Discrete-time Markovian decision processes with incomplete state observation. Ann. Math. Statist. 41(1):78–86.
  • Shapley LS (1953) Stochastic games. Proc. Natl. Acad. Sci. USA 39(10):1095–1100.
  • Sorin S (2002) A First Course on Zero-Sum Repeated Games, Mathématiques et Applications, Vol. 37 (Springer, New York).
  • Villani C (2003) Topics in Optimal Transportation, Graduate Studies in Mathematics, Vol. 58 (American Mathematical Society, Providence, RI).
  • von Neumann J (1932) Proof of the quasi-ergodic hypothesis. Proc. Natl. Acad. Sci. USA 18(1):70–82.