The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes

References

  • Auer P., Ceas-Bianchi N., Freund Y., Schapire R. E. Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proc. 36th Annual Sympos. on Foundations of Comput. Sci. (1995) (IEEE Computer Society Press, Los Alamitos, CA) 322–331Google Scholar
  • Bertsekas D., Tsitsiklis J.Neuro-Dynamic Programming (1995) (Athena Scientific, Belmont, MA) Google Scholar
  • Blackwell D. An analog of the minimax theorem for vector payoffs. Pacific J. Math. (1956a) 6(1):1–8CrossrefGoogle Scholar
  • Blackwell D. Controlled random walks. Proc. Internat. Congress of Mathematicians, 1954 (1956b) III(North-Holland, Amsterdam, The Netherlands) 336–338Google Scholar
  • Feder M., Merhav N. Universal prediction. IEEE Trans. on Inform. Theory (1998) 44(6):2124–2147CrossrefGoogle Scholar
  • Filar J., Vrieze K.Competitive Markov Decision Processes (1996) (Springer-Verlag, New York) CrossrefGoogle Scholar
  • Flesch J., Thuijsman F., Vrieze O. J. Simplifying optimal strategies in stochastic games. SIAM J. Control Optim. (1998) 36(4):1331–1347CrossrefGoogle Scholar
  • Freund Y., Schapire R. Adaptive game playing using multiplicative weights. Games and Econom. Behavior (1999) 29:79–103CrossrefGoogle Scholar
  • Fudenberg D., Levine D. Universal consistency and cautious fictitious play. J. Econom. Dynamics and Control (1995) 19:1065–1990CrossrefGoogle Scholar
  • Fudenberg D., Levine D.The Theory of Learning in Games (1998) (MIT Press, Cambridge, MA) Google Scholar
  • Hannan J., Dresher M., Tucker A. W., Wolde P. Approximation to Bayes risk in repeated play. Contribution to the Theory of Games, III (1957) (Princeton University Press, Princeton, NJ) 97–139Google Scholar
  • Hart S., Mas-Colell A. A simple adaptive procedure leading to correlated equilibrium. Econometrica (2000) 68:1127–1150CrossrefGoogle Scholar
  • Hart S., Mas-Colell A. A general class of adaptive strategies. J. Econom. Theory (2001) 98:26–54CrossrefGoogle Scholar
  • Kaelbling L., Littman M., Moore A. Reinforcement learning—A survey. J. Artificial Intelligence Res. (1996) 4:237–285Google Scholar
  • Kumar P. R., Varaiya P.Stochastic Systems: Estimation, Identification and Adaptive Control (1986) (Prentice Hall, Englewood Cliffs, NJ) Google Scholar
  • Lapidoth A., Narayan P. Reliable communication under channel uncertainty. IEEE Trans. on Inform. Theory (1998) 44:2148–2177CrossrefGoogle Scholar
  • Lehrer E. Approachability in infinite dimensional spaces and an application: A universal algorithm for generating extended normal numbers. (1998) . Preprint, MayGoogle Scholar
  • Mannor S., Shimkin N. The empirical Bayes envelope approach to regret minimization in stochastic games. (2000a) . Technical report No. EE-1262, Faculty of Electrical Engineering, Technion, Haifa, IsraelGoogle Scholar
  • Mannor S., Shimkin N. Generalized approachability results for stochastic games with a single communicating state. (2000b) . Technical report No. EE-1263, Faculty of Electrical Engineering, Technion, Haifa, IsraelGoogle Scholar
  • Mertens J., Neyman A. Stochastic games. Internat. J. Game Theory (1981) 10(2):53–66CrossrefGoogle Scholar
  • Mertens J., Sorin S., Zamir S. Repeated games. (1994) . CORE Reprint Nos. Discussion Papers 9420, 9421, and 9422. Center for Operations Research and Economics, Universite Catholique de Louvain, Louvain, BelgiumGoogle Scholar
  • Milman E. Uniform properties of stochastic games and approachability. (2000) . Unpublished master's thesis, Tel Aviv University, Tel Aviv, IsraelGoogle Scholar
  • Pated S. Stochastic shortest path games. (1997) . Unpublished Ph.D., Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MAGoogle Scholar
  • Puterman M.Markov Decision Processes (1994) (Wiley-Interscience, New York) CrossrefGoogle Scholar
  • Rockafellar R.Convex Analysis (1970) (Princeton University Press, Princeton, NJ) CrossrefGoogle Scholar
  • Rustichini A. Minimizing regret: The general case. Games and Econom. Behavior (1999) 29:224–243CrossrefGoogle Scholar
  • Shimkin N., Shwartz A. Guaranteed performance regions in Markovian systems with competing decision makers. IEEE Trans. on Automatic Control (1993) 38(1):84–95CrossrefGoogle Scholar
  • Spinat X. An approachability condition for general sets. (1999) . Technical Report No. 496, Ecole Polytechnique, Paris, FranceGoogle Scholar
  • Stoer J., Witzgall C.Convexity and Optimization in Finite Dimensions (1970) I(Springer-Verlag, New York) CrossrefGoogle Scholar
  • Vohra R., Levine D. K., Foster D. Special issue on learning in games. Games and Econom. Behavior (1999) 29(1). see entire issueGoogle Scholar
  • Vovk V. A game of prediction with expert advice. J. Comput. Systems Sci. (1998) 56(2):153–173CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.