The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes

Shie Mannor
Shie Mannor
[email protected]
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Search for more papers by this author
,
Nahum Shimkin
Nahum Shimkin
[email protected]
Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa 32000, Israel
Search for more papers by this author

Shie Mannor

[email protected]

Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Search for more papers by this author

Nahum Shimkin

[email protected]

Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa 32000, Israel

Search for more papers by this author

Published Online:1 May 2003https://doi.org/10.1287/moor.28.2.327.14483

References

Auer P., Ceas-Bianchi N., Freund Y., Schapire R. E. Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proc. 36th Annual Sympos. on Foundations of Comput. Sci. (1995) (IEEE Computer Society Press, Los Alamitos, CA) 322–331Google Scholar
Bertsekas D., Tsitsiklis J.Neuro-Dynamic Programming (1995) (Athena Scientific, Belmont, MA) Google Scholar
Blackwell D. An analog of the minimax theorem for vector payoffs. Pacific J. Math. (1956a) 6(1):1–8Crossref, Google Scholar
Blackwell D. Controlled random walks. Proc. Internat. Congress of Mathematicians, 1954 (1956b) III(North-Holland, Amsterdam, The Netherlands) 336–338Google Scholar
Feder M., Merhav N. Universal prediction. IEEE Trans. on Inform. Theory (1998) 44(6):2124–2147Crossref, Google Scholar
Filar J., Vrieze K.Competitive Markov Decision Processes (1996) (Springer-Verlag, New York) Crossref, Google Scholar
Flesch J., Thuijsman F., Vrieze O. J. Simplifying optimal strategies in stochastic games. SIAM J. Control Optim. (1998) 36(4):1331–1347Crossref, Google Scholar
Freund Y., Schapire R. Adaptive game playing using multiplicative weights. Games and Econom. Behavior (1999) 29:79–103Crossref, Google Scholar
Fudenberg D., Levine D. Universal consistency and cautious fictitious play. J. Econom. Dynamics and Control (1995) 19:1065–1990Crossref, Google Scholar
Fudenberg D., Levine D.The Theory of Learning in Games (1998) (MIT Press, Cambridge, MA) Google Scholar
Hannan J., Dresher M., Tucker A. W., Wolde P. Approximation to Bayes risk in repeated play. Contribution to the Theory of Games, III (1957) (Princeton University Press, Princeton, NJ) 97–139Google Scholar
Hart S., Mas-Colell A. A simple adaptive procedure leading to correlated equilibrium. Econometrica (2000) 68:1127–1150Crossref, Google Scholar
Hart S., Mas-Colell A. A general class of adaptive strategies. J. Econom. Theory (2001) 98:26–54Crossref, Google Scholar
Kaelbling L., Littman M., Moore A. Reinforcement learning—A survey. J. Artificial Intelligence Res. (1996) 4:237–285Google Scholar
Kumar P. R., Varaiya P.Stochastic Systems: Estimation, Identification and Adaptive Control (1986) (Prentice Hall, Englewood Cliffs, NJ) Google Scholar
Lapidoth A., Narayan P. Reliable communication under channel uncertainty. IEEE Trans. on Inform. Theory (1998) 44:2148–2177Crossref, Google Scholar
Lehrer E. Approachability in infinite dimensional spaces and an application: A universal algorithm for generating extended normal numbers. (1998) . Preprint, MayGoogle Scholar
Mannor S., Shimkin N. The empirical Bayes envelope approach to regret minimization in stochastic games. (2000a) . Technical report No. EE-1262, Faculty of Electrical Engineering, Technion, Haifa, IsraelGoogle Scholar
Mannor S., Shimkin N. Generalized approachability results for stochastic games with a single communicating state. (2000b) . Technical report No. EE-1263, Faculty of Electrical Engineering, Technion, Haifa, IsraelGoogle Scholar
Mertens J., Neyman A. Stochastic games. Internat. J. Game Theory (1981) 10(2):53–66Crossref, Google Scholar
Mertens J., Sorin S., Zamir S. Repeated games. (1994) . CORE Reprint Nos. Discussion Papers 9420, 9421, and 9422. Center for Operations Research and Economics, Universite Catholique de Louvain, Louvain, BelgiumGoogle Scholar
Milman E. Uniform properties of stochastic games and approachability. (2000) . Unpublished master's thesis, Tel Aviv University, Tel Aviv, IsraelGoogle Scholar
Pated S. Stochastic shortest path games. (1997) . Unpublished Ph.D., Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MAGoogle Scholar
Puterman M.Markov Decision Processes (1994) (Wiley-Interscience, New York) Crossref, Google Scholar
Rockafellar R.Convex Analysis (1970) (Princeton University Press, Princeton, NJ) Crossref, Google Scholar
Rustichini A. Minimizing regret: The general case. Games and Econom. Behavior (1999) 29:224–243Crossref, Google Scholar
Shimkin N., Shwartz A. Guaranteed performance regions in Markovian systems with competing decision makers. IEEE Trans. on Automatic Control (1993) 38(1):84–95Crossref, Google Scholar
Spinat X. An approachability condition for general sets. (1999) . Technical Report No. 496, Ecole Polytechnique, Paris, FranceGoogle Scholar
Stoer J., Witzgall C.Convexity and Optimization in Finite Dimensions (1970) I(Springer-Verlag, New York) Crossref, Google Scholar
Vohra R., Levine D. K., Foster D. Special issue on learning in games. Games and Econom. Behavior (1999) 29(1). see entire issueGoogle Scholar
Vovk V. A game of prediction with expert advice. J. Comput. Systems Sci. (1998) 56(2):153–173Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 28, Issue 2

May 2003

Pages 201-394

Article Information

Metrics

Information

Received:October 09, 2001
Published Online:May 01, 2003

Cite as

Shie Mannor, Nahum Shimkin, (2003) The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes. Mathematics of Operations Research 28(2):327-345.

https://doi.org/10.1287/moor.28.2.327.14483

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes

References

Volume 28, Issue 2

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News