Online Markov Decision Processes

Eyal Even-Dar
Google Research, New York, New York 10011

Sham M. Kakade
Toyota Technological Institute, Chicago, Illinois 60637

Yishay Mansour
School of Computer Science, Tel Aviv University, 69978 Tel Aviv, Israel

Published Online: 22 Jul 2009
https://doi.org/10.1287/moor.1090.0396

References

Auer P., Cesa-Bianchi N., Gentile C. Adaptive and self-confident on-line learning algorithms. J. Comput. System Sci. (2002) 64:48–75
Bertsekas D. P., Tsitsiklis J. N. Neuro-Dynamic Programming (1996) (Athena Scientific, Belmont, MA)
Blum A., Kalai A. Universal portfolios with and without transaction costs. Machine Learning (1999) 35:193–205
Borodin A., El-Yaniv R. Online Computation and Competitive Analysis (1998) (Cambridge University Press, Cambridge, UK)
Cesa-Bianchi N., Freund Y., Helmbold D. P., Haussler D., Schapire R. E., Warmuth M. K. How to use expert advice. J. ACM (1997) 44(3):427–485
de Farias D. P., Megiddo N. Combining expert advice in reactive environments. J. ACM (2006) 53(5):762–799
Hannan J. Approximation to Bayes risk in repeated play. Dresher M., Tucker A. W., Wolfe P., eds. Contributions to the Theory of Games, III (1957) (Princeton University Press, Princeton, NJ) 97–139
Helmbold D. P., Schapire R. E., Singer Y., Warmuth M. K. On-line portfolio selection using multiplicative updates. Math. Finance (1998) 8(4):325–347
Kakade S. M. On the sample complexity of reinforcement learning. (2003) Ph.D. thesis, University College London, London
Kalai A., Vempala S. Efficient algorithms for on-line optimization. J. Comput. System Sci. (2005) 71(3):291–307
Kearns M., Singh S. Near-optimal reinforcement learning in polynomial time. Machine Learning (2002) 49(2–3):209–232
Kivinen J., Warmuth M. Additive versus exponentiated gradient updates for linear prediction. J. Inform. Comput. (1997) 132(1):1–64
Littlestone N., Warmuth M. K. The weighted majority algorithm. Inform. Comput. (1994) 108(2):212–261
McMahan H. Planning in the presence of cost functions controlled by an adversary. Proc. 20th Internat. Conf. Machine Learning (ICML) (2003) Washington, DC:536–543
McMahan H., Gordon G., Blum A. Personal communication. (2003)
Nilim A., El Ghaoui L. Robust solutions to Markov decision problems with uncertain transition matrices. Oper. Res. (2005) 53:780–798
Puterman M. Markov Decision Processes (1994) (Wiley-Interscience, New York)
Sutton R., Barto A. Reinforcement Learning. An Introduction (1998) (MIT Press, Cambridge, MA)
Tsitsiklis J. N. NP-hardness of checking the unichain condition in average cost MDPs. Oper. Res. Lett. (2007) 35(3):319–323
Yu J. Y., Mannor S., Shimkin N. Markov decision processes with arbitrary reward processes. Math. Oper. Res. (2009) 34(3):737–757

Mathematics of Operations Research, Volume 34, Issue 3, August 2009, Pages 513-768

Received: August 16, 2006
Published Online: July 22, 2009

Cite as

Eyal Even-Dar, Sham M. Kakade, Yishay Mansour (2009) Online Markov Decision Processes. Mathematics of Operations Research 34(3):726-736.

https://doi.org/10.1287/moor.1090.0396
