Online Markov Decision Processes

Eyal Even-Dar
Google Research, New York, New York 10011

Sham M. Kakade
Toyota Technological Institute, Chicago, Illinois 60637

Yishay Mansour
School of Computer Science, Tel Aviv University, 69978 Tel Aviv, Israel

Published Online: 22 Jul 2009
https://doi.org/10.1287/moor.1090.0396

Abstract

We consider a Markov decision process (MDP) setting in which the reward function is allowed to change after each time step (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well an agent can do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms whose regret bounds have no dependence on the size of the state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions.
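
As an illustration of the comparison the abstract describes, the regret against the best stationary policy can be written roughly as follows. The notation here is an assumption made for this sketch and is not taken from the paper: $\Pi$ denotes the set of stationary policies, $r_t$ the reward function at time step $t$, $(s_t, a_t)$ the state-action pair visited by the agent, and $(s_t^{\pi}, a_t^{\pi})$ the pair visited when following a fixed policy $\pi$ under the same fixed dynamics.

```latex
% Sketch of a regret definition over T steps (notation assumed, not the paper's).
% First term: expected total reward of the best stationary policy in hindsight.
% Second term: expected total reward collected by the agent.
\[
  R_T \;=\; \max_{\pi \in \Pi}
    \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\!\left(s_t^{\pi}, a_t^{\pi}\right)\right]
  \;-\;
    \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\!\left(s_t, a_t\right)\right]
\]
```

Under this reading, the abstract's claim is that $R_T$ can be bounded in terms of a horizon time of the process and the logarithm of the number of actions, with no dependence on the number of states.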

Mathematics of Operations Research
Volume 34, Issue 3, August 2009, Pages 513-768

Received: August 16, 2006
Published Online: July 22, 2009

Cite as

Eyal Even-Dar, Sham M. Kakade, Yishay Mansour (2009) Online Markov Decision Processes. Mathematics of Operations Research 34(3):726-736.

https://doi.org/10.1287/moor.1090.0396
