Online Markov Decision Processes

Published Online: https://doi.org/10.1287/moor.1090.0396

References

  • Auer P., Cesa-Bianchi N., Gentile C. Adaptive and self-confident on-line learning algorithms. J. Comput. System Sci. (2002) 64:48–75
  • Bertsekas D. P., Tsitsiklis J. N. Neuro-Dynamic Programming (1996) (Athena Scientific, Belmont, MA)
  • Blum A., Kalai A. Universal portfolios with and without transaction costs. Machine Learning (1999) 35:193–205
  • Borodin A., El-Yaniv R. Online Computation and Competitive Analysis (1998) (Cambridge University Press, Cambridge, UK)
  • Cesa-Bianchi N., Freund Y., Helmbold D. P., Haussler D., Schapire R. E., Warmuth M. K. How to use expert advice. J. ACM (1997) 44(3):427–485
  • de Farias D. P., Megiddo N. Combining expert advice in reactive environments. J. ACM (2006) 53(5):762–799
  • Hannan J. Approximation to Bayes risk in repeated play. Dresher M., Tucker A. W., Wolfe P., eds. Contributions to the Theory of Games, III (1957) (Princeton University Press, Princeton, NJ) 97–139
  • Helmbold D. P., Schapire R. E., Singer Y., Warmuth M. K. On-line portfolio selection using multiplicative updates. Math. Finance (1998) 8(4):325–347
  • Kakade S. M. On the sample complexity of reinforcement learning. (2003) Ph.D. thesis, University College London, London
  • Kalai A., Vempala S. Efficient algorithms for on-line optimization. J. Comput. System Sci. (2005) 71(3):291–307
  • Kearns M., Singh S. Near-optimal reinforcement learning in polynomial time. Machine Learning (2002) 49(2–3):209–232
  • Kivinen J., Warmuth M. Additive versus exponentiated gradient updates for linear prediction. Inform. Comput. (1997) 132(1):1–64
  • Littlestone N., Warmuth M. K. The weighted majority algorithm. Inform. Comput. (1994) 108(2):212–261
  • McMahan H. Planning in the presence of cost functions controlled by an adversary. Proc. 20th Internat. Conf. Machine Learning (ICML) (2003) Washington, DC:536–543
  • McMahan H., Gordon G., Blum A. Personal communication (2003)
  • Nilim A., El Ghaoui L. Robust solutions to Markov decision problems with uncertain transition matrices. Oper. Res. (2005) 53:780–798
  • Puterman M. Markov Decision Processes (1994) (Wiley-Interscience, New York)
  • Sutton R., Barto A. Reinforcement Learning: An Introduction (1998) (MIT Press, Cambridge, MA)
  • Tsitsiklis J. N. NP-hardness of checking the unichain condition in average cost MDPs. Oper. Res. Lett. (2007) 35(3):319–323
  • Yu J. Y., Mannor S., Shimkin N. Markov decision processes with arbitrary reward processes. Math. Oper. Res. (2009) 34(3):737–757