Markov Decision Processes with Arbitrary Reward Processes

Published Online: https://doi.org/10.1287/moor.1090.0397

References

  • Auer P., Cesa-Bianchi N., Freund Y., Schapire R. E. The nonstochastic multiarmed bandit problem. SIAM J. Comput. (2002) 32(1):48–77
  • Aumann R. J. Markets with a continuum of traders. Econometrica (1964) 32:39–50
  • Bertsekas D. P. Dynamic Programming and Optimal Control (2001) Vol. 2, 2nd ed. (Athena Scientific, Nashua, NH)
  • Bertsekas D. P., Tsitsiklis J. N. Neuro-Dynamic Programming (1996) (Athena Scientific, Nashua, NH)
  • Blackwell D. An analog of the minimax theorem for vector payoffs. Pacific J. Math. (1956) 6(1):1–8
  • Bobkov S. G., Tetali P. Modified logarithmic Sobolev inequalities in discrete settings. J. Theoret. Probab. (2006) 19(2):289–336
  • Borkar V. S., Meyn S. P. The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim. (2000) 38(2):447–469
  • Brafman R. I., Tennenholtz M. R-max: A general polynomial time algorithm for near-optimal reinforcement learning. J. Machine Learning Res. (2003) 3:213–231
  • Cesa-Bianchi N., Lugosi G. Prediction, Learning, and Games (2006) (Cambridge University Press, New York)
  • Crites R. H., Barto A. G. An actor/critic algorithm that is equivalent to Q-learning. Advances in Neural Information Processing Systems 7 (1995) (MIT Press, Cambridge) 401–408
  • Even-Dar E., Kakade S., Mansour Y. Experts in a Markov decision process. Advances in Neural Information Processing Systems 17 (2004) (MIT Press, Cambridge) 401–408
  • Filar J., Vrieze K. Competitive Markov Decision Processes (1997) (Springer-Verlag, New York)
  • Freund Y., Schapire R. E. Adaptive game playing using multiplicative weights. Games Econom. Behav. (1999) 29(1–2):79–103
  • Fudenberg D., Kreps D. M. Learning mixed equilibria. Games Econom. Behav. (1993) 5(3):320–367
  • Fudenberg D., Levine D. K. The Theory of Learning in Games (1998) (MIT Press, Cambridge)
  • Hannan J. Approximation to Bayes risk in repeated play. Contributions to the Theory of Games (1957) Vol. 3 (Princeton University Press, Princeton, NJ) 97–139
  • Herbster M., Warmuth M. K. Tracking the best expert. Machine Learning (1998) 32(2):151–178
  • Kalai A., Vempala S. Efficient algorithms for online decision problems. J. Comput. System Sci. (2005) 71(3):291–307
  • Littlestone N., Warmuth M. K. The weighted majority algorithm. Inform. Comput. (1994) 108(2):212–261
  • Mannor S., Shimkin N. The empirical Bayes envelope and regret minimization in competitive Markov decision processes. Math. Oper. Res. (2003) 28(2):327–345
  • Mannor S., Shimkin N. Regret minimization in repeated matrix games with variable stage duration. Games Econom. Behav. (2008) 63(1):227–258
  • Merhav N., Ordentlich E., Seroussi G., Weinberger M. J. On sequential strategies for loss functions with memory. IEEE Trans. Inform. Theory (2002) 48(7):1947–1958
  • Renegar J. Some perturbation theory for linear programming. Math. Programming (1994) 65(1):73–91
  • Robinson S. M. Bounds for error in the solution set of a perturbed linear program. Linear Algebra Its Appl. (1973) 6:69–81
  • Schweitzer P. J. Perturbation theory and finite Markov chains. J. Appl. Probab. (1968) 5:410–413
  • Shapley L. Stochastic games. Proc. National Acad. Sci. (1953) 39(10):1095–1100
  • Watkins C., Dayan P. Q-learning. Machine Learning (1992) 8:279–292
  • Zinkevich M. Online convex programming and generalized infinitesimal gradient ascent. Proc. Twentieth Internat. Conf. Machine Learning (2003) (AAAI Press, Cambridge, MA). http://www.hpl.hp.com/conferences/icml2003/titlesAndAuthors.html