Markov Decision Processes with Arbitrary Reward Processes
Published Online: 6 Aug 2009. https://doi.org/10.1287/moor.1090.0397
References
- The nonstochastic multiarmed bandit problem. SIAM J. Comput. (2002) 32(1):48–77
- Markets with a continuum of traders. Econometrica (1964) 32:39–50
- Dynamic Programming and Optimal Control (2001) Vol. 2, 2nd ed. (Athena Scientific, Nashua, NH)
- Neuro-Dynamic Programming (1996) (Athena Scientific, Nashua, NH)
- An analog of the minimax theorem for vector payoffs. Pacific J. Math. (1956) 6(1):1–8
- Modified logarithmic Sobolev inequalities in discrete settings. J. Theoret. Probab. (2006) 19(2):289–336
- The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim. (2000) 38(2):447–469
- R-max—A general polynomial time algorithm for near-optimal reinforcement learning. J. Machine Learning Res. (2003) 3:213–231
- Prediction, Learning, and Games (2006) (Cambridge University Press, New York)
- An actor/critic algorithm that is equivalent to Q-learning. Advances in Neural Information Processing Systems 7 (1995) (MIT Press, Cambridge) 401–408
- Experts in a Markov decision process. Advances in Neural Information Processing Systems 17 (2004) (MIT Press, Cambridge) 401–408
- Competitive Markov Decision Processes (1997) (Springer-Verlag, New York)
- Adaptive game playing using multiplicative weights. Games Econom. Behav. (1999) 29(1–2):79–103
- Learning mixed equilibria. Games Econom. Behav. (1993) 5(3):320–367
- The Theory of Learning in Games (1998) (MIT Press, Cambridge)
- Approximation to Bayes risk in repeated play. Contributions to the Theory of Games (1957) Vol. 3 (Princeton University Press, Princeton, NJ) 97–139
- Tracking the best expert. Machine Learning (1998) 32(2):151–178
- Efficient algorithms for online decision problems. J. Comput. System Sci. (2005) 71(3):291–307
- The weighted majority algorithm. Inform. Comput. (1994) 108(2):212–261
- The empirical Bayes envelope and regret minimization in competitive Markov decision processes. Math. Oper. Res. (2003) 28(2):327–345
- Regret minimization in repeated matrix games with variable stage duration. Games Econom. Behav. (2008) 63(1):227–258
- On sequential strategies for loss functions with memory. IEEE Trans. Inform. Theory (2002) 48(7):1947–1958
- Some perturbation theory for linear programming. Math. Programming (1994) 65(1):73–91
- Bounds for error in the solution set of a perturbed linear program. Linear Algebra Its Appl. (1973) 6:69–81
- Perturbation theory and finite Markov chains. J. Appl. Probab. (1968) 5:410–413
- Stochastic games. Proc. National Acad. Sci. (1953) 39(10):1095–1100
- Q-learning. Machine Learning (1992) 8:279–292
- Online convex programming and generalized infinitesimal gradient ascent. Proc. Twentieth Internat. Conf. Machine Learning (2003) (AAAI Press, Cambridge, MA). http://www.hpl.hp.com/conferences/icml2003/titlesAndAuthors.html

