Online Markov Decision Processes
Published Online: 22 Jul 2009
https://doi.org/10.1287/moor.1090.0396

