Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization
Published Online:2 Mar 2017https://doi.org/10.1287/moor.2016.0826
References
- (2011) Regret bounds for the adaptive control of linear quadratic systems. J. Machine Learn. Res.—Proc. Track 19:1–26.Google Scholar
- (2006) Logarithmic online regret bounds for undiscounted reinforcement learning. Schölkopf PB, Platt JC, Hoffman T, eds. Adv. Neural Inform. Processing Systems (NIPS 2006), Vol. 19 (MIT Press, Cambridge, MA), 49–56.Google Scholar
- (2013) Regret bounds for reinforcement learning with policy advice. Daelemans W, Morik K, eds. Machine Learning and Knowledge Discovery in Databases (Springer, Berlin), 97–112.Crossref, Google Scholar
- (2009) REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs. Bilmes JA, Ng AY, eds. Proc. 25th Conf. Uncertainty Artificial Intelligence (UAI2009) (AUAI Press, Arlington, VA), 35–42.Google Scholar
- (1996) Neuro-Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
- (2002) R-max—A general polynomial time algorithm for near-optimal reinforcement learning. J. Machine Learn. Res. 3(October):213–231.Google Scholar
- (1995) Online fitted reinforcement learning. Touretzky DS, Mozer MC, Hasselmo ME, eds. Adv. Neural Inform. Processing Systems (NIPS 1995), Vol. 8 (MIT Press, Cambridge, MA), 1052–1058.Google Scholar
- (2012) Efficient reinforcement learning for high dimensional linear quadratic systems. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems (NIPS 2012), Vol. 25 (Curran Associates, Red Hook, NY), 2636–2644.Google Scholar
- (2010) Near-optimal regret bounds for reinforcement learning. J. Machine Learn. Res. 11(April):1563–1600.Google Scholar
- (2003) On the sample complexity of reinforcement learning. Ph.D. thesis, University College London, London.Google Scholar
- (1984) A new polynomial-time algorithm for linear programming. Combinatorica 4(4):373–396.Crossref, Google Scholar
- (1999) Efficient reinforcement learning in factored MDPs. Proc. 16th Internat. Joint Conf. Artificial Intelligence, Vol. 2 (Morgan Kaufmann, San Francisco), 740–747.Google Scholar
- (2002) Near-optimal reinforcement learning in polynomial time. Machine Learn. 49(2–3):209–232.Crossref, Google Scholar
- (2002) Least-squares methods in reinforcement learning for control. Vlahavas IP, Spyropoulos CD, eds. Methods and Applications of Artificial Intelligence (Springer, Berlin), 249–260.Crossref, Google Scholar
- (2013) The sample-complexity of general reinforcement learning. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learning, 28–36.Google Scholar
- (2010) Reducing reinforcement learning to KWIK online regression. Ann. Math. Artificial Intelligence 58(2):319–325.Google Scholar
- (2011) Knows what it knows: A framework for self-aware learning. Machine Learn. 82(3):399–443.Crossref, Google Scholar
- (2012) The adversarial stochastic shortest path problem with unknown transition probabilities. Lawrence N, Girolami M, eds. Proc. 15th Internat. Conf. Artificial Intelligence Statist., 805–813.Google Scholar
- (2010) Online Markov decision processes under bandit feedback. Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, eds. Adv. Neural Inform. Processing Systems (NIPS 2010), Vol. 23 (Curran Associates, Red Hook, NY), 1804–1812.Google Scholar
- (2012) Online regret bounds for undiscounted continuous reinforcement learning. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems (NIPS 2012), Vol. 25 (Curran Associates, Red Hook, NY), 1763–1771.Google Scholar
- (2013) (More) efficient reinforcement learning via posterior sampling. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems (NIPS 2013), Vol. 26 (Curran Associates, Red Hook, NY), 3003–3011.Google Scholar
- (2013) PAC optimal exploration in continuous space Markov decision processes. desJardins M, Littman ML, eds. Proc. 27th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA).Google Scholar
- (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
- (2011) Optimal Learning (John Wiley & Sons, Hoboken, NJ).Google Scholar
- (1994) On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University, Cambridge, UK.Google Scholar
- (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.Link, Google Scholar
- (2010) Approximate dynamic programming with correlated Bayesian beliefs. Viswanath P, Meyn S, eds. Proc. 2010 48th Annual Allerton Conf. Comm., Control, Comput. (Allerton) (IEEE, Piscataway, NJ), 1360–1367.Crossref, Google Scholar
- (1995) Reinforcement learning with soft state aggregation. Tesauro G, Touretzky DS, Leen TK, eds. Adv. Neural Inform. Processing Systems (NIPS 1994), Vol. 7 (MIT Press, Cambridge, MA), 361–368.Google Scholar
- (2007) Probably approximately correct (PAC) exploration in reinforcement learning. Ph.D. thesis, Rutgers, The State University of New Jersey, New Brunswick.Google Scholar
- (2006) PAC model-free reinforcement learning. Cohen W, Moore A, eds. Proc. 23rd Internat. Conf. Machine Learn. (ACM, New York), 881–888.Crossref, Google Scholar
- (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
- (2010) Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (Morgan & Claypool Publishers, San Rafael, CA).Google Scholar
- (1996) Feature-based methods for large scale dynamic programming. Machine Learn. 22(1–3):59–94.Crossref, Google Scholar
- (2006) Performance loss bounds for approximate value iteration with state aggregation. Math. Oper. Res. 31(2):234–244.Link, Google Scholar
- (2016) Generalization and exploration via randomized value functions. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., 2377–2386.Google Scholar
- (1996) An approach to fuzzy control of nonlinear systems: Stability and design issues. IEEE Trans. Fuzzy Systems 4(1):14–23.Crossref, Google Scholar
- (2013) Efficient exploration and value function generalization in deterministic systems. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems (NIPS 2013), Vol. 26 (Curran Associates, Red Hook, NY), 3021–3029.Google Scholar
- (1991) Complexity and cooperation in Q-learning. Birnbaum LA, Collins GC, eds. Proc. 8th Internat. Workshop Machine Learn. (Morgan Kaufmann, San Francisco), 363–367.Crossref, Google Scholar

