Error Bounds for Approximations from Projected Linear Equations

Published Online: https://doi.org/10.1287/moor.1100.0441

References

  • Bertsekas D. P. A counterexample to temporal differences learning. Neural Comput. (1995) 7:270–279
  • Bertsekas D. P. Dynamic Programming and Optimal Control (2007) Vol. II, 3rd ed. (Athena Scientific, Belmont, MA)
  • Bertsekas D. P., Tsitsiklis J. N. Neuro-Dynamic Programming (1996) (Athena Scientific, Belmont, MA)
  • Bertsekas D. P., Yu H. Projected equation methods for approximate solution of large linear systems. J. Comput. Appl. Math. (2009) 227(1):27–50
  • Boyan J. A. Least-squares temporal difference learning. Proc. 16th Internat. Conf. Machine Learn. (1999) (Morgan Kaufmann, San Francisco) 49–56
  • Bradtke S. J., Barto A. G. Linear least-squares algorithms for temporal difference learning. Machine Learn. (1996) 22(2):33–57
  • Choi D. S., Van Roy B. A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Discrete Event Dynam. Systems (2006) 16(2):207–239
  • Horn R. A., Johnson C. R. Matrix Analysis (1985) (Cambridge University Press, Cambridge, UK)
  • Konda V. R. Actor-critic algorithms. (2002). Thesis, Massachusetts Institute of Technology, Cambridge
  • Konda V. R., Tsitsiklis J. N. Actor-critic algorithms. SIAM J. Control Optim. (2003) 42(4):1143–1166
  • Krasnosel'skii M. A., Vainikko G. M., Zabreiko P. P., Rutitskii Ya. B., Stetsenko V. Ya. Approximate Solution of Operator Equations (1972) (Wolters-Noordhoff Publishing, Groningen, The Netherlands)
  • Munos R. Error bounds for approximate policy iteration. Proc. 20th Int. Conf. Machine Learning (2003) (AUAI Press, Washington, DC) 560–567
  • Nedić A., Bertsekas D. P. Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynam. Systems (2003) 13:79–110
  • Sutton R. S. Learning to predict by the methods of temporal differences. Machine Learn. (1988) 3:9–44
  • Sutton R. S., Barto A. G. Reinforcement Learning (1998) (MIT Press, Cambridge, MA)
  • Szyld D. B. The many proofs of an identity on the norm of oblique projections. Numer. Algorithms (2006) 42(3–4):309–323
  • Tsitsiklis J. N., Van Roy B. An analysis of temporal-difference learning with function approximation. IEEE Trans. Automatic Control (1997) 42(5):674–690
  • Tsitsiklis J. N., Van Roy B. Average cost temporal-difference learning. Automatica (1999a) 35(11):1799–1808
  • Tsitsiklis J. N., Van Roy B. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives. IEEE Trans. Automatic Control (1999b) 44(10):1840–1851
  • Van Roy B. On regression-based stopping times. Discrete Event Dynam. Systems (2007). ePub ahead of print February 7, 2009, http://www.springerlink.com/content/831433v414640767/
  • Yu H., Bertsekas D. P. A least squares Q-learning algorithm for optimal stopping problems. (2006). LIDS Technical Report 2731, Massachusetts Institute of Technology, Cambridge
  • Yu H., Bertsekas D. P. Q-learning algorithms for optimal stopping based on least squares. Proc. Eur. Control Conf. (2007) (Kos, Greece) 2368–2375