Error Bounds for Approximations from Projected Linear Equations

Huizhen Yu
Department of Computer Science, University of Helsinki, FIN-00014 Helsinki, Finland

Dimitri P. Bertsekas
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Published Online: 14 Apr 2010
https://doi.org/10.1287/moor.1100.0441

Mathematics of Operations Research, Volume 35, Issue 2, May 2010, Pages 257-512

Received: August 09, 2008
Published Online: April 14, 2010

Cite as

Huizhen Yu, Dimitri P. Bertsekas (2010) Error Bounds for Approximations from Projected Linear Equations. Mathematics of Operations Research 35(2):306-329.

https://doi.org/10.1287/moor.1100.0441
