Error Bounds for Approximations from Projected Linear Equations
Published Online: 14 Apr 2010 https://doi.org/10.1287/moor.1100.0441

