Performance Loss Bounds for Approximate Value Iteration with State Aggregation
Published Online:1 May 2006https://doi.org/10.1287/moor.1060.0188
References
- State aggregation in dynamic programming: An application to scheduling of independent jobs on parallel processors. Oper. Res. Lett. (1983) 2:171–176Crossref, Google Scholar
- Numerical valuation of high dimensional multivariate American securities. J. Financial Quant. Anal. (1997) 30(3):383–405Crossref, Google Scholar
- Aggregation in dynamic programming. Oper. Res. (1987) 35:215–220Link, Google Scholar
- Convergence of discretization procedures in dynamic programming. IEEE Trans. Automat. Control (1975) 20:415–419Crossref, Google Scholar
- A counterexample to temporal-difference learning. Neural Comput. (1994) 7:270–279Crossref, Google Scholar
- Adaptive aggregation for infinite horizon dynamic programming. IEEE Trans. Automat. Control (1989) 34(6):589–598Crossref, Google Scholar
- Temporal differences-based policy iteration and applications in neuro-dynamic programming. (1996) . Technical report LIDS–P–2349, MIT Laboratory for Information and Decision Systems, Cambridge, MAGoogle Scholar
- Stochastic Optimal Control: The Discrete Time Case (1996) (Athena Scientific, Belmont, MA) Google Scholar
- Neuro-Dynamic Programming (1996) (Athena Scientific, Belmont, MA) Google Scholar
- , Si J., Barto A. G., Powell W. B., Wunsch D. C. Improved temporal difference methods with linear function approximation. Handbook of Learning and Approximate Dynamic Programming (2004) (IEEE Press and John Wiley & Sons, Boston, MA) Google Scholar
- Aggregation bounds in stochastic linear programming. Math. Programming (1985) 31:25–41Crossref, Google Scholar
- Discrete dynamic programming. Ann. Math. Statist. (1962) 33:719–726Crossref, Google Scholar
- , Bratko I., Dzeroski S. Least-squares temporal difference learning. Machine Learning: Proc. 16th Internat. Conf. (ICML) (1999) Cambridge, MAGoogle Scholar
- Technical update: Least-squares temporal difference learning. Machine Learning (2002) 49:2–3Crossref, Google Scholar
- Linear least-squares algorithms for temporal-difference learning. Machine Learning (1996) 33–57Google Scholar
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Machine Learning: Proc. 18th Internat. Conf. (2001) Palo Alto, CA(ICML)Google Scholar
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Discrete Event Dynamic Systems (2006) 16(2)Google Scholar
- The complexity of dynamic programming. J. Complexity (1989) 5:466–488Crossref, Google Scholar
- An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Trans. Automat. Control (1991) 36(8):898–914Crossref, Google Scholar
- On the existence of fixed points for approximate value iteration and temporal-difference learning. J. Optim. Theory Appl. (2000) 105(3):589–608Crossref, Google Scholar
- Approximate dynamic programming via linear programming. Advances in Neural Information Processing Systems (2002) 14(MIT Press, Cambridge, MA) Google Scholar
- The linear programming approach to approximate dynamic programming. Oper. Res. (2003) 51(6):850–865Link, Google Scholar
- Discretizing dynamic programs. J. Optim. Theory Appl. (1973) 11:228–234Crossref, Google Scholar
- Stable function approximation in dynamic programming. (1995) . Technical report CMU-CS-95-103, Carnegie Mellon University, Pittsburgh, PAGoogle Scholar
- Stable function approximation in dynamic programming. Machine Learning: Proc. 12th Internat. Conf. (1995) San Francisco, CA(ICML)Google Scholar
- Approximate solutions to Markov decision processes. (1999) . Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PAGoogle Scholar
- , Puterman M. On approximate solutions of finite-stage dynamic programs. Dynamic Programming and Its Applications (1979) (Academic Press, New York) Google Scholar
- On the convergence of stochastic iterative dynamic programming algorithms. Neural Comput. (1994) 6:1185–1201Crossref, Google Scholar
- Reinforcement learning algorithms for partially observable Markov decision problems. Advances in Neural Information Processing Systems (1995) 7:345–352Google Scholar
- Numerical Methods for Stochastic Control Problems in Continuous Time (1992) (Springer-Verlag, New York) Crossref, Google Scholar
- Aggregation in stochastic dynamic programming. (2004) . Technical report 04-07, Department of Industrial and Operations Engineering, Ann Arbor, MIGoogle Scholar
- An iterative aggregation procedure for Markov decision processes. Oper. Res. (1982) 30:62–73Link, Google Scholar
- . The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. Machine Learning (1995) 21(3):199–233Google Scholar
- , Puterman M. Computational advances in dynamic programming. Dynamic Programming and Its Applications. (1979) (Academic Press, New York) 202–207Google Scholar
- Least-squares policy evaluation algorithms with linear function approximation. Discrete Event Dynam. Systems (2003) 13:79–110Crossref, Google Scholar
- Problem solving with reinforcement learning. (1995) . Ph.D. thesis, Cambridge University, Cambridge, UKGoogle Scholar
- On-line Q-learning using connectionist systems. (1994) . Technical report CUED/F-INFENG/TR 166, Engineering Department, Cambridge UniversityGoogle Scholar
- Using randomization to break the curse of dimensionality. Econometrica (1997) 65(3):487–516Crossref, Google Scholar
- An upper-bound on the loss from approximate optimal-value functions. Machine Learning (1994) 16(3):227–233Google Scholar
- Temporal credit assignment in reinforcement learning. (1984) . Ph.D. thesis, University of Massachusetts Amherst, Amherst, MAGoogle Scholar
- Learning to predict by the methods of temporal differences. Machine Learning (1988) 3:9–44Crossref, Google Scholar
- On the virtues of linear learning and trajectory distributions. Proc. Workshop Value Function Approximation, Machine Learning Conf. (1995) Google Scholar
- Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in Neural Information Processing Systems (1996) 8(MIT Press, Cambridge, MA) Google Scholar
- Reinforcement Learning: An Introduction (1998) (MIT Press, Cambridge, MA) Google Scholar
- Asynchronous stochastic approximation and Q-learning. Machine Learning (1994) 16:185–202Crossref, Google Scholar
- Feature-based methods for large scale dynamic programming. Machine Learning (1996) 22:59–94Google Scholar
- An analysis of temporal-difference learning with function approximation. IEEE Trans. Automat. Control (1997) 42(5):674–690Crossref, Google Scholar
- Analysis of temporal-difference learning with function approximation. Advances in Neural Information Processing Systems (1997) 9(MIT Press, Cambridge, MA) Google Scholar
- Average cost temporal-difference learning. Proc. IEEE Conf. Decision Control (1997) IEEESan Diego, CAGoogle Scholar
- Average cost temporal-difference learning. Automatica (1999) 35(11):1799–1808Crossref, Google Scholar
- On average versus discounted reward temporal-difference learning. Machine Learning (2002) 49:2–3Crossref, Google Scholar
- Learning and value function approximation in complex decision processes. (1998) . Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MAGoogle Scholar
- Learning from delayed rewards. (1989) . Ph.D. thesis, Cambridge University, Cambridge, UKGoogle Scholar
- Q-learning. Machine Learning (1992) 8:279–292Crossref, Google Scholar
- Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook (1977) 22:25–38Google Scholar
- Consistency of HDP applied to a simple reinforcement learning problem. Neural Networks (1990) 3:179–189Crossref, Google Scholar
- Approximations of dynamic programs I. Math. Oper. Res. (1978) 3(3):231–243Link, Google Scholar

