The Linear Programming Approach to Approximate Dynamic Programming

References

  • Bertsekas D.Dynamic Programming and Optimal Control (1995) (Athena Scientific, Belmont, MA) Google Scholar
  • Bertsekas D., Tsitsiklis J. N.Neuro-Dynamic Programming (1996) (Athena Scientific, Belmont, MA) Google Scholar
  • Bertsekas D., Gamarnik D., Tsitsiklis J. Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions. Ann. Appl. Probab. (2001) 11(4):1384–1428CrossrefGoogle Scholar
  • Bishop C. M.Neural Networks for Pattern Recognition (1995) (Oxford University Press, New York) CrossrefGoogle Scholar
  • Borkar V. A convex analytic approach to Markov decision processes. Probab. Theory Related Fields (1988) 78:583–602CrossrefGoogle Scholar
  • Chen R.-R., Meyn S. Value iteration and optimization of multiclass queueing networks. Queueing Systems (1999) 32:65–97CrossrefGoogle Scholar
  • Chen V. C. P., Ruppert D., Shoemaker C. A. Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. (1999) 47(1):38–53LinkGoogle Scholar
  • Crites R. H., Barto A. G. Improving elevator performance using reinforcement learning. Advances in Neural Information Processing Systems (1996) 8(MIT Press, Cambridge, MA) 1017–1023Google Scholar
  • Dayan P. The convergence of TD(λ) for general λ. Machine Learning (1992) 8:341–362CrossrefGoogle Scholar
  • de Farias D. P., Van Roy B. On the existence of fixed points for appproximate value iteration and temporal-difference learning. J. Optim. Theory Appl. (2000) 105(3):589–608CrossrefGoogle Scholar
  • de Farias D. P., Van Roy B. On constraint sampling in the linear programming approach to approximate dynamic programming. Math. Oper. Res. (2001) . Conditionally accepted toGoogle Scholar
  • De Ghellinck G. Les problèmes de décisions séquentielles. Cahiers du Centre d'Etudes de Recherche Opérationnelle (1960) 2:161–179Google Scholar
  • Denardo E. V. On linear programming in a Markov decision problem. Management Sci. (1970) 16(5):282–288LinkGoogle Scholar
  • D'Epenoux F. A probabilistic production and inventory problem. Management Sci. (1963) 10(1):98–108LinkGoogle Scholar
  • Gordon G. Approximate solutions to Markov decision processess. (1999) . Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PAGoogle Scholar
  • Grötschel M., Holland O. Solution of large-scale symmetric travelling salesman problems. Math. Programming (1991) 51:141–202CrossrefGoogle Scholar
  • Guestrin C., Koller D., Parr R. Efficient solution algorithms for factored MDPs. J. Artificial Intelligence Res. (2002) . ForthcomingGoogle Scholar
  • Haykin S.Neural Networks: A Comprehensive Formulation (1994) (Macmillan, New York) Google Scholar
  • Hordijk A., Kallenberg L. C. M. Linear programming and Markov decision chains. Management Sci. (1979) 25:352–362LinkGoogle Scholar
  • Kumar P. R., Seidman T. I. Dynamic instabilities and stabilization methods in distributed real-time scheduling of manufacturing systems. IEEE Trans. Automatic Control (1990) 35(3):289–298CrossrefGoogle Scholar
  • Longstaff F., Schwartz E. S. Valuing American options by simulation: A simple least squares approach. Rev. Financial Stud. (2001) 14:113–147CrossrefGoogle Scholar
  • Manne A. S. Linear programming and sequential decisions. Management Sci. (1960) 6(3):259–267LinkGoogle Scholar
  • Morrison J. R., Kumar P. R. New linear program performance bounds for queueing networks. J. Optim. Theory Appl. (1999) 100(3):575–597CrossrefGoogle Scholar
  • Paschalidis I. C., Tsitsiklis J. N. Congestion-dependent pricing of network services. IEEE/ACM Trans. Networking (2000) 8(2):171–184CrossrefGoogle Scholar
  • Rybko A. N., Stolyar A. L. On the ergodicity of stochastic processes describing the operation of open queueing networks. Problemy Peredachi Informatsii (1992) 28:3–26Google Scholar
  • Schuurmans D., Patrascu R. Direct value-approximation for factored MDPs. Advances in Neural Information Processing Systems (2001) 14(MIT Press, Cambridge, MA) 1579–1586Google Scholar
  • Schweitzer P., Seidmann A. Generalized polynomial approximations in Markovian decision processes. J. Math. Anal. Appl. (1985) 110:568–582CrossrefGoogle Scholar
  • Sutton R. S. Learning to predict by the methods of temporal differences. Machine Learning (1988) 3:9–44CrossrefGoogle Scholar
  • Sutton R. S., Barto A. G.Reinforcement Learning: An Introduction (1998) (MIT Press, Cambridge, MA) Google Scholar
  • Tesauro C. J. Temporal difference learning and TD-gammon. Comm. ACM (1995) 38:58–68CrossrefGoogle Scholar
  • Trick M., Zin S. A linear programming approach to solving dynamic programs. (1993) . Unpublished manuscriptGoogle Scholar
  • Trick M., Zin S. Spline approximations to value functions: A linear programming approach. Macroeconomic Dynamics (1997) 1:255–277CrossrefGoogle Scholar
  • Tsitsiklis J. N., Van Roy B. An analysis of temporal-difference learning with function approximation. IEEE Trans. Auto. Control (1997) 42(5):674–690CrossrefGoogle Scholar
  • Tsitsiklis J. N., Van Roy B. Regression methods for pricing complex American-style options. IEEE Trans. Neural Networks (2001) 12(4):694–703CrossrefGoogle Scholar
  • Van Roy B. Learning and value function approximation in complex decision processes. (1998) . Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MAGoogle Scholar
  • Van Roy B., Feinberg E., Schwartz A. Neuro-dynamic programming: Overview and recent trends. Markov Decision Processes: Models, Methods, Directions, and Open Problems (2000) (Kluwer, Norwell, MA) Google Scholar
  • Zhang W., Dietterich T. G. High-performance job-shop scheduling with a time-delay TD(λ) network. Advances in Neural Information Processing Systems (1996) 8(MIT Press, Cambridge, MA) 1024–1030Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.