An Approximate Dynamic Programming Algorithm for Monotone Value Functions

Published Online:https://doi.org/10.1287/opre.2015.1425

References

  • Asamov T, Powell WB (2015) Regularized decomposition of high-dimensional multistage stochastic programs with Markov uncertainty. Working paper, Princeton University, Princeton, NJ.Google Scholar
  • Ayer M, Brunk HD, Ewing GM (1955) An empirical distribution function for sampling with incomplete information. Ann. Math. Statist. 26(4):641–647.CrossrefGoogle Scholar
  • Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD (1972) Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression (John Wiley & Sons, New York).Google Scholar
  • Bertsekas DP (2007) Dynamic Programming and Optimal Control, Vol. II, 4 ed. (Athena Scientific, Belmont, MA).Google Scholar
  • Bertsekas DP (2011) Approximate policy iteration: A survey and some new methods. J. Control Theory Appl. 9(3):310–335.CrossrefGoogle Scholar
  • Bertsekas DP, Tsitsiklis JN (1996) Neuro–Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
  • Birge JR (1985) Decomposition and partitioning methods for multistage stochastic linear programs. Oper. Res. 33(5):989–1007.LinkGoogle Scholar
  • Breiman L (1992) Probability (SIAM, Philadelphia, PA).CrossrefGoogle Scholar
  • Brunk HD (1955) Maximum likelihood estimates of monotone parameters. Ann. Math. Statist. 26(4):607–616.CrossrefGoogle Scholar
  • Carmona R, Ludkovski M (2010) Valuation of energy storage: An optimal switching approach. Quant. Finance 10(4):359–374.CrossrefGoogle Scholar
  • Dette H, Neumeyer N, Pilz KF (2006) A simple nonparametric estimator of a strictly monotone regression function. Bernoulli 12(3):469–490.CrossrefGoogle Scholar
  • Ekström E (2004) Properties of American option prices. Stochastic Processes Appl. 114(2):265–278.CrossrefGoogle Scholar
  • Feldstein MS, Rothschild M (1974) Towards an economic theory of replacement investment. Econometrica 42(3):393–424.CrossrefGoogle Scholar
  • George AP, Powell WB (2006) Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learn. 65(1):167–198.CrossrefGoogle Scholar
  • Hsih KW (2010) Optimal dosing applied to glycemic control of type 2 diabetes. Senior thesis, Princeton University, Princeton, NJ.Google Scholar
  • Jiang DR, Powell WB (2015) Optimal hour-ahead bidding in the real-time electricity market with battery storage using approximate dynamic programming. INFORMS J. Comput. 27(3):525–543.LinkGoogle Scholar
  • Kaplan G, Violante GL (2014) A model of the consumption response to fiscal stimulus payments. Econometrica 82(4):1199–1239.CrossrefGoogle Scholar
  • Kim JH, Powell WB (2011) Optimal energy commitments with storage and intermittent supply. Oper. Res. 59(6):1347–1360.LinkGoogle Scholar
  • Kleywegt AJ, Shapiro A, Homem-de Mello T (2002) The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2):479–502.CrossrefGoogle Scholar
  • Kurt M, Kharoufeh JP (2010) Monotone optimal replacement policies for a Markovian deteriorating system in a controllable environment. Oper. Res. Lett. 38(4):273–279.CrossrefGoogle Scholar
  • Kurt M, Denton BT, Schaefer AJ, Shah ND, Smith SA (2011) The structure of optimal statin initiation policies for patients with type 2 diabetes. IIE Trans. Healthcare Systems Engrg. 1(1):49–65.CrossrefGoogle Scholar
  • Luenberger DG (1998) Investment Science (Oxford University Press, New York).Google Scholar
  • Mammen E (1991) Estimating a smooth monotone regression function. Ann. Statist. 19(2):724–740.CrossrefGoogle Scholar
  • Mason JE, Denton BT, Shah ND, Smith SA (2014) Optimizing the simultaneous management of blood pressure and cholesterol for type 2 diabetes patients. Eur. J. Oper. Res. 233(3):727–738.CrossrefGoogle Scholar
  • Mason JE, England DA, Denton BT, Smith SA, Kurt M, Shah ND (2012) Optimizing statin treatment decisions for diabetes patients in the presence of uncertain future adherence. Medical Decision Making 32(1):154–166.CrossrefGoogle Scholar
  • McCall JJ (1970) Economics of information and job search. Quart. J. Econom. 84(1):113–126.CrossrefGoogle Scholar
  • Mukerjee H (1988) Monotone nonparametric regression. Ann. Statist. 16(2):741–750.CrossrefGoogle Scholar
  • Müller A (1997) How does the value function of a Markov decision process depend on the transition probabilities? Math. Oper. Res. 22(4):872–885.LinkGoogle Scholar
  • Nadaraya EA (1964) On estimating regression. Theory Probab. Appl. 9(1):141–142.CrossrefGoogle Scholar
  • Nascimento JM, Powell WB (2009) An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem. Math. Oper. Res. 34(1):210–237.LinkGoogle Scholar
  • Nascimento JM, Powell WB (2010) Dynamic programming models and algorithms for the mutual fund cash balance problem. Management Sci. 56(5):801–815.LinkGoogle Scholar
  • Ormoneit D, Sen A (2002) Kernel-based reinforcement learning. Machine Learn. 49(2–3):161–178.CrossrefGoogle Scholar
  • Papadaki KP, Powell WB (2002) Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem. Eur. J. Oper. Res. 142(1):108–127.CrossrefGoogle Scholar
  • Papadaki KP, Powell WB (2003a) A discrete online monotone estimation algorithm. Working paper LSEOR 03.73, London School of Economics, London.Google Scholar
  • Papadaki KP, Powell WB (2003b) An adaptive dynamic programming algorithm for a stochastic multiproduct batch dispatch problem. Naval Res. Logist. 50(7):742–769.CrossrefGoogle Scholar
  • Pereira MVF, Pinto LMVG (1991) Multi-stage stochastic optimization applied to energy planning. Math. Programming 52(1–3):359–375.CrossrefGoogle Scholar
  • Pierskalla WP, Voelker JA (1976) A survey of maintenance models: The control and surveillance of deteriorating systems. Naval Res. Logist. Quart. 23(3):353–388.CrossrefGoogle Scholar
  • Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality 2nd ed. (John Wiley & Sons, Hoboken, NJ).CrossrefGoogle Scholar
  • Powell WB, Ruszczyński A, Topaloglu H (2004) Learning algorithms for separable approximations of discrete stochastic optimization problems. Math. Oper. Res. 29(4):814–836.LinkGoogle Scholar
  • Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, New York).CrossrefGoogle Scholar
  • Ramsay JO (1998) Estimating smooth monotone functions. J. Roy. Statist. Soc.: Ser. B (Statist. Methodology) 60(2):365–375.CrossrefGoogle Scholar
  • Ross SM (1983) Introduction to Stochastic Dynamic Programming (Academic Press, New York).Google Scholar
  • Rust J (1987) Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica: J. Econometric Soc. 55(5):999–1033.CrossrefGoogle Scholar
  • Salas D, Powell WB (2013) Benchmarking a scalable approximation dynamic programming algorithm for stochastic control of multidimensional energy storage problems. Working paper, Princeton University, Princeton, NJ.Google Scholar
  • Scott DW (2009) Multivariate Density Estimation: Theory, Practice, and Visualization, Vol. 383 (John Wiley & Sons, New York).Google Scholar
  • Scott WR, Powell WB, Moazeni S (2012) Least squares policy iteration with instrumental variables vs. direct policy search: Comparison against optimal benchmarks using energy storage. Working paper, Princeton University, Princeton, NJ.Google Scholar
  • Secomandi N (2010) Optimal commodity trading with a capacitated storage asset. Management Sci. 56(3):449–467.LinkGoogle Scholar
  • Smith JE, McCardle KF (2002) Structural properties of stochastic dynamic programs. Oper. Res. 50(5):796–809.LinkGoogle Scholar
  • Stockey N, Lucas Jr RE (1989) Recursive Methods in Economic Dynamics (Harvard University Press, Cambridge, MA).CrossrefGoogle Scholar
  • Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
  • Topaloglu H, Powell WB (2003) An algorithm for approximating piecewise linear concave functions from sample gradients. Oper. Res. Lett. 31(1):66–76.CrossrefGoogle Scholar
  • Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16(3):185–202.CrossrefGoogle Scholar
  • Watkins CJ, Dayan P (1992) Q-learning. Machine Learn. 8(3–4):279–292.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.