An Approximate Dynamic Programming Algorithm for Monotone Value Functions

Daniel R. Jiang
Daniel R. Jiang
[email protected]
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540
Search for more papers by this author
,
Warren B. Powell
Warren B. Powell
[email protected]
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540
Search for more papers by this author

Daniel R. Jiang

[email protected]

Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540

Search for more papers by this author

Warren B. Powell

[email protected]

Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540

Search for more papers by this author

Published Online:4 Nov 2015https://doi.org/10.1287/opre.2015.1425

References

Asamov T, Powell WB (2015) Regularized decomposition of high-dimensional multistage stochastic programs with Markov uncertainty. Working paper, Princeton University, Princeton, NJ.Google Scholar
Ayer M, Brunk HD, Ewing GM (1955) An empirical distribution function for sampling with incomplete information. Ann. Math. Statist. 26(4):641–647.Crossref, Google Scholar
Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD (1972) Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression (John Wiley & Sons, New York).Google Scholar
Bertsekas DP (2007) Dynamic Programming and Optimal Control, Vol. II, 4 ed. (Athena Scientific, Belmont, MA).Google Scholar
Bertsekas DP (2011) Approximate policy iteration: A survey and some new methods. J. Control Theory Appl. 9(3):310–335.Crossref, Google Scholar
Bertsekas DP, Tsitsiklis JN (1996) Neuro–Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
Birge JR (1985) Decomposition and partitioning methods for multistage stochastic linear programs. Oper. Res. 33(5):989–1007.Link, Google Scholar
Breiman L (1992) Probability (SIAM, Philadelphia, PA).Crossref, Google Scholar
Brunk HD (1955) Maximum likelihood estimates of monotone parameters. Ann. Math. Statist. 26(4):607–616.Crossref, Google Scholar
Carmona R, Ludkovski M (2010) Valuation of energy storage: An optimal switching approach. Quant. Finance 10(4):359–374.Crossref, Google Scholar
Dette H, Neumeyer N, Pilz KF (2006) A simple nonparametric estimator of a strictly monotone regression function. Bernoulli 12(3):469–490.Crossref, Google Scholar
Ekström E (2004) Properties of American option prices. Stochastic Processes Appl. 114(2):265–278.Crossref, Google Scholar
Feldstein MS, Rothschild M (1974) Towards an economic theory of replacement investment. Econometrica 42(3):393–424.Crossref, Google Scholar
George AP, Powell WB (2006) Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learn. 65(1):167–198.Crossref, Google Scholar
Hsih KW (2010) Optimal dosing applied to glycemic control of type 2 diabetes. Senior thesis, Princeton University, Princeton, NJ.Google Scholar
Jiang DR, Powell WB (2015) Optimal hour-ahead bidding in the real-time electricity market with battery storage using approximate dynamic programming. INFORMS J. Comput. 27(3):525–543.Link, Google Scholar
Kaplan G, Violante GL (2014) A model of the consumption response to fiscal stimulus payments. Econometrica 82(4):1199–1239.Crossref, Google Scholar
Kim JH, Powell WB (2011) Optimal energy commitments with storage and intermittent supply. Oper. Res. 59(6):1347–1360.Link, Google Scholar
Kleywegt AJ, Shapiro A, Homem-de Mello T (2002) The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2):479–502.Crossref, Google Scholar
Kurt M, Kharoufeh JP (2010) Monotone optimal replacement policies for a Markovian deteriorating system in a controllable environment. Oper. Res. Lett. 38(4):273–279.Crossref, Google Scholar
Kurt M, Denton BT, Schaefer AJ, Shah ND, Smith SA (2011) The structure of optimal statin initiation policies for patients with type 2 diabetes. IIE Trans. Healthcare Systems Engrg. 1(1):49–65.Crossref, Google Scholar
Luenberger DG (1998) Investment Science (Oxford University Press, New York).Google Scholar
Mammen E (1991) Estimating a smooth monotone regression function. Ann. Statist. 19(2):724–740.Crossref, Google Scholar
Mason JE, Denton BT, Shah ND, Smith SA (2014) Optimizing the simultaneous management of blood pressure and cholesterol for type 2 diabetes patients. Eur. J. Oper. Res. 233(3):727–738.Crossref, Google Scholar
Mason JE, England DA, Denton BT, Smith SA, Kurt M, Shah ND (2012) Optimizing statin treatment decisions for diabetes patients in the presence of uncertain future adherence. Medical Decision Making 32(1):154–166.Crossref, Google Scholar
McCall JJ (1970) Economics of information and job search. Quart. J. Econom. 84(1):113–126.Crossref, Google Scholar
Mukerjee H (1988) Monotone nonparametric regression. Ann. Statist. 16(2):741–750.Crossref, Google Scholar
Müller A (1997) How does the value function of a Markov decision process depend on the transition probabilities? Math. Oper. Res. 22(4):872–885.Link, Google Scholar
Nadaraya EA (1964) On estimating regression. Theory Probab. Appl. 9(1):141–142.Crossref, Google Scholar
Nascimento JM, Powell WB (2009) An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem. Math. Oper. Res. 34(1):210–237.Link, Google Scholar
Nascimento JM, Powell WB (2010) Dynamic programming models and algorithms for the mutual fund cash balance problem. Management Sci. 56(5):801–815.Link, Google Scholar
Ormoneit D, Sen A (2002) Kernel-based reinforcement learning. Machine Learn. 49(2–3):161–178.Crossref, Google Scholar
Papadaki KP, Powell WB (2002) Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem. Eur. J. Oper. Res. 142(1):108–127.Crossref, Google Scholar
Papadaki KP, Powell WB (2003a) A discrete online monotone estimation algorithm. Working paper LSEOR 03.73, London School of Economics, London.Google Scholar
Papadaki KP, Powell WB (2003b) An adaptive dynamic programming algorithm for a stochastic multiproduct batch dispatch problem. Naval Res. Logist. 50(7):742–769.Crossref, Google Scholar
Pereira MVF, Pinto LMVG (1991) Multi-stage stochastic optimization applied to energy planning. Math. Programming 52(1–3):359–375.Crossref, Google Scholar
Pierskalla WP, Voelker JA (1976) A survey of maintenance models: The control and surveillance of deteriorating systems. Naval Res. Logist. Quart. 23(3):353–388.Crossref, Google Scholar
Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality 2nd ed. (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
Powell WB, Ruszczyński A, Topaloglu H (2004) Learning algorithms for separable approximations of discrete stochastic optimization problems. Math. Oper. Res. 29(4):814–836.Link, Google Scholar
Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, New York).Crossref, Google Scholar
Ramsay JO (1998) Estimating smooth monotone functions. J. Roy. Statist. Soc.: Ser. B (Statist. Methodology) 60(2):365–375.Crossref, Google Scholar
Ross SM (1983) Introduction to Stochastic Dynamic Programming (Academic Press, New York).Google Scholar
Rust J (1987) Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica: J. Econometric Soc. 55(5):999–1033.Crossref, Google Scholar
Salas D, Powell WB (2013) Benchmarking a scalable approximation dynamic programming algorithm for stochastic control of multidimensional energy storage problems. Working paper, Princeton University, Princeton, NJ.Google Scholar
Scott DW (2009) Multivariate Density Estimation: Theory, Practice, and Visualization, Vol. 383 (John Wiley & Sons, New York).Google Scholar
Scott WR, Powell WB, Moazeni S (2012) Least squares policy iteration with instrumental variables vs. direct policy search: Comparison against optimal benchmarks using energy storage. Working paper, Princeton University, Princeton, NJ.Google Scholar
Secomandi N (2010) Optimal commodity trading with a capacitated storage asset. Management Sci. 56(3):449–467.Link, Google Scholar
Smith JE, McCardle KF (2002) Structural properties of stochastic dynamic programs. Oper. Res. 50(5):796–809.Link, Google Scholar
Stockey N, Lucas Jr RE (1989) Recursive Methods in Economic Dynamics (Harvard University Press, Cambridge, MA).Crossref, Google Scholar
Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
Topaloglu H, Powell WB (2003) An algorithm for approximating piecewise linear concave functions from sample gradients. Oper. Res. Lett. 31(1):66–76.Crossref, Google Scholar
Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16(3):185–202.Crossref, Google Scholar
Watkins CJ, Dayan P (1992) Q-learning. Machine Learn. 8(3–4):279–292.Crossref, Google Scholar

Volume 63, Issue 6

November-December 2015

Pages ii-iv, 1245-1546

Article Information

Supplemental Material

Metrics

Information

Received:July 01, 2014
Accepted:August 01, 2015
Published Online:November 04, 2015

Cite as

Daniel R. Jiang, Warren B. Powell (2015) An Approximate Dynamic Programming Algorithm for Monotone Value Functions. Operations Research 63(6):1489-1511.

https://doi.org/10.1287/opre.2015.1425

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

An Approximate Dynamic Programming Algorithm for Monotone Value Functions

References

Volume 63, Issue 6

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News