An Approximate Dynamic Programming Algorithm for Monotone Value Functions
Published Online:4 Nov 2015https://doi.org/10.1287/opre.2015.1425
References
- (2015) Regularized decomposition of high-dimensional multistage stochastic programs with Markov uncertainty. Working paper, Princeton University, Princeton, NJ.Google Scholar
- (1955) An empirical distribution function for sampling with incomplete information. Ann. Math. Statist. 26(4):641–647.Crossref, Google Scholar
- (1972) Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression (John Wiley & Sons, New York).Google Scholar
- (2007) Dynamic Programming and Optimal Control, Vol. II, 4 ed. (Athena Scientific, Belmont, MA).Google Scholar
- (2011) Approximate policy iteration: A survey and some new methods. J. Control Theory Appl. 9(3):310–335.Crossref, Google Scholar
- (1996) Neuro–Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
- (1985) Decomposition and partitioning methods for multistage stochastic linear programs. Oper. Res. 33(5):989–1007.Link, Google Scholar
- (1992) Probability (SIAM, Philadelphia, PA).Crossref, Google Scholar
- (1955) Maximum likelihood estimates of monotone parameters. Ann. Math. Statist. 26(4):607–616.Crossref, Google Scholar
- (2010) Valuation of energy storage: An optimal switching approach. Quant. Finance 10(4):359–374.Crossref, Google Scholar
- (2006) A simple nonparametric estimator of a strictly monotone regression function. Bernoulli 12(3):469–490.Crossref, Google Scholar
- (2004) Properties of American option prices. Stochastic Processes Appl. 114(2):265–278.Crossref, Google Scholar
- (1974) Towards an economic theory of replacement investment. Econometrica 42(3):393–424.Crossref, Google Scholar
- (2006) Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learn. 65(1):167–198.Crossref, Google Scholar
- (2010) Optimal dosing applied to glycemic control of type 2 diabetes. Senior thesis, Princeton University, Princeton, NJ.Google Scholar
- (2015) Optimal hour-ahead bidding in the real-time electricity market with battery storage using approximate dynamic programming. INFORMS J. Comput. 27(3):525–543.Link, Google Scholar
- (2014) A model of the consumption response to fiscal stimulus payments. Econometrica 82(4):1199–1239.Crossref, Google Scholar
- (2011) Optimal energy commitments with storage and intermittent supply. Oper. Res. 59(6):1347–1360.Link, Google Scholar
- (2002) The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2):479–502.Crossref, Google Scholar
- (2010) Monotone optimal replacement policies for a Markovian deteriorating system in a controllable environment. Oper. Res. Lett. 38(4):273–279.Crossref, Google Scholar
- (2011) The structure of optimal statin initiation policies for patients with type 2 diabetes. IIE Trans. Healthcare Systems Engrg. 1(1):49–65.Crossref, Google Scholar
- (1998) Investment Science (Oxford University Press, New York).Google Scholar
- (1991) Estimating a smooth monotone regression function. Ann. Statist. 19(2):724–740.Crossref, Google Scholar
- (2014) Optimizing the simultaneous management of blood pressure and cholesterol for type 2 diabetes patients. Eur. J. Oper. Res. 233(3):727–738.Crossref, Google Scholar
- (2012) Optimizing statin treatment decisions for diabetes patients in the presence of uncertain future adherence. Medical Decision Making 32(1):154–166.Crossref, Google Scholar
- (1970) Economics of information and job search. Quart. J. Econom. 84(1):113–126.Crossref, Google Scholar
- (1988) Monotone nonparametric regression. Ann. Statist. 16(2):741–750.Crossref, Google Scholar
- (1997) How does the value function of a Markov decision process depend on the transition probabilities? Math. Oper. Res. 22(4):872–885.Link, Google Scholar
- (1964) On estimating regression. Theory Probab. Appl. 9(1):141–142.Crossref, Google Scholar
- (2009) An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem. Math. Oper. Res. 34(1):210–237.Link, Google Scholar
- (2010) Dynamic programming models and algorithms for the mutual fund cash balance problem. Management Sci. 56(5):801–815.Link, Google Scholar
- (2002) Kernel-based reinforcement learning. Machine Learn. 49(2–3):161–178.Crossref, Google Scholar
- (2002) Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem. Eur. J. Oper. Res. 142(1):108–127.Crossref, Google Scholar
- (2003a) A discrete online monotone estimation algorithm. Working paper LSEOR 03.73, London School of Economics, London.Google Scholar
- (2003b) An adaptive dynamic programming algorithm for a stochastic multiproduct batch dispatch problem. Naval Res. Logist. 50(7):742–769.Crossref, Google Scholar
- (1991) Multi-stage stochastic optimization applied to energy planning. Math. Programming 52(1–3):359–375.Crossref, Google Scholar
- (1976) A survey of maintenance models: The control and surveillance of deteriorating systems. Naval Res. Logist. Quart. 23(3):353–388.Crossref, Google Scholar
- (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality 2nd ed. (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
- (2004) Learning algorithms for separable approximations of discrete stochastic optimization problems. Math. Oper. Res. 29(4):814–836.Link, Google Scholar
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, New York).Crossref, Google Scholar
- (1998) Estimating smooth monotone functions. J. Roy. Statist. Soc.: Ser. B (Statist. Methodology) 60(2):365–375.Crossref, Google Scholar
- (1983) Introduction to Stochastic Dynamic Programming (Academic Press, New York).Google Scholar
- (1987) Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica: J. Econometric Soc. 55(5):999–1033.Crossref, Google Scholar
- (2013) Benchmarking a scalable approximation dynamic programming algorithm for stochastic control of multidimensional energy storage problems. Working paper, Princeton University, Princeton, NJ.Google Scholar
- (2009) Multivariate Density Estimation: Theory, Practice, and Visualization, Vol. 383 (John Wiley & Sons, New York).Google Scholar
- (2012) Least squares policy iteration with instrumental variables vs. direct policy search: Comparison against optimal benchmarks using energy storage. Working paper, Princeton University, Princeton, NJ.Google Scholar
- (2010) Optimal commodity trading with a capacitated storage asset. Management Sci. 56(3):449–467.Link, Google Scholar
- (2002) Structural properties of stochastic dynamic programs. Oper. Res. 50(5):796–809.Link, Google Scholar
- (1989) Recursive Methods in Economic Dynamics (Harvard University Press, Cambridge, MA).Crossref, Google Scholar
- (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
- (2003) An algorithm for approximating piecewise linear concave functions from sample gradients. Oper. Res. Lett. 31(1):66–76.Crossref, Google Scholar
- (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16(3):185–202.Crossref, Google Scholar
- (1992) Q-learning. Machine Learn. 8(3–4):279–292.Crossref, Google Scholar

