Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures

Daniel R. Jiang
Corresponding Author
Daniel R. Jiang
[email protected]
http://orcid.org/0000-0002-5388-8061
Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261
Search for more papers by this author
,
Warren B. Powell
Warren B. Powell
[email protected]
http://orcid.org/0000-0002-4364-7555
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540
Search for more papers by this author

Daniel R. Jiang

Corresponding Author

Daniel R. Jiang

[email protected]

http://orcid.org/0000-0002-5388-8061

Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261

Search for more papers by this author

Warren B. Powell

[email protected]

http://orcid.org/0000-0002-4364-7555

Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540

Search for more papers by this author

Published Online:13 Nov 2017https://doi.org/10.1287/moor.2017.0872

References

Acciaio B, Penner I (2011) Dynamic risk measures. Di Nunno, G, Øksendal, B, eds. Advanced Mathematical Methods for Finance (Springer, Berlin), 1–34.Crossref, Google Scholar
Al-Qaq WA, Devetsikiotis M, Townsend JK (1995) Stochastic gradient optimization of importance sampling for the efficient simulation of digital communication systems. IEEE Trans. Comm. 43(12):2975–2985.Crossref, Google Scholar
Artzner P, Delbaen F, Eber J, Heath D (1999) Coherent measures of risk. Math. Finance 9(3):203–228.Crossref, Google Scholar
Artzner P, Delbaen F, Eber J, Heath D, Ku H (2006) Coherent multiperiod risk adjusted values and Bellman’s principle. Ann. Oper. Res. 152(1):5–22.Crossref, Google Scholar
Azar MG, Ghavamzadeh M, Kappen HJ (2011) Reinforcement learning with a near optimal rate of convergence. Technical Report, Radboud University Nijmegen, Netherlands.Google Scholar
Bardou O, Frikha N (2009) Computing VaR and CVaR using stochastic approximation and adaptive unconstrained importance sampling. Monte Carlo Methods Appl. 15(3):173–210.Crossref, Google Scholar
Belles-Sampera J, Guillén M, Santolino M (2014) Beyond value-at-risk: GlueVaR distortion risk measures. Risk Anal. 34(1):121–134.Crossref, Google Scholar
Bertimas D, Lo AW (1998) Optimal control of execution costs. J. Financial Markets 1(1):1–50.Crossref, Google Scholar
Bertsekas DP, Tsitsiklis JN (1996) Neuro-Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
Boda K, Filar JA (2006) Time consistent dynamic risk measures. Math. Methods Oper. Res. 63(1):169–186.Crossref, Google Scholar
Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Breiman L (1992) Probability (SIAM, Philadelphia).Crossref, Google Scholar
Bucklew J (2004) Introduction to Rare Event Simulation (Springer, New York).Crossref, Google Scholar
Byström H (2005) Extreme value theory and extremely large electricity price changes. Internat. Rev. Econom. Finance 14(1):41–55.Crossref, Google Scholar
Çavus O, Ruszczyński A (2014) Computational methods for risk-averse undiscounted transient Markov models. Oper. Res. 62(2):401–417.Link, Google Scholar
Cheridito P, Stadje M (2009) Time-inconsistency of VaR and time-consistent alternatives. Finance Res. Lett. 6(1):40–46.Crossref, Google Scholar
Cheridito P, Delbaen F, Kupper M (2006) Dynamic monetary risk measures for bounded discrete-time processes. Electronic J. Probab. 11(3):57–106.Crossref, Google Scholar
Chow Y, Ghavamzadeh M (2014) Algorithms for CVaR optimization in MDPs. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), 3509–3517.Google Scholar
Chung KL (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.Crossref, Google Scholar
Danielsson J, Jorgensen B, Samorodnitsky G, Sarma M, de Vries C (2005) Subadditivity reexamined: The case for value-at-risk. Discussion paper, 549. Financial Markets Group, London School of Economics and Political Science, London, UK.Google Scholar
Dhaene J, Laeven RJA, Vanduffel S, Darkiewicz G, Goovaerts MJ (2006) Can a coherent risk measure be too subadditive? J. Risk Insurance 75(2):365–386.Crossref, Google Scholar
Dowd K, Blake D (2006) After VaR: The theory, estimation, and insurance applications of quantile-based risk measures. J. Risk Insurance 73(2):193–229.Crossref, Google Scholar
Duffie D, Pan J (1997) An overview of value at risk. J. Derivatives 4(3):7–49.Crossref, Google Scholar
Egloff D, Leippold M (2010) Quantile estimation with adaptive importance sampling. Ann. Statist. 38(2):1244–1278.Crossref, Google Scholar
Enders J, Powell WB, Egan D (2010) A dynamic model for the failure replacement of aging high-voltage transformers. Energy Systems 1(1):31–59.Crossref, Google Scholar
Even-Dar E, Mansour Y (2004) Learning rates for Q-learning. J. Machine Learn. Res. 5:1–25.Google Scholar
Frittelli M, Gianin ER (2004) Dynamic convex risk measures. Szegö, Risk Measures for the 21st Century (John Wiley & Sons, Chichester, UK), 227–248.Google Scholar
George AP, Powell WB (2006) Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learn. 65(1):167–198.Crossref, Google Scholar
Glasserman P, Liu TW (1996) Rare-event simulation for multistage production-inventory systems. Management Sci. 42(9):1292–1307.Link, Google Scholar
Ibragimov R, Walden J (2007) The limits of diversification when losses may be large. J. Banking and Finance 31(8):2551–2569.Crossref, Google Scholar
Jaakkola T, Jordan MI, Singh SP (1994) On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6:1185–1201.Crossref, Google Scholar
Jiang DR, Powell WB (2015) An approximate dynamic programming algorithm for monotone value functions. Oper. Res. 63(6): 1489–1511.Link, Google Scholar
Jiang DR, Powell WB (2015) Optimal hour-ahead bidding in the real-time electricity market with battery storage using approximate dynamic programming. INFORMS J. Comput. 27(3):525–543.Link, Google Scholar
Juditsky A, Lan G, Nemirovski A, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
Kan YS (2011) On the convergence of a stochastic approximation procedure for estimating the quantile criterion in the case of a discontinuous distribution function. Automation Remote Control 72(2):283–288.Crossref, Google Scholar
Kearns MJ, Singh SP (1999) Finite-sample convergence rates for Q-learning and indirect algorithms. Advances in Neural Information Processing Systems, 996–1002.Google Scholar
Kim JH, Powell WB (2011) An hour-ahead prediction model for heavy-tailed spot prices. Energy Econom. 33(6):1252–1266.Crossref, Google Scholar
Kleywegt AJ, Shapiro A, Homem-de Mello T (2002) The sample average approximation method for stochastic discrete optimization. SIAM J. Optimization 12(2):479–502.Crossref, Google Scholar
Kozmík V, Morton DP (2015) Evaluating policies in risk-averse multi-stage stochastic programming. Math. Programming 152(1):275–300.Crossref, Google Scholar
Kushner HJ, Yin GG (2003) Stochastic Approximation and Recursive Algorithms and Applications, Vol. 35 (Springer, New York).Google Scholar
Löhndorf N, Wozabal D, Minner S (2013) Optimizing trading decisions for hydro storage systems using approximate dual dynamic programming. Oper. Res. 61(4):810–823.Link, Google Scholar
Nascimento JM, Powell WB (2009) An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem. Math. Oper. Res. 34(1):210–237.Link, Google Scholar
Neise F (2008) Risk Management in Stochastic Integer Programming (Springer Fachmedien, Wiesbaden, Germany).Google Scholar
Pereira MVF, Pinto LMVG (1991) Multi-stage stochastic optimization applied to energy planning. Math. Programming 52(1):359–375.Crossref, Google Scholar
Pflug GC (1996) Optimization of Stochastic Models (Kluwer Academic Publishers, New York).Crossref, Google Scholar
Pflug GC, Ruszczyński A (2005) Measuring risk for income streams. Computational Optim. Appl. 32(1–2):161–178.Crossref, Google Scholar
Philpott AB, de Matos VL (2012) Dynamic sampling algorithms for multi-stage stochastic programs with risk aversion. Eur. J. Oper. Res. 218(2):470–483.Crossref, Google Scholar
Philpott AB, de Matos VL, Finardi E (2013) On solving multistage stochastic programs with coherent risk measures. Oper. Res. 61(4):957–970.Link, Google Scholar
Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. (John Wiley & Sons, New York).Crossref, Google Scholar
Powell WB, Ruszczyński A, Topaloglu H (2004) Learning algorithms for separable approximations of discrete stochastic optimization problems. Math. Oper. Res. 29(4):814–836.Link, Google Scholar
Puterman ML (2014) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, New York).Google Scholar
Rakhlin A, Shamir O, Sridharan K (2012) Making gradient descent optimal for strongly convex stochastic optimization. Proc. 29th Internat. Conf. Machine Learn. ICML ’12 (Omnipress, Madison, WI), 449–456.Google Scholar
Riedel F (2004) Dynamic coherent risk measures. Stochastic Processes and Their Appl. 112(2):185–200.Crossref, Google Scholar
Robbins H, Monro S (1951) A stochastic approximation method. The Annals of Mathematical Statistics 22(3):400–407.Crossref, Google Scholar
Robbins H, Siegmund D (1971) A convergence theorem for nonnegative almost supermartingales and some applications. Rustagi JS, ed. Optimizing Methods in Statistics (Academic Press, New York), 233–257.Google Scholar
Rockafellar RT, Uryasev S (2000) Optimization of conditional value-at-risk. J. Risk 2:21–41.Crossref, Google Scholar
Rockafellar RT, Uryasev S (2002) Conditional value-at-risk for general loss distributions. J. Banking and Finance 26(7):1443–1471.Crossref, Google Scholar
Rubinstein RY (1999) The simulated entropy method for combinatorial and continuous optimization. Methodology Comput. Appl. Probab. 1(2):127–190.Crossref, Google Scholar
Rudloff B, Street A, Valladão DM (2014) Time consistency and risk averse dynamic decision models: Definition, interpretation and practical consequences. Eur. J. Oper. Res. 234(3):743–750.Crossref, Google Scholar
Ruszczyński A (2010) Risk-averse dynamic programming for Markov decision processes. Math. Programming 125(2):235–261.Crossref, Google Scholar
Ruszczyński A, Shapiro A (2006) Conditional risk mappings. Math. Oper. Res. 31(3):544–561.Link, Google Scholar
Ruszczyński A, Shapiro A (2006) Optimization of convex risk functions. Math. Oper. Res. 31(3):433–452.Link, Google Scholar
Ryu EK, Boyd SP (2015) Adaptive importance sampling via stochastic convex programming. Preprint arXiv:1412.4845.Google Scholar
Ryzhov IO, Powell WB (2011) Information collection on a graph. Oper. Res. 59(1):188–201.Link, Google Scholar
Ryzhov IO, Frazier PI, Powell WB (2015) A new optimal stepsize for approximate dynamic programming. IEEE Trans. Automatic Control 60(3):743–758.Crossref, Google Scholar
Schaul T, Zhang S, LeCun Y (2013) No more pesky learning rates. J. Machine Learn. Res. 28(2):343–351.Google Scholar
Sereda EN, Bronshtein EM, Rachev ST, Fabozzi FJ, Sun EW, Stoyanov SV (2010) Distortion risk measures in portfolio optimization. Handbook of Portfolio Construction (Springer, New York), 649–673.Crossref, Google Scholar
Shapiro A (2009) On a time consistency concept in risk averse multistage stochastic programming. Oper. Res. Lett. 37(3):143–147.Crossref, Google Scholar
Shapiro A, Tekaya W, da Costa JP, Soares MP (2013) Risk neutral and risk averse stochastic dual dynamic programming method. Eur. J. Oper. Res. 224(2):375–391.Crossref, Google Scholar
Szepesvari C, Littman ML (1996) Generalized Markov decision processes: Dynamic programming and reinforcement learning algorithms. Technical Report, Bolyai Institute of Mathematics, Hungary.Google Scholar
Tierney L (1983) A space-efficient recursive procedure for estimating a quantile of an unknown distribution. SIAM J. Scientific Statist. Comput. 4(4):706–711.Crossref, Google Scholar
Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16(3):185–202.Crossref, Google Scholar
Watkins CJ, Dayan P (1992) Q-learning. Machine Learn. 8(3–4):279–292.Crossref, Google Scholar
Whittle P (1980) Multi-armed bandits and the Gittins Index. J. Roy. Statist. Soc. Ser. B (Methodological) 42(2):143–149.Google Scholar
Xi X, Sioshansi R, Marano V (2014) A stochastic dynamic programming model for co-optimization of distributed energy storage. Energy Systems 5(3):475–505.Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 43, Issue 2

May 2018

Pages 347-692, C2

Article Information

Supplemental Material

Metrics

Information

Received:September 04, 2015
Accepted:May 04, 2017
Published Online:November 13, 2017

Cite as

Daniel R. Jiang, Warren B. Powell (2017) Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures. Mathematics of Operations Research 43(2):554-579.

https://doi.org/10.1287/moor.2017.0872

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures

References

Volume 43, Issue 2

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News