Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures

Published Online:https://doi.org/10.1287/moor.2017.0872

References

  • Acciaio B, Penner I (2011) Dynamic risk measures. Di Nunno, G, Øksendal, B, eds. Advanced Mathematical Methods for Finance (Springer, Berlin), 1–34.CrossrefGoogle Scholar
  • Al-Qaq WA, Devetsikiotis M, Townsend JK (1995) Stochastic gradient optimization of importance sampling for the efficient simulation of digital communication systems. IEEE Trans. Comm. 43(12):2975–2985.CrossrefGoogle Scholar
  • Artzner P, Delbaen F, Eber J, Heath D (1999) Coherent measures of risk. Math. Finance 9(3):203–228.CrossrefGoogle Scholar
  • Artzner P, Delbaen F, Eber J, Heath D, Ku H (2006) Coherent multiperiod risk adjusted values and Bellman’s principle. Ann. Oper. Res. 152(1):5–22.CrossrefGoogle Scholar
  • Azar MG, Ghavamzadeh M, Kappen HJ (2011) Reinforcement learning with a near optimal rate of convergence. Technical Report, Radboud University Nijmegen, Netherlands.Google Scholar
  • Bardou O, Frikha N (2009) Computing VaR and CVaR using stochastic approximation and adaptive unconstrained importance sampling. Monte Carlo Methods Appl. 15(3):173–210.CrossrefGoogle Scholar
  • Belles-Sampera J, Guillén M, Santolino M (2014) Beyond value-at-risk: GlueVaR distortion risk measures. Risk Anal. 34(1):121–134.CrossrefGoogle Scholar
  • Bertimas D, Lo AW (1998) Optimal control of execution costs. J. Financial Markets 1(1):1–50.CrossrefGoogle Scholar
  • Bertsekas DP, Tsitsiklis JN (1996) Neuro-Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
  • Boda K, Filar JA (2006) Time consistent dynamic risk measures. Math. Methods Oper. Res. 63(1):169–186.CrossrefGoogle Scholar
  • Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Breiman L (1992) Probability (SIAM, Philadelphia).CrossrefGoogle Scholar
  • Bucklew J (2004) Introduction to Rare Event Simulation (Springer, New York).CrossrefGoogle Scholar
  • Byström H (2005) Extreme value theory and extremely large electricity price changes. Internat. Rev. Econom. Finance 14(1):41–55.CrossrefGoogle Scholar
  • Çavus O, Ruszczyński A (2014) Computational methods for risk-averse undiscounted transient Markov models. Oper. Res. 62(2):401–417.LinkGoogle Scholar
  • Cheridito P, Stadje M (2009) Time-inconsistency of VaR and time-consistent alternatives. Finance Res. Lett. 6(1):40–46.CrossrefGoogle Scholar
  • Cheridito P, Delbaen F, Kupper M (2006) Dynamic monetary risk measures for bounded discrete-time processes. Electronic J. Probab. 11(3):57–106.CrossrefGoogle Scholar
  • Chow Y, Ghavamzadeh M (2014) Algorithms for CVaR optimization in MDPs. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), 3509–3517.Google Scholar
  • Chung KL (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.CrossrefGoogle Scholar
  • Danielsson J, Jorgensen B, Samorodnitsky G, Sarma M, de Vries C (2005) Subadditivity reexamined: The case for value-at-risk. Discussion paper, 549. Financial Markets Group, London School of Economics and Political Science, London, UK.Google Scholar
  • Dhaene J, Laeven RJA, Vanduffel S, Darkiewicz G, Goovaerts MJ (2006) Can a coherent risk measure be too subadditive? J. Risk Insurance 75(2):365–386.CrossrefGoogle Scholar
  • Dowd K, Blake D (2006) After VaR: The theory, estimation, and insurance applications of quantile-based risk measures. J. Risk Insurance 73(2):193–229.CrossrefGoogle Scholar
  • Duffie D, Pan J (1997) An overview of value at risk. J. Derivatives 4(3):7–49.CrossrefGoogle Scholar
  • Egloff D, Leippold M (2010) Quantile estimation with adaptive importance sampling. Ann. Statist. 38(2):1244–1278.CrossrefGoogle Scholar
  • Enders J, Powell WB, Egan D (2010) A dynamic model for the failure replacement of aging high-voltage transformers. Energy Systems 1(1):31–59.CrossrefGoogle Scholar
  • Even-Dar E, Mansour Y (2004) Learning rates for Q-learning. J. Machine Learn. Res. 5:1–25.Google Scholar
  • Frittelli M, Gianin ER (2004) Dynamic convex risk measures. Szegö, Risk Measures for the 21st Century (John Wiley & Sons, Chichester, UK), 227–248.Google Scholar
  • George AP, Powell WB (2006) Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learn. 65(1):167–198.CrossrefGoogle Scholar
  • Glasserman P, Liu TW (1996) Rare-event simulation for multistage production-inventory systems. Management Sci. 42(9):1292–1307.LinkGoogle Scholar
  • Ibragimov R, Walden J (2007) The limits of diversification when losses may be large. J. Banking and Finance 31(8):2551–2569.CrossrefGoogle Scholar
  • Jaakkola T, Jordan MI, Singh SP (1994) On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6:1185–1201.CrossrefGoogle Scholar
  • Jiang DR, Powell WB (2015) An approximate dynamic programming algorithm for monotone value functions. Oper. Res. 63(6): 1489–1511.LinkGoogle Scholar
  • Jiang DR, Powell WB (2015) Optimal hour-ahead bidding in the real-time electricity market with battery storage using approximate dynamic programming. INFORMS J. Comput. 27(3):525–543.LinkGoogle Scholar
  • Juditsky A, Lan G, Nemirovski A, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.CrossrefGoogle Scholar
  • Kan YS (2011) On the convergence of a stochastic approximation procedure for estimating the quantile criterion in the case of a discontinuous distribution function. Automation Remote Control 72(2):283–288.CrossrefGoogle Scholar
  • Kearns MJ, Singh SP (1999) Finite-sample convergence rates for Q-learning and indirect algorithms. Advances in Neural Information Processing Systems, 996–1002.Google Scholar
  • Kim JH, Powell WB (2011) An hour-ahead prediction model for heavy-tailed spot prices. Energy Econom. 33(6):1252–1266.CrossrefGoogle Scholar
  • Kleywegt AJ, Shapiro A, Homem-de Mello T (2002) The sample average approximation method for stochastic discrete optimization. SIAM J. Optimization 12(2):479–502.CrossrefGoogle Scholar
  • Kozmík V, Morton DP (2015) Evaluating policies in risk-averse multi-stage stochastic programming. Math. Programming 152(1):275–300.CrossrefGoogle Scholar
  • Kushner HJ, Yin GG (2003) Stochastic Approximation and Recursive Algorithms and Applications, Vol. 35 (Springer, New York).Google Scholar
  • Löhndorf N, Wozabal D, Minner S (2013) Optimizing trading decisions for hydro storage systems using approximate dual dynamic programming. Oper. Res. 61(4):810–823.LinkGoogle Scholar
  • Nascimento JM, Powell WB (2009) An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem. Math. Oper. Res. 34(1):210–237.LinkGoogle Scholar
  • Neise F (2008) Risk Management in Stochastic Integer Programming (Springer Fachmedien, Wiesbaden, Germany).Google Scholar
  • Pereira MVF, Pinto LMVG (1991) Multi-stage stochastic optimization applied to energy planning. Math. Programming 52(1):359–375.CrossrefGoogle Scholar
  • Pflug GC (1996) Optimization of Stochastic Models (Kluwer Academic Publishers, New York).CrossrefGoogle Scholar
  • Pflug GC, Ruszczyński A (2005) Measuring risk for income streams. Computational Optim. Appl. 32(1–2):161–178.CrossrefGoogle Scholar
  • Philpott AB, de Matos VL (2012) Dynamic sampling algorithms for multi-stage stochastic programs with risk aversion. Eur. J. Oper. Res. 218(2):470–483.CrossrefGoogle Scholar
  • Philpott AB, de Matos VL, Finardi E (2013) On solving multistage stochastic programs with coherent risk measures. Oper. Res. 61(4):957–970.LinkGoogle Scholar
  • Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. (John Wiley & Sons, New York).CrossrefGoogle Scholar
  • Powell WB, Ruszczyński A, Topaloglu H (2004) Learning algorithms for separable approximations of discrete stochastic optimization problems. Math. Oper. Res. 29(4):814–836.LinkGoogle Scholar
  • Puterman ML (2014) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, New York).Google Scholar
  • Rakhlin A, Shamir O, Sridharan K (2012) Making gradient descent optimal for strongly convex stochastic optimization. Proc. 29th Internat. Conf. Machine Learn. ICML ’12 (Omnipress, Madison, WI), 449–456.Google Scholar
  • Riedel F (2004) Dynamic coherent risk measures. Stochastic Processes and Their Appl. 112(2):185–200.CrossrefGoogle Scholar
  • Robbins H, Monro S (1951) A stochastic approximation method. The Annals of Mathematical Statistics 22(3):400–407.CrossrefGoogle Scholar
  • Robbins H, Siegmund D (1971) A convergence theorem for nonnegative almost supermartingales and some applications. Rustagi JS, ed. Optimizing Methods in Statistics (Academic Press, New York), 233–257.Google Scholar
  • Rockafellar RT, Uryasev S (2000) Optimization of conditional value-at-risk. J. Risk 2:21–41.CrossrefGoogle Scholar
  • Rockafellar RT, Uryasev S (2002) Conditional value-at-risk for general loss distributions. J. Banking and Finance 26(7):1443–1471.CrossrefGoogle Scholar
  • Rubinstein RY (1999) The simulated entropy method for combinatorial and continuous optimization. Methodology Comput. Appl. Probab. 1(2):127–190.CrossrefGoogle Scholar
  • Rudloff B, Street A, Valladão DM (2014) Time consistency and risk averse dynamic decision models: Definition, interpretation and practical consequences. Eur. J. Oper. Res. 234(3):743–750.CrossrefGoogle Scholar
  • Ruszczyński A (2010) Risk-averse dynamic programming for Markov decision processes. Math. Programming 125(2):235–261.CrossrefGoogle Scholar
  • Ruszczyński A, Shapiro A (2006) Conditional risk mappings. Math. Oper. Res. 31(3):544–561.LinkGoogle Scholar
  • Ruszczyński A, Shapiro A (2006) Optimization of convex risk functions. Math. Oper. Res. 31(3):433–452.LinkGoogle Scholar
  • Ryu EK, Boyd SP (2015) Adaptive importance sampling via stochastic convex programming. Preprint arXiv:1412.4845.Google Scholar
  • Ryzhov IO, Powell WB (2011) Information collection on a graph. Oper. Res. 59(1):188–201.LinkGoogle Scholar
  • Ryzhov IO, Frazier PI, Powell WB (2015) A new optimal stepsize for approximate dynamic programming. IEEE Trans. Automatic Control 60(3):743–758.CrossrefGoogle Scholar
  • Schaul T, Zhang S, LeCun Y (2013) No more pesky learning rates. J. Machine Learn. Res. 28(2):343–351.Google Scholar
  • Sereda EN, Bronshtein EM, Rachev ST, Fabozzi FJ, Sun EW, Stoyanov SV (2010) Distortion risk measures in portfolio optimization. Handbook of Portfolio Construction (Springer, New York), 649–673.CrossrefGoogle Scholar
  • Shapiro A (2009) On a time consistency concept in risk averse multistage stochastic programming. Oper. Res. Lett. 37(3):143–147.CrossrefGoogle Scholar
  • Shapiro A, Tekaya W, da Costa JP, Soares MP (2013) Risk neutral and risk averse stochastic dual dynamic programming method. Eur. J. Oper. Res. 224(2):375–391.CrossrefGoogle Scholar
  • Szepesvari C, Littman ML (1996) Generalized Markov decision processes: Dynamic programming and reinforcement learning algorithms. Technical Report, Bolyai Institute of Mathematics, Hungary.Google Scholar
  • Tierney L (1983) A space-efficient recursive procedure for estimating a quantile of an unknown distribution. SIAM J. Scientific Statist. Comput. 4(4):706–711.CrossrefGoogle Scholar
  • Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16(3):185–202.CrossrefGoogle Scholar
  • Watkins CJ, Dayan P (1992) Q-learning. Machine Learn. 8(3–4):279–292.CrossrefGoogle Scholar
  • Whittle P (1980) Multi-armed bandits and the Gittins Index. J. Roy. Statist. Soc. Ser. B (Methodological) 42(2):143–149.Google Scholar
  • Xi X, Sioshansi R, Marano V (2014) A stochastic dynamic programming model for co-optimization of distributed energy storage. Energy Systems 5(3):475–505.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.