Parallel Nonstationary Direct Policy Search for Risk-Averse Stochastic Optimization

Published Online:https://doi.org/10.1287/ijoc.2016.0733

References

  • Audet C, Dennis JEJ (2002) Analysis of generalized pattern searches. SIAM J. Optim. 13(3):889–903.CrossrefGoogle Scholar
  • Baxter J, Bartlett PL (2001) Infinite-horizon policy-gradient estimation. J. Artificial Intelligence Res. 15(1):319–350.CrossrefGoogle Scholar
  • Bertsekas DP (2005) Dynamic Programming and Optimal Control, Vol. 1 3rd ed. (Athena Scientific, Belmont, MA).Google Scholar
  • Bertsekas DP (2012) Dynamic Programming and Optimal Control, Vol. II, 4th ed., Approximate Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
  • Bertsekas D, Tsitsiklis (1989) Parallel and Distributed Computation: Numerical Methods (Prentice-Hall, Englewood Cliffs, NJ).Google Scholar
  • Bertsekas DP, Tsitsiklis J (1996) Neuro-Dynamic Programming, 1st ed. (Athena Scientific, Belmont, MA).Google Scholar
  • Bertsimas D, Goyal V (2012) On the power and limitations of affine policies in two-stage adaptive optimization. Math. Programming 134(2):491–531.CrossrefGoogle Scholar
  • Bertsimas D, Iancu DA, Parrilo PA (2010) Optimality of affine policies in multistage robust optimization. Math. Oper. Res. 35(2):363–394.LinkGoogle Scholar
  • Brown PD, Lopes JAP, Matos MA (2008) Optimization of pumped storage capacity in an isolated power system with large renewable penetration. IEEE Trans. Power Systems 23(2):523–531.CrossrefGoogle Scholar
  • Burger M, Graeber B, Schindlmayr G (2008) Managing Energy Risk: An Integrated View on Power and Other Energy Markets (John Wiley and Sons, Chichester, UK).Google Scholar
  • Busoniu L, Babuska R, Schutter BD, Ernst D (2010) Reinforcement Learning and Dynamic Programming Using Function Approximators, 1st ed. (CRC Press, Boca Raton, FL).CrossrefGoogle Scholar
  • Carmona R, Ludkovski M (2010) Valuation of energy storage: An optimal switching approach. Quant. Finance 10(4):359–374.CrossrefGoogle Scholar
  • Censor Y, Zenios SA (1998) Parallel Optimization: Theory, Algorithms, and Applications (Oxford University Press, New York).CrossrefGoogle Scholar
  • Chen VCP, Ruppert D, Shoemaker CA (1999) Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47(1):38–53.LinkGoogle Scholar
  • Coleman TF, Li Y, Patron C (2007) Total risk minimization using Monte Carlo simulations. Birge JR, Linetsky V, eds. Handbooks in Operations Research and Management Science (Elsevier, Amsterdam), 593–635.Google Scholar
  • Coleman TF, Li Y, Verma A (1999) Reconstructing the unknown local volatility function. J. Comput. Finance 2(3):77–102.CrossrefGoogle Scholar
  • Defourny B, Ernst D, Wehenkel L (2008) Risk-aware decision making and dynamic programming. Proc. 22nd Annual Conf. Neural Inform. Processing Systems, Vancouver, Canada, 1–8.Google Scholar
  • Dixon LCW, Jha M (1993) Parallel algorithms for global optimization. J. Optim. Theory Appl. 79(2):385–395.CrossrefGoogle Scholar
  • Dolan ED, Lewis RM, Torczon V (2003) On the local convergence of pattern search. SIAM J. Optim. 14(2):567–583.CrossrefGoogle Scholar
  • Duffie D, Pan J (1997) An overview of value at risk. J. Derivatives 4(3):7–49.CrossrefGoogle Scholar
  • Giuliani M, Castelletti A, Pianosi F, Mason E, Reed P (2015) Curses, tradeoffs, and scalable management: Advancing evolutionary multiobjective direct policy search to improve water reservoir operations. J. Water Resources Planning Management 142(2): Article no. 04015050.CrossrefGoogle Scholar
  • Golub G, Ortega JM (1993) Scientific Computing: An Introduction with Parallel Computing (Academic Press, San Diego).CrossrefGoogle Scholar
  • Gonzalez JG, de la Muela RMR, Santos LM, Gonzalez AM (2008) Stochastic joint optimization of wind generation and pumped-storage units in an electricity market. IEEE Trans. Power Systems 23(2):460–468.CrossrefGoogle Scholar
  • Grondman I, Busoniu L, Lopes GAD, Babuska R (2012) A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Trans. Systems, Man, Cybernetics—Part C: Appl. Rev. 42(6):1291–1307.CrossrefGoogle Scholar
  • Harsha P, Dahleh M (2015) Optimal management and sizing of energy storage under dynamic pricing for the efficient integration of renewable energy. IEEE Trans. Power Systems 30(3):1164–1181.CrossrefGoogle Scholar
  • Hough PD, Kolda TG, Torczon VJ (2001) Asynchronous parallel pattern search for nonlinear optimization. SIAM J. Sci. Comput. 23(1):134–156.CrossrefGoogle Scholar
  • Johnson SA, Stedinger JR, Shoemaker CA, Li Y, Tejada-Guibert JA (1993) Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Oper. Res. 41(3):484–500.LinkGoogle Scholar
  • Jones DR (2001) A taxonomy of global optimization methods based on response surfaces. J. Global Optim. 21(4):345–383.CrossrefGoogle Scholar
  • Kakade SM (2002) A natural policy gradient. Dietterich TG, Becker S, Ghahramami Z, eds. Proc. 14th Internat. Conf. Neural Inform. Processing Systems: Natural Synthentic (MIT Press, Cambridge, MA), 1531–1538.Google Scholar
  • Keller PW, Mannor S, Precup D (2006) Automatic basis function construction for approximate dynamic programming and reinforcement learning. Proc. 23rd Internat. Conf. Machine Learn. (ACM, New York), 449–456.CrossrefGoogle Scholar
  • Kim JH, Powell WB (2011) Optimal energy commitments with storage and intermittent supply. Oper. Res. 59(6):1347–1360.LinkGoogle Scholar
  • Kober JR, Peters JR (2009) Policy search for motor primitives in robotics. Koller D, Schuurmans D, Bengio Y, Bottou L, eds. Adv. Neural Inform. Processing Systems 21 (Curran Associates, Inc., Red Hook, NY), 849–856.Google Scholar
  • Kolda TG (2005) Revisiting asynchronous parallel pattern search for nonlinear optimization. SIAM J. Optim. 16(2):563–586.CrossrefGoogle Scholar
  • Kormushev P, Caldwell DG (2012) Direct policy search reinforcement learning based on particle filtering. 10th Eur. Workshop Reinforcement Learn. (EWRL 2012), Edinburgh, UK.Google Scholar
  • Kuhn D, Wiesemann W, Georghiou A (2011) Primal and dual linear decision rules in stochastic and robust optimization. Math. Programming 130(1):177–209.CrossrefGoogle Scholar
  • Lai G, Margot F, Secomandi N (2010) An approximate dynamic programming approach to benchmark practice-based heuristics for natural gas storage valuation. Oper. Res. 58(3):564–582.LinkGoogle Scholar
  • Lewis RM, Torczon V, Trosset MW (1998) Why pattern search works. Optima (59):1–7.Google Scholar
  • Lewis RM, Torczon V, Trosset MW (2000) Direct search methods: Then and now. J. Comput. Appl. Math. 124(1):191–207.CrossrefGoogle Scholar
  • MacKay DJ (2009) Sustainable Energy-Without the Hot Air, 1st ed. (UIT Cambridge Ltd., Cambridge, UK).Google Scholar
  • Mannor S, Rubinstein R, Gat Y (2003) The cross entropy method for fast policy search. Proc. 20th Internat. Conf. Machine Learn. (ICML-2003), Washington, DC, 512–519.Google Scholar
  • Moazeni S, Coleman TF, Li Y (2016) Smoothing and parametric rules for stochastic mean-CVAR optimal execution strategy. Ann. Oper. Res. 237(1):99–120.CrossrefGoogle Scholar
  • Moazeni S, Powell WB, Hajimiragha AH (2015) Mean-conditional value-at-risk optimal energy storage operation in the presence of transaction costs. IEEE Trans. Power Systems 30(3):1222–1232.CrossrefGoogle Scholar
  • Munos R (2003) Error bounds for approximate policy iteration. Proc. 20th Internat. Conf. Machine Learn. (ICML-2003), Washington, DC, 560–567.Google Scholar
  • Ng AY, Jordan M (2000) PEGASUS: A policy search method for large MDPs and POMDPs. Proc. 16th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco),406–415.Google Scholar
  • Nocedal J, Wright S (2006) Numerical Optimization, 2nd ed. (Springer, New York).Google Scholar
  • Peters J, Vijayakumar S, Schaal S (2005) Natural actor-critic. Gama J, Camacho R, Brazdil PB, Jorge AM, Torgo L, eds. Machine Learning: ECML 2005: 16th Eur. Conf. Maching Learn., Lecture Notes in Computer Science, Vol. 3720 (Springer, Berlin), 280–291.CrossrefGoogle Scholar
  • Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. (John Wiley & Sons, New York).CrossrefGoogle Scholar
  • Riedel F (2004) Dynamic coherent risk measures. Stochastic Processes Their Appl. 112(2):185–200.CrossrefGoogle Scholar
  • Rios LM, Sahinidis NV (2013) Derivative-free optimization: A review of algorithms and comparison of software implementations. J. Global Optim. 56(3):1247–1293.CrossrefGoogle Scholar
  • Rockafellar RT, Uryasev S (2000) Optimization of conditional value-at-risk. J. Risk 2(3):21–41.CrossrefGoogle Scholar
  • Rudloff B, Street A, Valladao DM (2014) Time consistency and risk averse dynamic decision models: Definition, interpretation and practical consequences. Eur. J. Oper. Res. 234(3):743–750.CrossrefGoogle Scholar
  • Scott WR, Frazier P, Powell WB (2011) The correlated knowledge gradient for simulation optimization of continuous parameters using Gaussian process regression. SIAM J. Optim. 21(3):996–1026.CrossrefGoogle Scholar
  • Scott WR, Powell WB, Moazeni S (2014) Least squares policy iteration with instrumental variables vs. direct policy search: Comparison against optimal benchmarks using energy storage. http://arxiv.org/pdf/1401.0843v1.pdf.Google Scholar
  • Shafi A, Carpenter B, Baker M (2009) Nested parallelism for multi-core HPC systems using java. J. Parallel Distributed Comput. 69(6):532–545.CrossrefGoogle Scholar
  • Shapiro A (2012) Time consistency of dynamic risk measures. Oper. Res. Lett. 40(6):436–439.CrossrefGoogle Scholar
  • Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. Proc. 31st Internat. Conf. Machine Learn., Beijing, I-387–I-395.Google Scholar
  • Strens MJA, Moore AW (2003) Policy search using paired comparisons. J. Machine Learn. Res. 3(3):921–950.Google Scholar
  • Sutton R, Barto A (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
  • Sutton RS, McAllester D, Singh S, Mansour Y (2000) Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, Vol. 12 (MIT Press, Cambridge, MA), 1057–1063.Google Scholar
  • Torczon V (1997) On the convergence of pattern search algorithms. SIAM J. Optim. 7(1):1–25.CrossrefGoogle Scholar
  • Tsitsiklis JN, Roy BV (1996) Feature-based methods for large scale dynamic programming. Machine Learn. 22(1):59–94.CrossrefGoogle Scholar
  • Yamakawa E, Fukushima M (1996) A block-parallel conjugate gradient method for separable quadratic programming problems. J. Oper. Res. Soc. Japan 39(3):407–427.CrossrefGoogle Scholar
  • Yu H, Bertsekas DP (2009) Basis function adaptation methods for cost approximation in MDP. IEEE Sympos. Adaptive Dynam. Programming Reinforcement Learn., Nashville, TN, 74–81.CrossrefGoogle Scholar
  • Yu H, Bertsekas DP (2010) Error bounds for approximations from projected linear equations. Math. Oper. Res. 35(2):306–329.LinkGoogle Scholar
  • Zhou Y, Scheller-Wolf A, Secomandi N, Smith S (2014) Managing wind-based electricity generation in the presence of storage and transmission capacity. Tepper Working Paper 2011-E36 1–38, Carnegie Mellon University, Pittsburgh, PA.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.