Parallel Nonstationary Direct Policy Search for Risk-Averse Stochastic Optimization
Published Online: 12 Apr 2017
https://doi.org/10.1287/ijoc.2016.0733

