A First-Order Approach to Accelerated Value Iteration
References
- (2019) The operator approach to entropy games. Theory Comput. Systems 63(5):1089–1130.Crossref, Google Scholar
- (1995) On the generation of Markov decision processes. J. Oper. Res. Soc. 46(3):354–361.Crossref, Google Scholar
- (2017) An alternative softmax operator for reinforcement learning. Precup D, Teh YW, eds. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (PMLR), 243–252.Google Scholar
- (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1):183–202.Crossref, Google Scholar
- (2005) Computationally efficient approximations of the joint spectral radius. SIAM J. Matrix Anal. Appl. 27(1):256–272.Crossref, Google Scholar
- Boucherie RJ, Van Dijk NM, eds. (2017) Markov Decision Processes in Practice (Springer, Cham, Switzerland).Crossref, Google Scholar
- (2013) Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, vol. 31 (Springer Science & Business Media, New York).Google Scholar
- (2016) OpenAI gym. Preprint, submitted June 5, https://arxiv.org/abs/1606.01540.Google Scholar
- (2013) Reversible Markov decision processes with an average-reward criterion. SIAM J. Control Optim. 51(1):402–418.Crossref, Google Scholar
- (2015) Markov Decision Process (MDP) toolbox for Python. Accessed January 4, 2022, https://github.com/sawcordwell/pymdptoolbox.Google Scholar
- (2020) The greedy strategy for optimizing the Perron eigenvalue. Math. Programming, 10.1007/s10107-020-01585-z.Crossref, Google Scholar
- (2003) The linear programming approach to approximate dynamic programming. Oper. Res. 51(6):850–865.Link, Google Scholar
- (1991) On the algorithm of Pollatschek and Avi-ltzhak. Raghaven TES, Ferguson TS, Parthasarathy T, Vrieze OJ, eds. Stochastic Games and Related Topics: In Honor of Professor L. S. Shapley (Springer, Cham, Switzerland), 59–70.Crossref, Google Scholar
- (2018) Anderson acceleration for reinforcement learning. Preprint, submitted September 25, https://arxiv.org/abs/1809.09501.Google Scholar
- (2015) Global convergence of the heavy-ball method for convex optimization. 2015 Eur. Control Conf. (ECC) (IEEE), 310–315.Google Scholar
- (2018) Data uncertainty in Markov chains: Application to cost-effectiveness analyses of medical innovations. Oper. Res. 66(3):697–715.Link, Google Scholar
- (2014) Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3):1588–1623.Crossref, Google Scholar
- (2018) Robust Markov decision process: Beyond rectangularity. Preprint, submitted November 6, https://arxiv.org/abs/1811.00215v3.Google Scholar
- (2021) From convex optimization to MDPs: A review of first-order, second-order and quasi-Newton methods for MDPs. Preprint, submitted November 25, https://arxiv.org/abs/2104.10677.Google Scholar
- (2020) Robust policies for proactive ICU transfers. Preprint, submitted February 14, https://arxiv.org/abs/2002.06247v1.Google Scholar
- (1996) A k-step look-ahead analysis of value iteration algorithms for Markov decision processes. Eur. J. Oper. Res. 88(3):622–636.Crossref, Google Scholar
- (2018) Fast Bellman updates for robust MDPs. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR), 1979–1988.Google Scholar
- (2019) A generic online acceleration scheme for optimization algorithms via relaxation and inertia. Optim. Methods Software 34(2):383–405.Crossref, Google Scholar
- (2005) Robust dynamic programming. Math. Oper. Res. 30(2):257–280.Link, Google Scholar
- (2019) Exploration bonus for regret minimization in discrete and continuous average-reward MDPs. Wallach H, Larochelle H, Beygelzimer A, d’ Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates, Inc.), 4890–4899.Google Scholar
- (2009) The Joint Spectral Radius: Theory and Applications, vol. 385 (Springer Science & Business Media, New York).Crossref, Google Scholar
- (1955) Two observations about the method of successive approximations. Uspekhi Mat. Nauk 10:123–127.Google Scholar
- (1971) Accelerated procedures for the solution of discrete Markov control problems. IEEE Trans. Automatic Control 16(2):147–152.Crossref, Google Scholar
- (2013) Introduction to Smooth Manifolds (Springer, Cham, Switzerland).Google Scholar
- (1983) A method for solving the convex programming problem with convergence rate O(1/k2). Dokl. Akad. Nauk SSSR 269:543–547.Google Scholar
- (2013) Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 (Springer Science & Business Media, New York).Google Scholar
- (2015) Adaptive restart for accelerated gradient schemes. Foundations Comput. Math. 15(3):715–732.Crossref, Google Scholar
- (2016) Combining policy gradient and Q-learning. arXiv preprint arXiv: 1611.01626.Google Scholar
- (2010) Optimization-Based Approximate Dynamic Programming (University of Massachusetts, Amherst, MA).Google Scholar
- (2016) Difference of convex functions programming applied to control with expert data. Preprint, submitted November 5, https://arxiv.org/abs/1611.01626v1.Google Scholar
- (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.Crossref, Google Scholar
- (1978) Accelerated computation of the expected discounted return in a Markov chain. Oper. Res. 26(2):350–358.Link, Google Scholar
- (1997) Application of stochastic dynamic programming to optimal fire management of a spatially structured threatened species. Proc. Internat. Congress Model. Simulation, MODSIM, vol. 2, 813–817.Google Scholar
- (2014) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, New York).Google Scholar
- (1979) On the convergence of policy iteration in stationary dynamic programming. Math. Oper. Res. 4(1):60–69.Link, Google Scholar
- (2016) Improved and generalized upper bounds on the complexity of policy iteration. Math. Oper. Res. 41(3):758–774.Link, Google Scholar
- (2015) Approximate modified policy iteration and its application to the game of Tetris. J. Machine Learn. Res. 16(49):1629–1676.Google Scholar
- (2010) Acceleration operators in the value iteration algorithms for Markov decision processes. Oper. Res. 58(1):193–202.Link, Google Scholar
- (2018) Variance reduced value iteration and faster algorithms for solving Markov decision processes. Proc. Twenty-Ninth Annual ACM-SIAM Sympos. Discrete Algorithms (SIAM), 770–787.Google Scholar
- (2021) Taylor expansion of discount factors. Preprint, submitted June 14, https://arxiv.org/abs/2106.06170.Google Scholar
- (2019) Active exploration in Markov decision processes. Chaudhuri K, Masashi S, eds. Proc. 22nd Internat. Conf. Artificial Intelligence Statist., vol. 89 (PMLR), 974–982.Google Scholar
- (1991) Random walks and the effective resistance of networks. J. Theoret. Probab. 4(1):101–109.Crossref, Google Scholar
- (2018) Temporal regularization for Markov decision process. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 31 (Curran Associates, Inc.), 1779–1789.Google Scholar
- (1997) The Lyapunov exponent and joint spectral radius of pairs of matrices are hard—when not impossible—to compute and to approximate. Math. Control Signals Systems 10(1):31–40.Crossref, Google Scholar
- (1945) A Lipschitz condition preserving extension for a vector function. Amer. J. Math. 67(1):83–93.Crossref, Google Scholar
- (2019) On connections between constrained optimization and reinforcement learning. Adv. Neural Inform. Processing Systems 2019—Optim. Foundations Reinforcement Learn. Workshop.Google Scholar
- (2019) Stochastic approximation with cone-contractive operators: Sharp L∞-bounds for Q-learning. Preprint, submitted June 24, https://arxiv.org/abs/1905.06265.Google Scholar
- (2013) Robust Markov decision processes. Oper. Res. 38(1):153–183.Abstract, Google Scholar
- (2011) The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Math. Oper. Res. 36(4):593–603.Link, Google Scholar
- (2020) Globally convergent type-I Anderson acceleration for nonsmooth fixed-point iterations. SIAM J. Optim. 30(4):3170–3197.Crossref, Google Scholar

