Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation
Published Online:16 Apr 2024https://doi.org/10.1287/moor.2022.0179
References
- [1] (2000) On a perturbation approach for the analysis of stochastic tracking algorithms. SIAM J. Control Optim. 39(3):872–899.Crossref, Google Scholar
- [2] (1997) A monotonicity property of the gamma function. Proc. Amer. Math. Soc. 125(11):3355–3362.Crossref, Google Scholar
- [3] (2013) Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n). Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 26 (Curran Associates, Inc., Red Hook, NY), 30063–30074.Google Scholar
- [4] (2012) Adaptive Algorithms and Stochastic Approximations, vol. 22 (Springer Science & Business Media, New York).Google Scholar
- [5] (2019) Reinforcement Learning and Optimal Control (Athena Scientific, Nashua, NH).Google Scholar
- [6] (2015) Parallel and Distributed Computation: Numerical Methods (Athena Scientific, Nashua, NH).Google Scholar
- [7] (2021) A finite time analysis of temporal difference learning with linear function approximation. Oper. Res. 69(3):950–973.Link, Google Scholar
- [8] (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [9] (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60(2):223–311.Crossref, Google Scholar
- [10] (2020) Explicit mean-square error bounds for Monte-Carlo and linear stochastic approximation. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 4173–4183.Google Scholar
- [11] (2020) A tale of two-timescale reinforcement learning with the tightest finite-time bound. Proc. Conf. AAAI Artificial Intelligence, vol. 34 (AAAI Press, Palo Alto, CA), 3701–3708.Crossref, Google Scholar
- [12] (2018) Markov Chains, Springer Series in Operations Research and Financial Engineering (Springer, Cham, Switzerland).Crossref, Google Scholar
- [13] (2023) Rosenthal-type inequalities for linear statistics of Markov chains. Preprint, submitted March 10, https://arxiv.org/abs/2303.05838.Google Scholar
- [14] (2021) On the stability of random matrix product with Markovian noise: Application to linear stochastic approximation and TD learning. Belkin M, Kpotufe S, eds. Proc. Thirty-Fourth Conf. Learn. Theory, vol. 134 (PMLR, New York), 1711–1752.Google Scholar
- [15] (2021) Tight high probability bounds for linear stochastic approximation with fixed stepsize. Ranzato M, Beygelzimer A, Nguyen K, Liang PS, Vaughan JW, Dauphin Y, eds. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates, Inc., Red Hook, NY), 30063–30074.Google Scholar
- [16] (1983) Quadratic mean and almost-sure convergence of unbounded stochastic approximation algorithms with correlated observations. Ann. l’Institut Henri Poincare (B) Probab. Statist. 19(3):235–255.Google Scholar
- [17] (1995) Exponential stability of general tracking algorithms. IEEE Trans. Automatic Control 40(8):1376–1387.Crossref, Google Scholar
- [18] (2014) Introduction to Matrix Analysis and Applications (Springer International Publishing, Cham, Switzerland).Crossref, Google Scholar
- [19] (2021) Matrix concentration for products. Foundations Comput. Math. 22(6):1–33.Google Scholar
- [20] (2019) Making the last iterate of sgd information theoretically optimal. Beygelzimer A, Hsu D, eds. Proc. Thirty-Second Conf. Learn. Theory, vol. 99 (PMLR, New York), 1752–1755.Google Scholar
- [21] (2018) Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification. J. Machine Learn. Res. 18(2018):1–42.Google Scholar
- [22] (2018) Accelerating stochastic gradient descent for least squares regression. Proc. 31st Conf. Learn. Theory (PMLR, New York), 545–604.Google Scholar
- [23] (2010) Curvature, concentration and error estimates for Markov chain Monte Carlo. Ann. Probab. 38(6):2418–2442.Crossref, Google Scholar
- [24] (2003) Stochastic Approximation and Recursive Algorithms and Applications, vol. 35 (Springer Science & Business Media, New York).Google Scholar
- [25] (2018) Linear stochastic approximation: How far does constant step-size and iterate averaging go? Storkey A, Perez-Cruz F, eds. Proc. Twenty-First Internat. Conf. Artificial Intelligence Statist., vol. 84 (PMLR, New York), 1347–1355.Google Scholar
- [26] (2022) Bias in stochastic approximation cannot be eliminated with averaging. 2022 58th Annual Allerton Conf. Comm. Control Comput. (Allerton) (IEEE Press, Piscataway, NJ), 1–4.Google Scholar
- [27] (2002) Recursive identification algorithms. Circuits Systems Signal Processing 21(1):57–68.Crossref, Google Scholar
- [28] (2021) Optimal and instance-dependent guarantees for Markovian linear stochastic approximation. Preprint, submitted December 23, https://arxiv.org/abs/2112.12770.Google Scholar
- [29] (2020) On linear stochastic approximation: Fine-grained Polyak–Ruppert and non-asymptotic concentration. Proc. Thirty Third Conf. Learn. Theory, vol. 125 (PMLR, New York), 2947–2997.Google Scholar
- [30] (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
- [31] (2012) Sharp Martingale and Semimartingale Inequalities, 1st ed., Monografie Matematyczne 72 (Birkhäuser, Basel, Switzerland).Crossref, Google Scholar
- [32] (2015) Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electronic J. Probab. 20(2015):1–32.Google Scholar
- [33] (1994) Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab. 22(4):1679–1706.Crossref, Google Scholar
- [34] (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.Crossref, Google Scholar
- [35] (2012) Making gradient descent optimal for strongly convex stochastic optimization. Proc. 29th Internat. Conf. Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 1571–1578.Google Scholar
- [36] (2017) Asymptotic Theory of Weakly Dependent Random Processes, vol. 80 (Springer, Cham, Switzerland).Crossref, Google Scholar
- [37] (1988) Efficient estimations from a slowly convergent Robbins-Monro process. Technical report, Cornell University Operations Research and Industrial Engineering, Ithaca, NY.Google Scholar
- [38] (2019) Finite-time error bounds for linear stochastic approximation and TD learning. Proc. Thirty-Second Conf. Learn. Theory (PMLR, New York), 2803–2830.Google Scholar
- [39] (1988) Learning to predict by the methods of temporal differences. Machine Learn. 3(1):9–44.Crossref, Google Scholar
- [40] (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [41] (1992) Q-learning. Machine Learn. 8(3–4):279–292.Crossref, Google Scholar
- [42] (1985) Adaptive Signal Processing (Prentice-Hall, Englewood Cliffs, NJ).Google Scholar

