Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation

Published Online:https://doi.org/10.1287/moor.2022.0179

References

  • [1] Aguech R, Moulines E, Priouret P (2000) On a perturbation approach for the analysis of stochastic tracking algorithms. SIAM J. Control Optim. 39(3):872–899.CrossrefGoogle Scholar
  • [2] Anderson GD, Qiu SL (1997) A monotonicity property of the gamma function. Proc. Amer. Math. Soc. 125(11):3355–3362.CrossrefGoogle Scholar
  • [3] Bach F, Moulines E (2013) Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n). Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 26 (Curran Associates, Inc., Red Hook, NY), 30063–30074.Google Scholar
  • [4] Benveniste A, Métivier M, Priouret P (2012) Adaptive Algorithms and Stochastic Approximations, vol. 22 (Springer Science & Business Media, New York).Google Scholar
  • [5] Bertsekas D (2019) Reinforcement Learning and Optimal Control (Athena Scientific, Nashua, NH).Google Scholar
  • [6] Bertsekas D, Tsitsiklis J (2015) Parallel and Distributed Computation: Numerical Methods (Athena Scientific, Nashua, NH).Google Scholar
  • [7] Bhandari J, Russo D, Singal R (2021) A finite time analysis of temporal difference learning with linear function approximation. Oper. Res. 69(3):950–973.LinkGoogle Scholar
  • [8] Borkar VS (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [9] Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60(2):223–311.CrossrefGoogle Scholar
  • [10] Chen S, Devraj A, Busic A, Meyn S (2020) Explicit mean-square error bounds for Monte-Carlo and linear stochastic approximation. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 4173–4183.Google Scholar
  • [11] Dalal G, Szorenyi B, Thoppe G (2020) A tale of two-timescale reinforcement learning with the tightest finite-time bound. Proc. Conf. AAAI Artificial Intelligence, vol. 34 (AAAI Press, Palo Alto, CA), 3701–3708.CrossrefGoogle Scholar
  • [12] Douc R, Moulines E, Priouret P, Soulier P (2018) Markov Chains, Springer Series in Operations Research and Financial Engineering (Springer, Cham, Switzerland).CrossrefGoogle Scholar
  • [13] Durmus A, Moulines E, Naumov A, Samsonov S, Sheshukova M (2023) Rosenthal-type inequalities for linear statistics of Markov chains. Preprint, submitted March 10, https://arxiv.org/abs/2303.05838.Google Scholar
  • [14] Durmus A, Moulines E, Naumov A, Samsonov S, Wai HT (2021) On the stability of random matrix product with Markovian noise: Application to linear stochastic approximation and TD learning. Belkin M, Kpotufe S, eds. Proc. Thirty-Fourth Conf. Learn. Theory, vol. 134 (PMLR, New York), 1711–1752.Google Scholar
  • [15] Durmus A, Moulines E, Naumov A, Samsonov S, Scaman K, Wai HT (2021) Tight high probability bounds for linear stochastic approximation with fixed stepsize. Ranzato M, Beygelzimer A, Nguyen K, Liang PS, Vaughan JW, Dauphin Y, eds. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates, Inc., Red Hook, NY), 30063–30074.Google Scholar
  • [16] Eweda E, Macchi O (1983) Quadratic mean and almost-sure convergence of unbounded stochastic approximation algorithms with correlated observations. Ann. l’Institut Henri Poincare (B) Probab. Statist. 19(3):235–255.Google Scholar
  • [17] Guo L, Ljung L (1995) Exponential stability of general tracking algorithms. IEEE Trans. Automatic Control 40(8):1376–1387.CrossrefGoogle Scholar
  • [18] Hiai F, Petz D (2014) Introduction to Matrix Analysis and Applications (Springer International Publishing, Cham, Switzerland).CrossrefGoogle Scholar
  • [19] Huang D, Niles-Weed J, Tropp JA, Ward R (2021) Matrix concentration for products. Foundations Comput. Math. 22(6):1–33.Google Scholar
  • [20] Jain P, Nagaraj D, Netrapalli P (2019) Making the last iterate of sgd information theoretically optimal. Beygelzimer A, Hsu D, eds. Proc. Thirty-Second Conf. Learn. Theory, vol. 99 (PMLR, New York), 1752–1755.Google Scholar
  • [21] Jain P, Kakade S, Kidambi R, Netrapalli P, Sidford A (2018) Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification. J. Machine Learn. Res. 18(2018):1–42.Google Scholar
  • [22] Jain P, Kakade SM, Kidambi R, Netrapalli P, Sidford A (2018) Accelerating stochastic gradient descent for least squares regression. Proc. 31st Conf. Learn. Theory (PMLR, New York), 545–604.Google Scholar
  • [23] Joulin A, Ollivier Y (2010) Curvature, concentration and error estimates for Markov chain Monte Carlo. Ann. Probab. 38(6):2418–2442.CrossrefGoogle Scholar
  • [24] Kushner H, Yin GG (2003) Stochastic Approximation and Recursive Algorithms and Applications, vol. 35 (Springer Science & Business Media, New York).Google Scholar
  • [25] Lakshminarayanan C, Szepesvari C (2018) Linear stochastic approximation: How far does constant step-size and iterate averaging go? Storkey A, Perez-Cruz F, eds. Proc. Twenty-First Internat. Conf. Artificial Intelligence Statist., vol. 84 (PMLR, New York), 1347–1355.Google Scholar
  • [26] Lauand CK, Meyn S (2022) Bias in stochastic approximation cannot be eliminated with averaging. 2022 58th Annual Allerton Conf. Comm. Control Comput. (Allerton) (IEEE Press, Piscataway, NJ), 1–4.Google Scholar
  • [27] Ljung L (2002) Recursive identification algorithms. Circuits Systems Signal Processing 21(1):57–68.CrossrefGoogle Scholar
  • [28] Mou W, Pananjady A, Wainwright MJ, Bartlett PL (2021) Optimal and instance-dependent guarantees for Markovian linear stochastic approximation. Preprint, submitted December 23, https://arxiv.org/abs/2112.12770.Google Scholar
  • [29] Mou W, Li CJ, Wainwright MJ, Bartlett PL, Jordan MI (2020) On linear stochastic approximation: Fine-grained Polyak–Ruppert and non-asymptotic concentration. Proc. Thirty Third Conf. Learn. Theory, vol. 125 (PMLR, New York), 2947–2997.Google Scholar
  • [30] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.CrossrefGoogle Scholar
  • [31] Osekowski A (2012) Sharp Martingale and Semimartingale Inequalities, 1st ed., Monografie Matematyczne 72 (Birkhäuser, Basel, Switzerland).CrossrefGoogle Scholar
  • [32] Paulin D (2015) Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electronic J. Probab. 20(2015):1–32.Google Scholar
  • [33] Pinelis I (1994) Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab. 22(4):1679–1706.CrossrefGoogle Scholar
  • [34] Polyak BT, Juditsky AB (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.CrossrefGoogle Scholar
  • [35] Rakhlin A, Shamir O, Sridharan K (2012) Making gradient descent optimal for strongly convex stochastic optimization. Proc. 29th Internat. Conf. Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 1571–1578.Google Scholar
  • [36] Rio E (2017) Asymptotic Theory of Weakly Dependent Random Processes, vol. 80 (Springer, Cham, Switzerland).CrossrefGoogle Scholar
  • [37] Ruppert D (1988) Efficient estimations from a slowly convergent Robbins-Monro process. Technical report, Cornell University Operations Research and Industrial Engineering, Ithaca, NY.Google Scholar
  • [38] Srikant R, Ying L (2019) Finite-time error bounds for linear stochastic approximation and TD learning. Proc. Thirty-Second Conf. Learn. Theory (PMLR, New York), 2803–2830.Google Scholar
  • [39] Sutton RS (1988) Learning to predict by the methods of temporal differences. Machine Learn. 3(1):9–44.CrossrefGoogle Scholar
  • [40] Vershynin R (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [41] Watkins CJ, Dayan P (1992) Q-learning. Machine Learn. 8(3–4):279–292.CrossrefGoogle Scholar
  • [42] Widrow B, Stearns SD (1985) Adaptive Signal Processing (Prentice-Hall, Englewood Cliffs, NJ).Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.