Technical Note—On the Convergence Rate of Stochastic Approximation for Gradient-Based Stochastic Optimization

Published Online:https://doi.org/10.1287/opre.2023.0055

References

  • Berahas AS, Cao L, Choromanski K, Scheinberg K (2022) A theoretical and empirical comparison of gradient approximations in derivative-free optimization. Found. Comput. Math. 22(2):507–560.CrossrefGoogle Scholar
  • Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60(2):223–311.CrossrefGoogle Scholar
  • Chen J, Luss R (2019) Stochastic gradient descent with biased but consistent gradient estimators. Preprint, submitted July 31, 2018, https://arxiv.org/abs/1807.11880.Google Scholar
  • Demidovich Y, Malinovsky G, Sokolov I, Richtárik R (2023) A guide through the zoo of biased SGD. Preprint, submitted May 25, https://arxiv.org/abs/2305.16296.Google Scholar
  • Driggs D, Liang J, Schönlieb C (2022) On biased stochastic gradient estimation. J. Machine Learn. Res. 23(24):1057–1099.Google Scholar
  • Duchi JC, Jordan MI, Wainwright MJ, Wibisono A (2015) Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Trans. Inform. Theory 61(5):2788–2806.CrossrefGoogle Scholar
  • Ghadimi S, Lan G (2012) Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework. SIAM J. Optim. 22(4):1469–1492.CrossrefGoogle Scholar
  • Hu B, Seiler P, Lessard L (2021) Analysis of biased stochastic gradient descent using sequential semidefinite programs. Math. Programming 187(1–2):383–408.CrossrefGoogle Scholar
  • Hu J, Song M, Fu M (2024) Quantile optimization via multiple timescale local search for black-box functions. Oper. Res. Forthcoming.Google Scholar
  • Karimi B, Miasojedow B, Moulines E, Wai HT (2019) Non-asymptotic analysis of biased stochastic approximation scheme. Proc. Machine Learn. Res. 99:1–31.Google Scholar
  • Kiefer J, Wolfowitz J (1952) Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3):462–466.CrossrefGoogle Scholar
  • Kushner HJ, Yin GG (1997) Stochastic Approximation and Recursive Algorithms and Applications (Springer, New York).Google Scholar
  • Rugh WJ (1996) Linear System Theory, 2nd ed. (Prentice Hall, Upper Saddle River, NJ).Google Scholar
  • Scheinberg K (2022) Finite difference gradient approximation: To randomize or not? INFORMS J. Comput. 34(5):2384–2388.LinkGoogle Scholar
  • Spall JC (1992) Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Automat. Control 37(3):332–341.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.