Convergence Rate Analysis of a Stochastic Trust-Region Method via Supermartingales

Published Online:https://doi.org/10.1287/ijoo.2019.0016

References

  • Alsmeyer G (2010) Renewal, Recurrance, and Regeneration. Accessed May 13, 2019, https://www.uni-muenster.de/Stochastik/alsmeyer/Skripten/rt_book.pdf.Google Scholar
  • Bandeira AS, Scheinberg K, Vicente LN (2014) Convergence of trust-region methods based on probabilistic models. SIAM J. Optim. 24(3):1238–1264.Google Scholar
  • Billups S, Larsson J, Graf P (2013) Derivative-free optimization of expensive functions with computational error using weighted regression. SIAM J. Optim. 55(2):27–53.Google Scholar
  • Byrd R, Chin GM, Nocedal J, Wu Y (2012) Sample size selection in optimization methods for machine learning. Math. Programming 134(1):127–155.Google Scholar
  • Cartis C, Scheinberg K (2018) Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Math. Programming 169(2):337–375.Google Scholar
  • Chang K, Li M, Wan H (2013) Stochastic trust-region response-surface method (strong)—A new response-surface framework for simulation optimization. INFORMS J. Comput. 25(2):230–243.LinkGoogle Scholar
  • Chen R, Menickelly M, Scheinberg K (2018) Stochastic optimization using a trust-region method and random models. Math. Programming 169(2):447–487.Google Scholar
  • Conn A, Scheinberg K, Vicente L (2009) Introduction to Derivative-Free Optimization (SIAM, Philadelphia).Google Scholar
  • Conn AR, Gould NIM, Toint PT (2000) Trust Region Methods, MPS/SIAM Series on Optimization (SIAM, Philadelphia).Google Scholar
  • Dauphin Y, Pascanu R, Gulcehre C, Cho K, Ganguli S, Bengio Y (2014) Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger K, eds. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Red Hook, NY), 2933–2941.Google Scholar
  • Defazio A, Bach F, Lacoste-Julien S (2014) SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Advances in Neural Information Processing, vol. 27 (Curran Associates, Red Hook, NY), 1646–1654.Google Scholar
  • Friedlander M, Schmidt M (2012) Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3):1380–1405.Google Scholar
  • Ghadimi S, Lan G (2013) Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.Google Scholar
  • Golub GH, Loan CFV (1989) Matrix Computations, 2nd ed. (Johns Hopkins University Press, Baltimore).Google Scholar
  • Gratton S, Royer CW, Vicente LN, Zhang Z (2017) Complexity and global rates of trust-region methods based on probabilistic models. IMA J. Numerical Anal. 38(3):1579–1597.Google Scholar
  • Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 26 (Curran Associates, Red Hook, NY), 315–323.Google Scholar
  • Larson J, Billups S (2015) Stochastic derivative-free optimization using a trust region framework. Comput. Optim. Appl. 64(3):619–645.Google Scholar
  • Lin C, Weng RC, Keerthi SS, Smola A (2007) Trust region Newton method for large-scale logistic regression. Proc. 24th Internat. Conf. Machine Learn. (ACM, New York), 561–568.Google Scholar
  • Nesterov Y (2004) Introductory Lectures on Convex Optimization (Kluwer Academic Publsihers, Boston).Google Scholar
  • Nguyen L, Liu J, Scheinberg K, Takáč M (2017a) SARAH: A novel method for machine learning problems using stochastic recursive gradient. JMLR 70:2613–2621.Google Scholar
  • Nguyen L, Liu J, Scheinberg K, Takáč M (2017b) Stochastic recursive gradient algorithm for nonconvex optimization. Working paper, Massachusetts Institute of Technology, Cambridge, MA.Google Scholar
  • Nocedal J, Wright SJ (2006) Numerical Optimization, Springer Series in Operations Research, 2nd ed. (Springer, New York).Google Scholar
  • Paquette C, Scheinberg K (2018) A stochastic line search method with expected complexity analysis. Working paper, University of Waterloo, Waterloo, ON, Canada.Google Scholar
  • Reddi SJ, Hefny A, Sra S, Póczos B, Smola AJ (2016) Stochastic variance reduction for nonconvex optimization. PMLR 48:314–323.Google Scholar
  • Shashaani S, Hashemi FS, Pasupathy R (2015) Astro-df: A class of adaptive sampling trust-region algorithms for derivative-free simulation optimization. SIAM J. Optim. 28(4):3145–3176.Google Scholar
  • Tripuraneni N, Stern M, Jin C, Regier J, Jordan MI (2017) Stochastic cubic regularization for fast nonconvex optimization. Advances in Neural Information Processing, vol. 32 (Curran Associates, Red Hook, NY), 2899–2908.Google Scholar
  • Tropp JA (2015) An introduction to matrix concentration inequalities. Foundations Trends Machine Learn. 8(1–2):1–230.Google Scholar
  • Xu P, Roosta-Khorasani F, Mahoney MW (2017) Newton-type methods for non-convex optimization under inexact Hessian information. Working paper, Stanford University, Stanford, CA.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.