Stochastic Gradient Descent: Recent Trends
Stochastic gradient descent (SGD), also known as stochastic approximation, refers to certain simple iterative structures used for solving stochastic optimization and root-finding problems. The identifying feature of SGD is that, much like gradient descent for deterministic optimization, each successive iterate in the recursion is determined by adding an appropriately scaled gradient estimate to the prior iterate. Owing to factors such as its low per-iteration cost and its ability to operate on sampled data, SGD has become the leading method for solving optimization problems arising within large-scale machine learning and “big data” contexts such as classification and regression. In this tutorial, we cover the basics of SGD with an emphasis on modern developments. The tutorial starts with stochastic optimization examples and problem variations where SGD is applicable, and then it details important flavors of SGD that are currently in use. The oral presentation of this tutorial will include numerical examples.
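The recursion described above can be sketched in a few lines. The following is a minimal illustration (not from the tutorial itself): each iterate is formed by adding a negatively scaled unbiased gradient estimate to the prior iterate, here applied to the toy problem of minimizing E[(x − Z)²]/2 for a random variable Z, whose minimizer is E[Z]. The function names, the step-size rule, and the toy problem are illustrative assumptions.

```python
import random

def sgd(grad_estimate, x0, step, n_iters, seed=0):
    """Plain SGD recursion: x_{k+1} = x_k - a_k * g(x_k, xi_k),
    where g is an unbiased estimate of the true gradient."""
    rng = random.Random(seed)
    x = x0
    for k in range(n_iters):
        # Each iterate adds an appropriately scaled (negative) gradient estimate.
        x = x - step(k) * grad_estimate(x, rng)
    return x

# Toy stochastic problem (an assumed example): minimize E[(x - Z)^2]/2
# with Z ~ Uniform(3, 5); the minimizer is E[Z] = 4.
# An unbiased stochastic gradient is g(x, Z) = x - Z.
x_star = sgd(
    grad_estimate=lambda x, rng: x - rng.uniform(3.0, 5.0),
    x0=0.0,
    step=lambda k: 1.0 / (k + 1),  # diminishing step sizes a_k = 1/(k+1)
    n_iters=20000,
)
```

With the step-size choice a_k = 1/(k+1), the iterate reduces to the running average of the samples, so x_star settles near the true minimizer 4.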
Video of this TutORial from the 2018 INFORMS Annual Meeting in Phoenix, Arizona, November 6, 2018, is available at https://youtu.be/wKTH81w9hqE.