Distributed Stochastic Optimization with Large Delays

Published Online:https://doi.org/10.1287/moor.2021.1200

References

  • [1] Agarwal A, Duchi JC (2011) Distributed delayed stochastic optimization. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems 24 (Curran Associates), 873–881.Google Scholar
  • [2] Avron H, Druinsky A, Gupta A (2015) Revisiting asynchronous linear solvers: Provable convergence rate through randomization. J. ACM 62(6):51.CrossrefGoogle Scholar
  • [3] Benaïm M (1999) Dynamics of stochastic approximation algorithms. Azéma J, Émery M, Ledoux M, Yor M, eds. Séminaire de Probabilités XXXIII, Lecture Notes in Mathematics, vol. 1709 (Springer, Berlin), 1–68.CrossrefGoogle Scholar
  • [4] Benaïm M, Hirsch MW (1996) Asymptotic pseudotrajectories and chain recurrent flows, with applications. J. Dynamic Differential Equations 8(1):141–176.CrossrefGoogle Scholar
  • [5] Benaïm M, Schreiber SJ (2000) Ergodic properties of weak asymptotic pseudotrajectories for semiflows. J. Dynamic Differential Equations 12(3):579–598.CrossrefGoogle Scholar
  • [6] Bertsekas DP (1995) Nonlinear Programming (Athena Scientific, Nashua, NH).Google Scholar
  • [7] Bertsekas DP, Tsitsiklis JN (1996) Neuro-Dynamic Programming (Athena Scientific, Nashua, NH).Google Scholar
  • [8] Bertsekas DP, Tsitsiklis JN (1997) Parallel and Distributed Computation: Numerical Methods (Athena Scientific, Nashua, NH).Google Scholar
  • [9] Borkar VS (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK, and Hindustan Book Agency, New Delhi, India).CrossrefGoogle Scholar
  • [10] Carter M (2001) Foundations of Mathematical Economics (MIT Press, Cambridge, MA).Google Scholar
  • [11] Chaturapruek S, Duchi JC, Ré C (2015) Asynchronous stochastic convex optimization: The noise is in the noise and SGD don’t care. Adv. Neural Inform. Processing Systems 1531–1539.Google Scholar
  • [12] de Souza PN, Silva JN (2012) Berkeley Problems in Mathematics. Problem Books in Mathematics (Springer, New York).Google Scholar
  • [13] Dean J, Corrado GS, Monga R, Chen K, Devin M, Le QV, Mao MZ, et al.. (2012) Large scale distributed deep networks. Proc. 25th Internat. Conf. Neural Information Processing Systems, vol. 1 (Curran Associates), 1223–1231.Google Scholar
  • [14] Dean J, Corrado G, Monga R, Chen K, Devin M, Mao M, Senior A, et al. (2012) Large scale distributed deep networks. Adv. Neural Inform. Processing Systems 1223–1231.Google Scholar
  • [15] Dey SS, Molinaro M, Wang Q (2017) Analysis of sparse cutting planes for sparse milps with applications to stochastic milps. Math. Oper. Res. 43(1):304–332.LinkGoogle Scholar
  • [16] Feng J, Xu H, Yan S (2013) Online robust PCA via stochastic optimization. Adv. Neural Inform. Processing Systems 26(1):404–412.Google Scholar
  • [17] Fercoq O, Richtárik P (2015) Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4):1997–2023.CrossrefGoogle Scholar
  • [18] Feyzmahdavian HR, Aytekin A, Johansson M (2016) An asynchronous mini-batch algorithm for regularized stochastic optimization. IEEE Trans. Automated Control 61(12):3740–3754.CrossrefGoogle Scholar
  • [19] Ge R, Huang F, Jin C, Yuan Y (2015) Escaping from saddle points? Online stochastic gradient for tensor decomposition. Proc. Conf. Learn. Theory, 40:797–842.Google Scholar
  • [20] Ghadimi S, Lan G (2013) Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.CrossrefGoogle Scholar
  • [21] Hall P, Heyde CC (1980) Martingale Limit Theory and Its Application. Probability and Mathematical Statistics (Academic Press, New York).Google Scholar
  • [22] Héliou A, Mertikopoulos P, Zhou Z (2020) Gradient-free online learning in continuous games with delayed rewards. Proc. 37th Internat. Conf. Machine Learn. 119:4172–4181.Google Scholar
  • [23] Hong M (2017) A distributed, asynchronous, and incremental algorithm for nonconvex optimization: An ADMM approach. IEEE Trans. Control Network Systems 5(3):935–945.Google Scholar
  • [24] Hsieh Y-G, Iutzeler F, Malick J, Mertikopoulos P (2020) Multi-agent online optimization with delays: Asynchronicity, adaptivity, and optimism. Preprint, submitted December 21, 2020, https://arxiv.org/abs/2012.11579.Google Scholar
  • [25] Jin C, Ge R, Netrapalli P, Kakade SM, Jordan MI (2017) How to escape saddle points efficiently. Proc. 34th Internat. Conf. Machine Learn. vol. 70, 1724–1732.Google Scholar
  • [26] Joulani P, György A, Szepesvári C (2016) Delay-tolerant online convex optimization: Unified analysis and adaptive-gradient algorithms. Proc. 30th Conf. Artificial Intelligence 1744–1750.Google Scholar
  • [27] Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Advances in Neural Information Processing Systems 25 (Curran Associates), 1097–1105.Google Scholar
  • [28] Kushner H, Yin GG (2013) Stochastic Approximation and Recursive Algorithms and Applications. Stochastic Modelling and Applied Probability (Springer, New York).Google Scholar
  • [29] Lee J, Hasenbein J, Morton D (2011) Stochastic optimization models for rapid detection of viruses in cellphone networks. Oper. Res. Lett. 43(1):59–64.Google Scholar
  • [30] Lee JD, Simchowitz M, Jordan MI, Recht B (2016) Gradient descent converges to minimizers. 29th Annual Conference on Learning Theory (PMLR) 49:1246–1257.Google Scholar
  • [31] Lian X, Huang Y, Li Y, Liu J (2015) Asynchronous parallel stochastic gradient for nonconvex optimization. Adv. Neural Inform. Processing Systems 28:2737–2745.Google Scholar
  • [32] Lian X, Zhang H, Hsieh C-J, Huang Y, Liu J (2016) A comprehensive linear speedup analysis for asynchronous stochastic parallel optimization from zeroth-order to first-order. Adv. Neural Inform. Processing Systems 29:3054–3062.Google Scholar
  • [33] Liu J, Wright SJ (2015) Asynchronous stochastic coordinate descent: Parallelism and convergence properties. SIAM J. Optim. 25(1):351–376.CrossrefGoogle Scholar
  • [34] Liu J, Wright SJ, Sridhar S (2014) An asynchronous parallel randomized kaczmarz algorithm. Preprint, submitted January 20, 2014, https://arxiv.org/abs/1401.4780.Google Scholar
  • [35] Mania H, Pan X, Papailiopoulos D, Recht B, Ramchandran K, Jordan MI (2017) Perturbed iterate analysis for asynchronous stochastic optimization. SIAM J. Optim. 27(4):2202–2229.CrossrefGoogle Scholar
  • [36] Marecek J, Richtarik P, Takac M (2015) Distributed block coordinate descent for minimizing partially separable functions. Numerical Analysis and Optimization (Springer, Berlin), 261–288.CrossrefGoogle Scholar
  • [37] Nesterov Y (2004) Introductory Lectures on Convex Optimization: A Basic Course (Kluwer Academic Publishers, Norwell, MA).CrossrefGoogle Scholar
  • [38] Paine T, Jin H, Yang J, Lin Z, Huang T (2013) GPU asynchronous stochastic gradient descent to speed up neural network training. Preprint, submitted December 21, 2013, https://arxiv.org/abs/1312.6186.Google Scholar
  • [39] Quanrud K, Khashabi D (2015) Online learning with adversarial delays. Proc. 29th Internat. Conf. Neural Information Processing Systems 1270–1278.Google Scholar
  • [40] Recht B, Re C, Wright S, Niu F (2011) Hogwild: A lock-free approach to parallelizing stochastic gradient descent. Adv. Neural Inform. Processing Systems 24:693–701.Google Scholar
  • [41] Rosenbrock HH (1960) An automatic method for finding the greatest or least value of a function. Comput. J. 3(3):175–184.CrossrefGoogle Scholar
  • [42] Ruszczyński A, Shapiro A (2003) Stochastic programming models. Handbook Oper. Res. Management Sci. 10:1–64.Google Scholar
  • [43] Shapiro A, Philpott A (2007) A tutorial on stochastic programming.Google Scholar
  • [44] Tappenden R, Takac M, Richtarik P (2017) On the complexity of parallel coordinate descent. Optim. Methods Software 33(2):1–24.Google Scholar
  • [45] Tran K, Hosseini S, Xiao L, Finley T, Bilenko M (2015) Scaling up stochastic dual coordinate ascent. Proc. 21th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 1185–1194.Google Scholar
  • [46] Tsitsiklis JN (1984) Problems in decentralized decision making and computation. PhD thesis, MIT, Cambridge, MA.Google Scholar
  • [47] Tsitsiklis JN, Bertsekas DP, Athans M (1986) Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Transactions on Automatic Control 31(9):803–812.Google Scholar
  • [48] Uryasev S, Pardalos PM (2010) Stochastic Optimization: Algorithms and Applications (Springer, New York).Google Scholar
  • [49] Wang Y-X, Sadhanala V, Dai W, Neiswanger W, Sra S, Xing E (2016) Parallel and distributed block-coordinate frank-wolfe algorithms. Proc. Internat. Conf. Machine Learn. 48:1548–1557.Google Scholar
  • [50] Wright SJ (2015) Coordinate descent algorithms. Math. Programming 151(1):3–34.CrossrefGoogle Scholar
  • [51] Zhang R, Kwok J (2014) Asynchronous distributed admm for consensus optimization. Proc. Internat. Conf. Machine Learn. 32(2):1701–1709.Google Scholar
  • [52] Zhang S, Choromanska AE, LeCun Y (2015) Deep learning with elastic averaging sgd. Adv. Neural Inform. Processing Systems 28:685–693.Google Scholar
  • [53] Zhang S, Zhang C, You Z, Zheng R, Xu B (2013) Asynchronous stochastic gradient descent for DNN training. Proc. IEEE Internat. Conf. Acoustics Speech Signal Processing 6660–6663.Google Scholar
  • [54] Zhou Z, Mertikopoulos P, Bambos N, Boyd S, Glynn PW (2017) Stochastic mirror descent in variationally coherent optimization problems. Adv. Neural Inform. Processing Systems 7040–7049.Google Scholar
  • [55] Zhou Z, Mertikopoulos P, Bambos N, Boyd SP, Glynn PW (2020) On the convergence of mirror descent beyond stochastic convex programming. SIAM J. Optim. 30(1):687–716.CrossrefGoogle Scholar
  • [56] Zhou Z, Mertikopoulos P, Bambos N, Glynn P, Ye Y, Li L-J, Li F-F (2018) Distributed asynchronous optimization with unbounded delays: How slow can you go? Proc. Internat. Conf. Machine Learn., 5970–5979.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.