Distributed Stochastic Optimization with Large Delays
Published Online:1 Dec 2021https://doi.org/10.1287/moor.2021.1200
References
- [1] (2011) Distributed delayed stochastic optimization. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems 24 (Curran Associates), 873–881.Google Scholar
- [2] (2015) Revisiting asynchronous linear solvers: Provable convergence rate through randomization. J. ACM 62(6):51.Crossref, Google Scholar
- [3] (1999) Dynamics of stochastic approximation algorithms. Azéma J, Émery M, Ledoux M, Yor M, eds. Séminaire de Probabilités XXXIII, Lecture Notes in Mathematics, vol. 1709 (Springer, Berlin), 1–68.Crossref, Google Scholar
- [4] (1996) Asymptotic pseudotrajectories and chain recurrent flows, with applications. J. Dynamic Differential Equations 8(1):141–176.Crossref, Google Scholar
- [5] (2000) Ergodic properties of weak asymptotic pseudotrajectories for semiflows. J. Dynamic Differential Equations 12(3):579–598.Crossref, Google Scholar
- [6] (1995) Nonlinear Programming (Athena Scientific, Nashua, NH).Google Scholar
- [7] (1996) Neuro-Dynamic Programming (Athena Scientific, Nashua, NH).Google Scholar
- [8] (1997) Parallel and Distributed Computation: Numerical Methods (Athena Scientific, Nashua, NH).Google Scholar
- [9] (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK, and Hindustan Book Agency, New Delhi, India).Crossref, Google Scholar
- [10] (2001) Foundations of Mathematical Economics (MIT Press, Cambridge, MA).Google Scholar
- [11] (2015) Asynchronous stochastic convex optimization: The noise is in the noise and SGD don’t care. Adv. Neural Inform. Processing Systems 1531–1539.Google Scholar
- [12] (2012) Berkeley Problems in Mathematics. Problem Books in Mathematics (Springer, New York).Google Scholar
- [13] . (2012) Large scale distributed deep networks. Proc. 25th Internat. Conf. Neural Information Processing Systems, vol. 1 (Curran Associates), 1223–1231.Google Scholar
- [14] (2012) Large scale distributed deep networks. Adv. Neural Inform. Processing Systems 1223–1231.Google Scholar
- [15] (2017) Analysis of sparse cutting planes for sparse milps with applications to stochastic milps. Math. Oper. Res. 43(1):304–332.Link, Google Scholar
- [16] (2013) Online robust PCA via stochastic optimization. Adv. Neural Inform. Processing Systems 26(1):404–412.Google Scholar
- [17] (2015) Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4):1997–2023.Crossref, Google Scholar
- [18] (2016) An asynchronous mini-batch algorithm for regularized stochastic optimization. IEEE Trans. Automated Control 61(12):3740–3754.Crossref, Google Scholar
- [19] (2015) Escaping from saddle points? Online stochastic gradient for tensor decomposition. Proc. Conf. Learn. Theory, 40:797–842.Google Scholar
- [20] (2013) Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.Crossref, Google Scholar
- [21] (1980) Martingale Limit Theory and Its Application. Probability and Mathematical Statistics (Academic Press, New York).Google Scholar
- [22] (2020) Gradient-free online learning in continuous games with delayed rewards. Proc. 37th Internat. Conf. Machine Learn. 119:4172–4181.Google Scholar
- [23] (2017) A distributed, asynchronous, and incremental algorithm for nonconvex optimization: An ADMM approach. IEEE Trans. Control Network Systems 5(3):935–945.Google Scholar
- [24] (2020) Multi-agent online optimization with delays: Asynchronicity, adaptivity, and optimism. Preprint, submitted December 21, 2020, https://arxiv.org/abs/2012.11579.Google Scholar
- [25] (2017) How to escape saddle points efficiently. Proc. 34th Internat. Conf. Machine Learn. vol. 70, 1724–1732.Google Scholar
- [26] (2016) Delay-tolerant online convex optimization: Unified analysis and adaptive-gradient algorithms. Proc. 30th Conf. Artificial Intelligence 1744–1750.Google Scholar
- [27] (2012) Imagenet classification with deep convolutional neural networks. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Advances in Neural Information Processing Systems 25 (Curran Associates), 1097–1105.Google Scholar
- [28] (2013) Stochastic Approximation and Recursive Algorithms and Applications. Stochastic Modelling and Applied Probability (Springer, New York).Google Scholar
- [29] (2011) Stochastic optimization models for rapid detection of viruses in cellphone networks. Oper. Res. Lett. 43(1):59–64.Google Scholar
- [30] (2016) Gradient descent converges to minimizers. 29th Annual Conference on Learning Theory (PMLR) 49:1246–1257.Google Scholar
- [31] (2015) Asynchronous parallel stochastic gradient for nonconvex optimization. Adv. Neural Inform. Processing Systems 28:2737–2745.Google Scholar
- [32] (2016) A comprehensive linear speedup analysis for asynchronous stochastic parallel optimization from zeroth-order to first-order. Adv. Neural Inform. Processing Systems 29:3054–3062.Google Scholar
- [33] (2015) Asynchronous stochastic coordinate descent: Parallelism and convergence properties. SIAM J. Optim. 25(1):351–376.Crossref, Google Scholar
- [34] (2014) An asynchronous parallel randomized kaczmarz algorithm. Preprint, submitted January 20, 2014, https://arxiv.org/abs/1401.4780.Google Scholar
- [35] (2017) Perturbed iterate analysis for asynchronous stochastic optimization. SIAM J. Optim. 27(4):2202–2229.Crossref, Google Scholar
- [36] (2015) Distributed block coordinate descent for minimizing partially separable functions. Numerical Analysis and Optimization (Springer, Berlin), 261–288.Crossref, Google Scholar
- [37] (2004) Introductory Lectures on Convex Optimization: A Basic Course (Kluwer Academic Publishers, Norwell, MA).Crossref, Google Scholar
- [38] (2013) GPU asynchronous stochastic gradient descent to speed up neural network training. Preprint, submitted December 21, 2013, https://arxiv.org/abs/1312.6186.Google Scholar
- [39] (2015) Online learning with adversarial delays. Proc. 29th Internat. Conf. Neural Information Processing Systems 1270–1278.Google Scholar
- [40] (2011) Hogwild: A lock-free approach to parallelizing stochastic gradient descent. Adv. Neural Inform. Processing Systems 24:693–701.Google Scholar
- [41] (1960) An automatic method for finding the greatest or least value of a function. Comput. J. 3(3):175–184.Crossref, Google Scholar
- [42] (2003) Stochastic programming models. Handbook Oper. Res. Management Sci. 10:1–64.Google Scholar
- [43] (2007) A tutorial on stochastic programming.Google Scholar
- [44] (2017) On the complexity of parallel coordinate descent. Optim. Methods Software 33(2):1–24.Google Scholar
- [45] (2015) Scaling up stochastic dual coordinate ascent. Proc. 21th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 1185–1194.Google Scholar
- [46] (1984) Problems in decentralized decision making and computation. PhD thesis, MIT, Cambridge, MA.Google Scholar
- [47] (1986) Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Transactions on Automatic Control 31(9):803–812.Google Scholar
- [48] (2010) Stochastic Optimization: Algorithms and Applications (Springer, New York).Google Scholar
- [49] (2016) Parallel and distributed block-coordinate frank-wolfe algorithms. Proc. Internat. Conf. Machine Learn. 48:1548–1557.Google Scholar
- [50] (2015) Coordinate descent algorithms. Math. Programming 151(1):3–34.Crossref, Google Scholar
- [51] (2014) Asynchronous distributed admm for consensus optimization. Proc. Internat. Conf. Machine Learn. 32(2):1701–1709.Google Scholar
- [52] (2015) Deep learning with elastic averaging sgd. Adv. Neural Inform. Processing Systems 28:685–693.Google Scholar
- [53] (2013) Asynchronous stochastic gradient descent for DNN training. Proc. IEEE Internat. Conf. Acoustics Speech Signal Processing 6660–6663.Google Scholar
- [54] (2017) Stochastic mirror descent in variationally coherent optimization problems. Adv. Neural Inform. Processing Systems 7040–7049.Google Scholar
- [55] (2020) On the convergence of mirror descent beyond stochastic convex programming. SIAM J. Optim. 30(1):687–716.Crossref, Google Scholar
- [56] (2018) Distributed asynchronous optimization with unbounded delays: How slow can you go? Proc. Internat. Conf. Machine Learn., 5970–5979.Google Scholar

