Tackling Arbitrarily Heterogeneous Data in Asynchronous Stochastic Gradient Descent Without Worker Scheduling
Published Online:16 Jun 2026https://doi.org/10.1287/ijoc.2025.1443
References
- (2011) Distributed delayed stochastic optimization. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 873–881.Google Scholar
- (2020) A tight convergence analysis for stochastic gradient descent with delayed updates. Kontorovich A, Neu G, eds. Proc. 31st Internat. Conf. Algorithmic Learn. Theory, Proceedings of Machine Learning Research, vol. 117 (PMLR, New York), 111–132.Google Scholar
- (2023) Lower bounds for non-convex stochastic optimization. Math. Programming 199(1):165–214.Crossref, Google Scholar
- (2020) Advances in asynchronous parallel and distributed optimization. Proc. IEEE 108(11):2013–2031.Crossref, Google Scholar
- (2021) Federated learning under arbitrary communication patterns. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 139 (PMLR, New York), 425–435.Google Scholar
- (2007) A convergent incremental gradient method with a constant step size. SIAM J. Optim. 18(1):29–51.Crossref, Google Scholar
- (2020) Asynchronous online federated learning for edge devices with non-IID data. Proc. 2020 IEEE Internat. Conf. Big Data (IEEE, Piscataway, NJ), 15–24.Google Scholar
- (2011) Better mini-batch algorithms via accelerated gradient methods. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 1647–1655.Google Scholar
- (2015) Taming the wild: A unified analysis of Hogwild-style algorithms. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 28 (Curran Associates, Red Hook, NY), 2674–2682.Google Scholar
- (2014) SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 27 (Curran Associates, Red Hook, NY), 1646–1654.Google Scholar
- (2012) Optimal distributed online prediction using mini-batches. J. Machine Learn. Res. 13(6):165–202.Google Scholar
- (2021) Slow and stale gradients can win the race. IEEE J. Selected Areas Inform. Theory 2(3):1012–1024.Crossref, Google Scholar
- (2016) An asynchronous mini-batch algorithm for regularized stochastic optimization. IEEE Trans. Automatic Control 61(12):3740–3754.Crossref, Google Scholar
- (2023) A general theory for federated optimization with asynchronous and heterogeneous clients updates. J. Machine Learn. Res. 24(110):1–43.Google Scholar
- (2021) Provable distributed stochastic gradient descent with delayed updates. Domeniconi C, Davidson I, eds. Proc. 2021 SIAM Internat. Conf. Data Mining (SIAM, Philadelphia), 441–449.Google Scholar
- (2022) Asynchronous distributed optimization with stochastic delays. Camps-Valls G, Ruiz FJR, Valera I, eds. Proc. 25th Internat. Conf. Artificial Intelligence Statist., vol. 151 (PMLR, New York), 9247–9279.Google Scholar
- (2021) Fast federated learning in the presence of arbitrary device unavailability. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates, Red Hook, NY), 12052–12064.Google Scholar
- (2017) On the convergence rate of incremental aggregated gradient algorithms. SIAM J. Optim. 27(2):1035–1048.Crossref, Google Scholar
- (2024) AsGrad: A sharp unified analysis of asynchronous-SGD algorithms. Dasgupta S, Mandt S, Li Y, eds. Proc. 27th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 238 (PMLR, New York), 649–657.Google Scholar
- (2020) Scaffold: Stochastic controlled averaging for federated learning. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 5132–5143.Google Scholar
- (2023) Better theory for SGD in the nonconvex world. Trans. Machine Learn. Res. https://openreview.net/forum?id=AU4qHN2VkS.Google Scholar
- (2022) Sharper convergence guarantees for asynchronous SGD for distributed and federated learning. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates, Red Hook, NY), 17202–17215.Google Scholar
- (2020) A unified theory of decentralized SGD with changing topology and local updates. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., vol. 119 (PMLR, New York), 5381–5393.Google Scholar
- (2009) Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, Toronto.Google Scholar
- (2018) Improved asynchronous parallel optimization analysis for stochastic incremental methods. J. Machine Learn. Res. 19(81):1–68.Google Scholar
- (2024) FAVANO: Federated averaging with asynchronous nodes. Proc. 2024 IEEE Internat. Conf. Acoustics Speech Signal Processing (IEEE, Piscataway, NJ), 5665–5669.Google Scholar
- (2024) Queuing dynamics of asynchronous federated learning. Dasgupta S, Mandt S, Li Y, eds. Proc. 27th Internat. Conf. Artificial Intelligence Statist., vol. 238 (PMLR, New York), 1711–1719.Google Scholar
- (2022) Federated learning on non-IID data silos: An experimental study. Proc. 2022 IEEE 38th Internat. Conf. Data Engrg. (IEEE, Piscataway, NJ), 965–978.Google Scholar
- (2015) Asynchronous parallel stochastic gradient for nonconvex optimization. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 28 (Curran Associates, Red Hook, NY), 2737–2745.Google Scholar
- (2018) Asynchronous decentralized parallel stochastic gradient descent. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 80 (PMLR, New York), 3043–3052.Google Scholar
- (2017) Perturbed iterate analysis for asynchronous stochastic optimization. SIAM J. Optim. 27(4):2202–2229.Crossref, Google Scholar
- (2022) Asynchronous SGD beats minibatch SGD under arbitrary delays. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates, Red Hook, NY), 420–433.Google Scholar
- (2001) Distributed asynchronous incremental subgradient methods. Stud. Comput. Math. 8(C):381–407.Google Scholar
- (2022) Federated learning with buffered asynchronous aggregation. Camps-Valls G, Ruiz FJR, Valera I, eds. Proc. 25th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 151 (PMLR, New York), 3581–3607.Google Scholar
- (2011) Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 693–701.Google Scholar
- (2012) A stochastic gradient method with an exponential convergence rate for finite training sets. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 25 (Curran Associates, Red Hook, NY), 2672–2680.Google Scholar
- (2017) Minimizing finite sums with the stochastic average gradient. Math. Programming 162(1–2):83–112.Crossref, Google Scholar
- (2020) The error-feedback framework: SGD with delayed gradients. J. Machine Learn. Res. 21(237):1–36.Google Scholar
- (2024) MimiC: Combating client dropouts in federated learning by mimicking central updates. IEEE Trans. Mobile Comput. 23(7):7572–7584.Crossref, Google Scholar
- (2022) Unbounded gradients in federated learning with buffered asynchronous aggregation. Proc. 2022 58th Annual Allerton Conf. Comm. Control Comput. (IEEE, Piscataway, NJ), 1–8.Google Scholar
- (2023) Optimal time complexities of parallel stochastic optimization methods under a fixed computation model. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. Adv. Neural Inform. Processing Systems, vol. 36 (Curran Associates, Red Hook, NY), 16515–16577.Google Scholar
- (2018) Global convergence rate of proximal incremental aggregated gradient methods. SIAM J. Optim. 28(2):1282–1300.Crossref, Google Scholar
- (2020) A survey on distributed machine learning. ACM Comput. Surveys 53(2):30.Google Scholar
- (2023a) Linear speedup of incremental aggregated gradient methods on streaming data. Proc. 2023 62nd IEEE Conf. Decision Control (IEEE, Piscataway, NJ), 4314–4319.Google Scholar
- (2025) Achieving linear speedup in asynchronous federated learning with heterogeneous clients. IEEE Trans. Mobile Comput. 24(1):435–448.Crossref, Google Scholar
- (2026) Tackling arbitrarily heterogeneous data in asynchronous stochastic gradient descent without worker scheduling. https://doi.org/10.1287/ijoc.2025.1443.cd, https://github.com/INFORMSJoC/2025.1443.Google Scholar
- (2023b) Tackling the data heterogeneity in asynchronous federated learning with cached update calibration. Proc. 12th Internat. Conf. Learn. Representations (OpenReview.net).Google Scholar
- (2020) Tackling the objective inconsistency problem in heterogeneous federated optimization. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates, Red Hook, NY), 7611–7623.Google Scholar
- (2017) Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. Preprint, submitted August 25, https://arxiv.org/abs/1708.07747.Google Scholar
- (2020) Asynchronous federated optimization. 12th Annual Workshop Optim. Machine Learn. (OPT 2020).Google Scholar
- (2023) Asynchronous federated learning on heterogeneous devices: A survey. Comput. Sci. Rev. 50(C):100595.Crossref, Google Scholar
- (2024) Federated optimization under intermittent client availability. INFORMS J. Comput. 36(1):185–202.Link, Google Scholar
- (2019) Federated machine learning: Concept and applications. ACM Trans. Intelligent Systems Tech. 10(2):12.Crossref, Google Scholar
- (2019) Bayesian nonparametric federated learning of neural networks. Chaudhuri K, Salakhutdinov R, eds. Proc. 36th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 97 (PMLR, New York), 7252–7261.Google Scholar
- (2023) No one idles: Efficient heterogeneous federated learning with parallel edge and server computation. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Proc. 40th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 202 (PMLR, New York), 41399–41413.Google Scholar

