Tackling Arbitrarily Heterogeneous Data in Asynchronous Stochastic Gradient Descent Without Worker Scheduling

Xiaolu Wang
Corresponding Author
Xiaolu Wang
[email protected]
https://orcid.org/0000-0002-5267-3464
Ministry of Education (MOE) Engineering Research Center of Software/Hardware Co-Design and Application, East China Normal University, Shanghai 200062, China
Search for more papers by this author
,
Yuchang Sun
Yuchang Sun
[email protected]
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Search for more papers by this author
,
Hoi-To Wai
Hoi-To Wai
[email protected]
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong SAR, China
Search for more papers by this author
,
Jun Zhang
Jun Zhang
[email protected]
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Search for more papers by this author

Corresponding Author

Xiaolu Wang

Ministry of Education (MOE) Engineering Research Center of Software/Hardware Co-Design and Application, East China Normal University, Shanghai 200062, China

Search for more papers by this author

Yuchang Sun

[email protected]

Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China

Search for more papers by this author

Hoi-To Wai

[email protected]

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong SAR, China

Search for more papers by this author

Jun Zhang

[email protected]

Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China

Search for more papers by this author

Published Online:16 Jun 2026https://doi.org/10.1287/ijoc.2025.1443

References

Agarwal A, Duchi JC (2011) Distributed delayed stochastic optimization. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 873–881.Google Scholar
Arjevani Y, Shamir O, Srebro N (2020) A tight convergence analysis for stochastic gradient descent with delayed updates. Kontorovich A, Neu G, eds. Proc. 31st Internat. Conf. Algorithmic Learn. Theory, Proceedings of Machine Learning Research, vol. 117 (PMLR, New York), 111–132.Google Scholar
Arjevani Y, Carmon Y, Duchi JC, Foster DJ, Srebro N, Woodworth B (2023) Lower bounds for non-convex stochastic optimization. Math. Programming 199(1):165–214.Crossref, Google Scholar
Assran M, Aytekin A, Feyzmahdavian HR, Johansson M, Rabbat MG (2020) Advances in asynchronous parallel and distributed optimization. Proc. IEEE 108(11):2013–2031.Crossref, Google Scholar
Avdiukhin D, Kasiviswanathan S (2021) Federated learning under arbitrary communication patterns. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 139 (PMLR, New York), 425–435.Google Scholar
Blatt D, Hero AO, Gauchman H (2007) A convergent incremental gradient method with a constant step size. SIAM J. Optim. 18(1):29–51.Crossref, Google Scholar
Chen Y, Ning Y, Slawski M, Rangwala H (2020) Asynchronous online federated learning for edge devices with non-IID data. Proc. 2020 IEEE Internat. Conf. Big Data (IEEE, Piscataway, NJ), 15–24.Google Scholar
Cotter A, Shamir O, Srebro N, Sridharan K (2011) Better mini-batch algorithms via accelerated gradient methods. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 1647–1655.Google Scholar
De Sa CM, Zhang C, Olukotun K, Ré C (2015) Taming the wild: A unified analysis of Hogwild-style algorithms. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 28 (Curran Associates, Red Hook, NY), 2674–2682.Google Scholar
Defazio A, Bach F, Lacoste-Julien S (2014) SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 27 (Curran Associates, Red Hook, NY), 1646–1654.Google Scholar
Dekel O, Gilad-Bachrach R, Shamir O, Xiao L (2012) Optimal distributed online prediction using mini-batches. J. Machine Learn. Res. 13(6):165–202.Google Scholar
Dutta S, Wang J, Joshi G (2021) Slow and stale gradients can win the race. IEEE J. Selected Areas Inform. Theory 2(3):1012–1024.Crossref, Google Scholar
Feyzmahdavian HR, Aytekin A, Johansson M (2016) An asynchronous mini-batch algorithm for regularized stochastic optimization. IEEE Trans. Automatic Control 61(12):3740–3754.Crossref, Google Scholar
Fraboni Y, Vidal R, Kameni L, Lorenzi M (2023) A general theory for federated optimization with asynchronous and heterogeneous clients updates. J. Machine Learn. Res. 24(110):1–43.Google Scholar
Gao H, Wu G, Rossi R (2021) Provable distributed stochastic gradient descent with delayed updates. Domeniconi C, Davidson I, eds. Proc. 2021 SIAM Internat. Conf. Data Mining (SIAM, Philadelphia), 441–449.Google Scholar
Glasgow MR, Wootters M (2022) Asynchronous distributed optimization with stochastic delays. Camps-Valls G, Ruiz FJR, Valera I, eds. Proc. 25th Internat. Conf. Artificial Intelligence Statist., vol. 151 (PMLR, New York), 9247–9279.Google Scholar
Gu X, Huang K, Zhang J, Huang L (2021) Fast federated learning in the presence of arbitrary device unavailability. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates, Red Hook, NY), 12052–12064.Google Scholar
Gurbuzbalaban M, Ozdaglar A, Parrilo PA (2017) On the convergence rate of incremental aggregated gradient algorithms. SIAM J. Optim. 27(2):1035–1048.Crossref, Google Scholar
Islamov R, Safaryan M, Alistarh D (2024) AsGrad: A sharp unified analysis of asynchronous-SGD algorithms. Dasgupta S, Mandt S, Li Y, eds. Proc. 27th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 238 (PMLR, New York), 649–657.Google Scholar
Karimireddy SP, Kale S, Mohri M, Reddi S, Stich S, Suresh AT (2020) Scaffold: Stochastic controlled averaging for federated learning. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 5132–5143.Google Scholar
Khaled A, Richtárik P (2023) Better theory for SGD in the nonconvex world. Trans. Machine Learn. Res. https://openreview.net/forum?id=AU4qHN2VkS.Google Scholar
Koloskova A, Stich SU, Jaggi M (2022) Sharper convergence guarantees for asynchronous SGD for distributed and federated learning. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates, Red Hook, NY), 17202–17215.Google Scholar
Koloskova A, Loizou N, Boreiri S, Jaggi M, Stich S (2020) A unified theory of decentralized SGD with changing topology and local updates. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., vol. 119 (PMLR, New York), 5381–5393.Google Scholar
Krizhevsky A (2009) Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, Toronto.Google Scholar
Leblond R, Pedregosa F, Lacoste-Julien S (2018) Improved asynchronous parallel optimization analysis for stochastic incremental methods. J. Machine Learn. Res. 19(81):1–68.Google Scholar
Leconte L, Nguyen VM, Moulines E (2024) FAVANO: Federated averaging with asynchronous nodes. Proc. 2024 IEEE Internat. Conf. Acoustics Speech Signal Processing (IEEE, Piscataway, NJ), 5665–5669.Google Scholar
Leconte L, Jonckheere M, Samsonov S, Moulines E (2024) Queuing dynamics of asynchronous federated learning. Dasgupta S, Mandt S, Li Y, eds. Proc. 27th Internat. Conf. Artificial Intelligence Statist., vol. 238 (PMLR, New York), 1711–1719.Google Scholar
Li Q, Diao Y, Chen Q, He B (2022) Federated learning on non-IID data silos: An experimental study. Proc. 2022 IEEE 38th Internat. Conf. Data Engrg. (IEEE, Piscataway, NJ), 965–978.Google Scholar
Lian X, Huang Y, Li Y, Liu J (2015) Asynchronous parallel stochastic gradient for nonconvex optimization. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 28 (Curran Associates, Red Hook, NY), 2737–2745.Google Scholar
Lian X, Zhang W, Zhang C, Liu J (2018) Asynchronous decentralized parallel stochastic gradient descent. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 80 (PMLR, New York), 3043–3052.Google Scholar
Mania H, Pan X, Papailiopoulos D, Recht B, Ramchandran K, Jordan MI (2017) Perturbed iterate analysis for asynchronous stochastic optimization. SIAM J. Optim. 27(4):2202–2229.Crossref, Google Scholar
Mishchenko K, Bach F, Even M, Woodworth BE (2022) Asynchronous SGD beats minibatch SGD under arbitrary delays. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates, Red Hook, NY), 420–433.Google Scholar
Nedić A, Bertsekas DP, Borkar VS (2001) Distributed asynchronous incremental subgradient methods. Stud. Comput. Math. 8(C):381–407.Google Scholar
Nguyen J, Malik K, Zhan H, Yousefpour A, Rabbat M, Malek M, Huba D (2022) Federated learning with buffered asynchronous aggregation. Camps-Valls G, Ruiz FJR, Valera I, eds. Proc. 25th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 151 (PMLR, New York), 3581–3607.Google Scholar
Recht B, Re C, Wright S, Niu F (2011) Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 693–701.Google Scholar
Roux N, Schmidt M, Bach F (2012) A stochastic gradient method with an exponential convergence rate for finite training sets. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 25 (Curran Associates, Red Hook, NY), 2672–2680.Google Scholar
Schmidt M, Le Roux N, Bach F (2017) Minimizing finite sums with the stochastic average gradient. Math. Programming 162(1–2):83–112.Crossref, Google Scholar
Stich SU, Karimireddy SP (2020) The error-feedback framework: SGD with delayed gradients. J. Machine Learn. Res. 21(237):1–36.Google Scholar
Sun Y, Mao Y, Zhang J (2024) MimiC: Combating client dropouts in federated learning by mimicking central updates. IEEE Trans. Mobile Comput. 23(7):7572–7584.Crossref, Google Scholar
Toghani MT, Uribe CA (2022) Unbounded gradients in federated learning with buffered asynchronous aggregation. Proc. 2022 58th Annual Allerton Conf. Comm. Control Comput. (IEEE, Piscataway, NJ), 1–8.Google Scholar
Tyurin A, Richtárik P (2023) Optimal time complexities of parallel stochastic optimization methods under a fixed computation model. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. Adv. Neural Inform. Processing Systems, vol. 36 (Curran Associates, Red Hook, NY), 16515–16577.Google Scholar
Vanli ND, Gürbüzbalaban M, Ozdaglar A (2018) Global convergence rate of proximal incremental aggregated gradient methods. SIAM J. Optim. 28(2):1282–1300.Crossref, Google Scholar
Verbraeken J, Wolting M, Katzy J, Kloppenburg J, Verbelen T, Rellermeyer JS (2020) A survey on distributed machine learning. ACM Comput. Surveys 53(2):30.Google Scholar
Wang X, Jin C, Wai HT, Gu Y (2023a) Linear speedup of incremental aggregated gradient methods on streaming data. Proc. 2023 62nd IEEE Conf. Decision Control (IEEE, Piscataway, NJ), 4314–4319.Google Scholar
Wang X, Li Z, Jin S, Zhang J (2025) Achieving linear speedup in asynchronous federated learning with heterogeneous clients. IEEE Trans. Mobile Comput. 24(1):435–448.Crossref, Google Scholar
Wang X, Sun Y, Wai H-T, Zhang J (2026) Tackling arbitrarily heterogeneous data in asynchronous stochastic gradient descent without worker scheduling. https://doi.org/10.1287/ijoc.2025.1443.cd, https://github.com/INFORMSJoC/2025.1443.Google Scholar
Wang Y, Cao Y, Wu J, Chen R, Chen J (2023b) Tackling the data heterogeneity in asynchronous federated learning with cached update calibration. Proc. 12th Internat. Conf. Learn. Representations (OpenReview.net).Google Scholar
Wang J, Liu Q, Liang H, Joshi G, Poor HV (2020) Tackling the objective inconsistency problem in heterogeneous federated optimization. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates, Red Hook, NY), 7611–7623.Google Scholar
Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. Preprint, submitted August 25, https://arxiv.org/abs/1708.07747.Google Scholar
Xie C, Koyejo S, Gupta I (2020) Asynchronous federated optimization. 12th Annual Workshop Optim. Machine Learn. (OPT 2020).Google Scholar
Xu C, Qu Y, Xiang Y, Gao L (2023) Asynchronous federated learning on heterogeneous devices: A survey. Comput. Sci. Rev. 50(C):100595.Crossref, Google Scholar
Yan Y, Niu C, Ding Y, Zheng Z, Tang S, Li Q, Wu F, Lyu C, Feng Y, Chen G (2024) Federated optimization under intermittent client availability. INFORMS J. Comput. 36(1):185–202.Link, Google Scholar
Yang Q, Liu Y, Chen T, Tong Y (2019) Federated machine learning: Concept and applications. ACM Trans. Intelligent Systems Tech. 10(2):12.Crossref, Google Scholar
Yurochkin M, Agarwal M, Ghosh S, Greenewald K, Hoang N, Khazaeni Y (2019) Bayesian nonparametric federated learning of neural networks. Chaudhuri K, Salakhutdinov R, eds. Proc. 36th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 97 (PMLR, New York), 7252–7261.Google Scholar
Zhang F, Liu X, Lin S, Wu G, Zhou X, Jiang J, Ji X (2023) No one idles: Efficient heterogeneous federated learning with parallel edge and server computation. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Proc. 40th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 202 (PMLR, New York), 41399–41413.Google Scholar

cover image INFORMS Journal on Computing

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:July 08, 2025
Accepted:May 01, 2026
Published Online:June 16, 2026

Cite as

Xiaolu Wang, Yuchang Sun, Hoi-To Wai, Jun Zhang (2026) Tackling Arbitrarily Heterogeneous Data in Asynchronous Stochastic Gradient Descent Without Worker Scheduling. INFORMS Journal on Computing 0(0).

https://doi.org/10.1287/ijoc.2025.1443

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Tackling Arbitrarily Heterogeneous Data in Asynchronous Stochastic Gradient Descent Without Worker Scheduling

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News