Tackling Arbitrarily Heterogeneous Data in Asynchronous Stochastic Gradient Descent Without Worker Scheduling
Abstract
We consider the distributed optimization problem with data dispersed across multiple workers under the orchestration of a parameter server. In distributed environments, variations in computation speeds and network conditions across workers often lead to significant idle times in synchronous training. Although asynchronous training has been widely explored to reduce the synchronization overhead, existing methods either assume bounded dissimilarity among workers’ local data, which hampers performance under high data heterogeneity, or rely on worker scheduling strategies that limit system asynchrony. This work proposes the dual-delayed stochastic gradient descent (DuDe-SGD) algorithm to overcome these limitations. Through a server-side buffer architecture, DuDe-SGD uses stale stochastic gradients from all workers to neutralize the effects of data heterogeneity while maintaining full asynchrony and a per-iteration computation cost on par with traditional asynchronous stochastic gradient descent (SGD) algorithms. Our analysis shows that, for smooth nonconvex problems, DuDe-SGD achieves a convergence rate comparable to that of state-of-the-art asynchronous SGD algorithms, even with arbitrarily heterogeneous data and without any worker scheduling scheme. Numerical experiments demonstrate the favorable performance of DuDe-SGD compared with existing synchronous and asynchronous SGD-based algorithms, especially in scenarios with highly heterogeneous data.
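The server-side buffer idea can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `simulate_buffered_async_sgd`, the scalar quadratic losses f_i(x) = 0.5 (x - c_i)^2, the fixed per-worker compute times, the buffer initialization, and the exact update rule (step along the average of the buffered gradients) are all hypothetical choices made only to match the abstract's description. The server keeps the most recent, possibly stale, gradient from every worker and maintains a running sum, so each update costs about the same as a plain asynchronous SGD step.

```python
import heapq
import random


def simulate_buffered_async_sgd(centers, speeds, lr=0.02, noise_std=0.0,
                                num_updates=5000, seed=0):
    """Toy event-driven simulation of buffered asynchronous SGD (illustrative only).

    Worker i holds the local loss f_i(x) = 0.5 * (x - centers[i]) ** 2 and needs
    speeds[i] time units per gradient. These modelling choices are assumptions.
    """
    rng = random.Random(seed)
    n = len(centers)
    x = 0.0

    # Assumption: the buffer starts with every worker's gradient at the initial model.
    buffer = [x - c for c in centers]
    buffer_sum = sum(buffer)

    # Event queue of (finish_time, worker_id, model_snapshot_used_by_worker).
    events = [(speeds[i], i, x) for i in range(n)]
    heapq.heapify(events)

    for _ in range(num_updates):
        t, i, x_stale = heapq.heappop(events)

        # Worker i reports a stochastic gradient evaluated at its stale snapshot.
        grad = (x_stale - centers[i]) + noise_std * rng.gauss(0.0, 1.0)

        # Replace worker i's buffer entry; the running sum keeps this O(1) per worker.
        buffer_sum += grad - buffer[i]
        buffer[i] = grad

        # Step along the average of all buffered gradients, stale ones included.
        x -= lr * buffer_sum / n

        # Worker i immediately starts over from the new model, with no scheduling.
        heapq.heappush(events, (t + speeds[i], i, x))

    return x


if __name__ == "__main__":
    centers = [-5.0, 0.0, 20.0]   # highly heterogeneous local optima
    speeds = [1.0, 3.0, 10.0]     # fast, medium, and slow workers
    print(simulate_buffered_async_sgd(centers, speeds))
```

In this toy setting the minimizer of the average loss is the mean of `centers`. Because the slow worker's gradient stays in the buffer between its reports, the fast worker does not pull the iterate toward its own local optimum, which is the intuition behind using stale gradients from all workers.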
History: Accepted by Antonio Frangioni, Area Editor for Design & Analysis of Algorithms–Continuous.
Funding: The research of X. Wang was supported in part by the Young Scientists Fund of the National Natural Science Foundation of China [Grant 12501426] and the Key Program of the National Natural Science Foundation of China [Grant 62432007]. The research of J. Zhang was supported in part by the Hong Kong Research Grants Council under the Areas of Excellence Scheme [Grant AoE/E-601/22-R] and in part by the National Natural Science Foundation of China/Hong Kong Research Grants Council Collaborative Research Scheme [Grant CRS_HKUST603/22].
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2025.1443) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2025.1443). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.

