Mathematics of Operations Research

Articles In Advance

Article Information

Information

Received: April 22, 2024
Accepted: August 31, 2025
Published Online: October 03, 2025

Cite as

Youran Dong, Shiqian Ma, Junfeng Yang, Chao Yin (2025) A Single-Loop Algorithm for Decentralized Bilevel Optimization. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2024.0488

Acknowledgments

The authors are grateful to the editors and reviewers for their constructive comments that greatly improved the presentation of this paper.

A Single-Loop Algorithm for Decentralized Bilevel Optimization
