A Single-Loop Algorithm for Decentralized Bilevel Optimization