Bilevel Gradient Methods and the Morse Parametric Qualification Condition
References
- [1] (1997) Numerical path following. Ciarlet PG, Lions JL, eds. Handbook of Numerical Analysis, vol. 5 (Elsevier, Amsterdam), 3–207.Google Scholar
- [2] (2019) How to train your MAML. 7th Internat. Conf. Learn. Representations (OpenReview.net).Google Scholar
- [3] (2022) Non-convex bilevel games with critical point selection maps. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Annual Conf. Neural Inform. Processing Systems, NeurIPS 2022, Advances in Neural Information Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY), 1–34.Google Scholar
- [4] (2013) Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Programming 137:91–129.Crossref, Google Scholar
- [5] (2010) Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2):438–457.Link, Google Scholar
- [6] (2019) Deep equilibrium models. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY).Google Scholar
- [7] (2018) Qualification conditions in semialgebraic programming. SIAM J. Optim. 28(2):1867–1891.Crossref, Google Scholar
- [8] (2025) Convergence of optimizers implies eigenvalues filtering at equilibrium. Preprint, submitted October 10, https://arxiv.org/abs/2510.09034.Google Scholar
- [9] (2025) Inexact subgradient methods for semialgebraic functions. Math. Programming, Series A.Google Scholar
- [10] (2026) Geometric and computational hardness of bilevel programming. Math. Programming 215(1):539–574.Google Scholar
- [11] (2012) Solving bilevel programs with the KKT-approach. Math. Programming 138:309–332.Crossref, Google Scholar
- [12] (2021) Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems. Beygelzimer A, Dauphin Y, Liang P, Wortman Vaughan J, eds. Annual Conf. Neural Inform. Processing Systems, NeurIPS 2021, Advances in Neural Information Processing Systems, vol. 34 (Curran Associates Inc., Red Hook, NY), 25294–25307.Google Scholar
- [13] (2024) On finding small hyper-gradients in bilevel optimization: Hardness results and improved analysis. Thirty Seventh Annual Conf. Learn. Theory (PMLR, New York), 947–980.Google Scholar
- [14] (2007) An overview of bilevel optimization. Ann. Oper. Res. 153:235–256.Crossref, Google Scholar
- [15] (2000) An Introduction to o-Minimal Geometry (Istituti Editoriali e Poligrafici Internazionali, Pisa, Italy).Google Scholar
- [16] (2000) An Introduction to Semialgebraic Geometry (Istituti Editoriali e Poligrafici Internazionali, Pisa, Italy).Google Scholar
- [17] (2022) A framework for bilevel optimization that enables stochastic and global variance reduction algorithms. Annual Conf. Neural Inform. Processing Systems, NeurIPS 2021, Advances in Neural Information Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY).Google Scholar
- [18] (2024) A lower bound and a near-optimal algorithm for bilevel empirical risk minimization. Dasgupta S, Mandt S, Li Y, eds. Internat. Conf. Artificial Intelligence Statist. (AISTATS), Proceedings of Machine Learning Research (PMLR, New York), 82–90.Google Scholar
- [19] (2002) Foundations of Bilevel Programming: Noncex Optimization and Its Applications (Springer US, New York).Google Scholar
- [20] Dempe S, Zemkoho A, eds. (2020) Bilevel Optimization: Advances and Next Challenges, Springer Optimization and Its Applications, vol. 161 (Springer, Cham, Switzerland).Crossref, Google Scholar
- [21] (2012) Sensitivity analysis for two-level value functions with applications to bilevel programming. SIAM J. Optim. 22(4):1309–1343.Crossref, Google Scholar
- [22] (2017) Model-agnostic meta-learning for fast adaptation of deep networks. Proc. 34th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 70 (PMLR, New York), 1126–1135.Google Scholar
- [23] (2018) Bilevel programming for hyperparameter optimization and meta-learning. Proc. 35th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research (PMLR, New York), 1563–1572.Google Scholar
- [24] (2018) Approximation methods for bilevel programming. Preprint, submitted February 6, https://arxiv.org/abs/1802.02246.Google Scholar
- [25] (1992) Automatic differentiation and iterative processes. Optim. Methods Software 1(1):13–21.Crossref, Google Scholar
- [26] (2009) The gradient and heavy ball with friction dynamical systems: The quasiconvex case. Math. Programming 116(1–2):173–191.Crossref, Google Scholar
- [27] (2020) On the iteration complexity of hypergradient computation. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research (PMLR, New York), 3748–3758.Google Scholar
- [28] (2008) Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation (SIAM, Philadelphia).Crossref, Google Scholar
- [29] (2010) Differential Topology (AMS Chelsea Publishing, Providence, RI).Crossref, Google Scholar
- [30] (2015) Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. 2015 IEEE Internat. Conf. Comput. Vision (ICCV) (IEEE Computer Society, Washington, DC), 1026–1034.Google Scholar
- [31] (2011) On calmness conditions in convex bilevel programming. Applicable Anal. 90:951–970.Crossref, Google Scholar
- [32] (2023) Lower bounds and accelerated algorithms for bilevel optimization. J. Machine Learn. Res. 24(22):1–56.Google Scholar
- [33] (2021) Bilevel optimization: Convergence analysis and enhanced design. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research Series, vol. 139 (PMLR, New York), 4882–4892.Google Scholar
- [34] (2022) Theoretical convergence of multi-step model-agnostic meta-learning. J. Machine Learn. Res. 23(29):1–41.Google Scholar
- [35] (2020) Convergence of meta-learning with task-specific adaptation over partial parameters. Adv. Neural Inform. Processing Systems 33:11490–11500.Google Scholar
- [36] (2022) Will bilevel optimizers benefit from loops. Adv. Neural Inform. Processing Systems 35:3011–3023.Crossref, Google Scholar
- [37] (1990) Derivative evaluation and computational experience with large bilevel mathematical programs. J. Optim. Theory Appl. 65:485–499.Crossref, Google Scholar
- [38] (2023) On penalty methods for nonconvex bilevel optimization and first-order stochastic approximation. Preprint, submitted September 4, https://arxiv.org/abs/2309.01753.Google Scholar
- [39] (2016) Gradient descent only converges to minimizers. Feldman V, Rakhlin A, Shamir O, eds. 29th Annual Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 49 (PMLR, New York), 1246–1257.Google Scholar
- [40] (2014) On solving simple bilevel programs with a nonconvex lower level program. Math. Programming 144:277–305.Crossref, Google Scholar
- [41] (2019) DARTS: Differentiable architecture search. Internat. Conf. Learn. Representations (ICLR) (OpenReview.net).Google Scholar
- [42] (2021) Towards gradient-based bilevel optimization with non-convex followers and beyond. Beygelzimer A, Dauphin Y, Liang P, Wortman Vaughan J, eds. Annual Conf. Neural Inform. Processing Systems, NeurIPS 2021, Advances in Neural Information Processing Systems, vol. 34 (Curran Associates Inc., Red Hook, NY), 8662–8675.Google Scholar
- [43] (2021) A value-function-based interior-point method for non-convex bi-level optimization. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research Series, vol. 139 (PMLR, New York), 6882–6892.Google Scholar
- [44] (2020) A generic first-order algorithmic framework for bi-level programming beyond lower-level singleton. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research (PMLR, New York), 6305–6315.Google Scholar
- [45] (2020) Optimizing millions of hyperparameters by implicit differentiation. 23rd Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research (PMLR, New York), 1540–1552.Google Scholar
- [46] (2015) Gradient-based hyperparameter optimization through reversible learning. Proc. 32nd Internat. Conf. Machine Learn. (JMLR.org), 2113–2122.Google Scholar
- [47] (2020) Automatic differentiation of some first-order methods in parametric optimization. 23rd Internat. Conf. Artificial Intelligence and Statistics, Proceedings of Machine Learning Research (PMLR, New York), 1584–1594.Google Scholar
- [48] (2024) Dynamic fista for convex composite bi-level optimization. Preprint, submitted July 30, https://arxiv.org/abs/2407.21221.Google Scholar
- [49] (2020) Bilevel optimization and variational analysis. Dempe S, Zemkoho A, eds. Bilevel Optimization: Advances and Next Challenges (Springer International Publishing, Cham, Switzerland), 197–226.Crossref, Google Scholar
- [50] (2017) Gradient descent only converges to minimizers: Non-isolated critical points and invariant regions. 8th Innovations Theoret. Comput. Sci. Conf. (ITCS 2017) (Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Wadern, Germany).Google Scholar
- [51] (1991) Minimization of locally Lipschitzian functions. SIAM J. Optim. 1(1):57–82.Crossref, Google Scholar
- [52] (2016) Hyperparameter optimization with approximate gradient. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 48 (PMLR, New York), 737–746.Google Scholar
- [53] (1990) Nonconvergence to unstable points in urn models and stochastic approximations. Ann. Probab. 18(2):698–712.Crossref, Google Scholar
- [54] (2019) Meta-learning with implicit gradients. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY).Google Scholar
- [55] (2022) CADDA: Class-wise automatic differentiable data augmentation for EEG signals. Internat. Conf. Learn. Representations (OpenReview.net).Google Scholar
- [56] (1994) The steepest descent direction for the nonlinear bilevel programming problem. Oper. Res. Lett. 15(5):265–272.Crossref, Google Scholar
- [57] (2023) On penalty-based bilevel gradient descent method. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Proc. 40th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 202 (PMLR, New York), 30992–31015.Google Scholar
- [58] (1987) Global Stability of Dynamical Systems (Springer, New York).Crossref, Google Scholar
- [59] (2011) Asymptotic bias of stochastic gradient search. 2011 50th IEEE Conf. Decision Control Eur. Control Conf. (IEEE, Piscataway, NJ), 722–727.Google Scholar
- [60] (1996) Geometric categories and o-minimal structures. Duke Math. J. 84(2):497–540.Crossref, Google Scholar
- [61] (1994) Bilevel and multilevel programming: A bibliography review. J. Global Optim. 5(3):291–306.Crossref, Google Scholar
- [62] (1995) Optimality conditions for bilevel programming problems. Optimization 33(1):9–27.Crossref, Google Scholar

