Bilevel Gradient Methods and the Morse Parametric Qualification Condition

Jérôme Bolte
Jérôme Bolte
[email protected]
https://orcid.org/0000-0002-1676-8407
Toulouse School of Economics, University of Toulouse Capitole, 31000 Toulouse, France
Search for more papers by this author
,
Quoc-Tung Le
Corresponding Author
Quoc-Tung Le
[email protected]
https://orcid.org/0009-0009-9952-2194
Laboratoire Jean Kuntzmann, Centre National de la Recherche Scientifique, L’Institut polytechnique de Grenoble, Université Grenoble Alpes, 38000 Grenoble, France
Search for more papers by this author
,
Edouard Pauwels
Edouard Pauwels
[email protected]
https://orcid.org/0000-0002-8180-075X
Toulouse School of Economics, University of Toulouse Capitole, 31000 Toulouse, France
Search for more papers by this author
,
Samuel Vaiter
Samuel Vaiter
[email protected]
https://orcid.org/0000-0002-4077-708X
Laboratoire J. A. Dieudonné, Centre National de la Recherche Scientifique and Université Côte d’Azur, 06108 Nice, France
Search for more papers by this author

Toulouse School of Economics, University of Toulouse Capitole, 31000 Toulouse, France

Corresponding Author

Quoc-Tung Le

Laboratoire Jean Kuntzmann, Centre National de la Recherche Scientifique, L’Institut polytechnique de Grenoble, Université Grenoble Alpes, 38000 Grenoble, France

Search for more papers by this author

Edouard Pauwels

[email protected]

https://orcid.org/0000-0002-8180-075X

Toulouse School of Economics, University of Toulouse Capitole, 31000 Toulouse, France

Search for more papers by this author

Samuel Vaiter

[email protected]

https://orcid.org/0000-0002-4077-708X

Laboratoire J. A. Dieudonné, Centre National de la Recherche Scientifique and Université Côte d’Azur, 06108 Nice, France

Search for more papers by this author

Published Online:8 Apr 2026https://doi.org/10.1287/moor.2025.0914

References

[1] Allgower EL, Georg K (1997) Numerical path following. Ciarlet PG, Lions JL, eds. Handbook of Numerical Analysis, vol. 5 (Elsevier, Amsterdam), 3–207.Google Scholar
[2] Antoniou A, Edwards H, Storkey A (2019) How to train your MAML. 7th Internat. Conf. Learn. Representations (OpenReview.net).Google Scholar
[3] Arbel M, Mairal J (2022) Non-convex bilevel games with critical point selection maps. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Annual Conf. Neural Inform. Processing Systems, NeurIPS 2022, Advances in Neural Information Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY), 1–34.Google Scholar
[4] Attouch H, Bolte J, Svaiter BF (2013) Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Programming 137:91–129.Crossref, Google Scholar
[5] Attouch H, Bolte J, Redont P, Soubeyran A (2010) Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2):438–457.Link, Google Scholar
[6] Bai S, Kolter JZ, Koltun V (2019) Deep equilibrium models. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY).Google Scholar
[7] Bolte J, Hochart A, Pauwels E (2018) Qualification conditions in semialgebraic programming. SIAM J. Optim. 28(2):1867–1891.Crossref, Google Scholar
[8] Bolte J, Le Q-T, Pauwels E (2025) Convergence of optimizers implies eigenvalues filtering at equilibrium. Preprint, submitted October 10, https://arxiv.org/abs/2510.09034.Google Scholar
[9] Bolte J, Le T, Moulines É, Pauwels E (2025) Inexact subgradient methods for semialgebraic functions. Math. Programming, Series A.Google Scholar
[10] Bolte J, Le Q-T, Pauwels E, Vaiter S (2026) Geometric and computational hardness of bilevel programming. Math. Programming 215(1):539–574.Google Scholar
[11] Bouza Allende G, Still G (2012) Solving bilevel programs with the KKT-approach. Math. Programming 138:309–332.Crossref, Google Scholar
[12] Chen T, Sun Y, Yin W (2021) Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems. Beygelzimer A, Dauphin Y, Liang P, Wortman Vaughan J, eds. Annual Conf. Neural Inform. Processing Systems, NeurIPS 2021, Advances in Neural Information Processing Systems, vol. 34 (Curran Associates Inc., Red Hook, NY), 25294–25307.Google Scholar
[13] Chen L, Xu J, Zhang J (2024) On finding small hyper-gradients in bilevel optimization: Hardness results and improved analysis. Thirty Seventh Annual Conf. Learn. Theory (PMLR, New York), 947–980.Google Scholar
[14] Colson B, Marcotte P, Savard G (2007) An overview of bilevel optimization. Ann. Oper. Res. 153:235–256.Crossref, Google Scholar
[15] Coste M (2000) An Introduction to o-Minimal Geometry (Istituti Editoriali e Poligrafici Internazionali, Pisa, Italy).Google Scholar
[16] Coste M (2000) An Introduction to Semialgebraic Geometry (Istituti Editoriali e Poligrafici Internazionali, Pisa, Italy).Google Scholar
[17] Dagréou M, Ablin P, Vaiter S, Moreau T (2022) A framework for bilevel optimization that enables stochastic and global variance reduction algorithms. Annual Conf. Neural Inform. Processing Systems, NeurIPS 2021, Advances in Neural Information Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY).Google Scholar
[18] Dagréou M, Moreau T, Vaiter S, Ablin P (2024) A lower bound and a near-optimal algorithm for bilevel empirical risk minimization. Dasgupta S, Mandt S, Li Y, eds. Internat. Conf. Artificial Intelligence Statist. (AISTATS), Proceedings of Machine Learning Research (PMLR, New York), 82–90.Google Scholar
[19] Dempe S (2002) Foundations of Bilevel Programming: Noncex Optimization and Its Applications (Springer US, New York).Google Scholar
[20] Dempe S, Zemkoho A, eds. (2020) Bilevel Optimization: Advances and Next Challenges, Springer Optimization and Its Applications, vol. 161 (Springer, Cham, Switzerland).Crossref, Google Scholar
[21] Dempe S, Mordukhovich BS, Zemkoho AB (2012) Sensitivity analysis for two-level value functions with applications to bilevel programming. SIAM J. Optim. 22(4):1309–1343.Crossref, Google Scholar
[22] Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. Proc. 34th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 70 (PMLR, New York), 1126–1135.Google Scholar
[23] Franceschi L, Frasconi P, Salzo S, Grazzi R, Pontil M (2018) Bilevel programming for hyperparameter optimization and meta-learning. Proc. 35th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research (PMLR, New York), 1563–1572.Google Scholar
[24] Ghadimi S, Wang M (2018) Approximation methods for bilevel programming. Preprint, submitted February 6, https://arxiv.org/abs/1802.02246.Google Scholar
[25] Gilbert JC (1992) Automatic differentiation and iterative processes. Optim. Methods Software 1(1):13–21.Crossref, Google Scholar
[26] Goudou X, Munier J (2009) The gradient and heavy ball with friction dynamical systems: The quasiconvex case. Math. Programming 116(1–2):173–191.Crossref, Google Scholar
[27] Grazzi R, Franceschi L, Pontil M, Salzo S (2020) On the iteration complexity of hypergradient computation. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research (PMLR, New York), 3748–3758.Google Scholar
[28] Griewank A, Walther A (2008) Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation (SIAM, Philadelphia).Crossref, Google Scholar
[29] Guillemin V, Pollack A (2010) Differential Topology (AMS Chelsea Publishing, Providence, RI).Crossref, Google Scholar
[30] He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. 2015 IEEE Internat. Conf. Comput. Vision (ICCV) (IEEE Computer Society, Washington, DC), 1026–1034.Google Scholar
[31] Henrion R, Surowiec T (2011) On calmness conditions in convex bilevel programming. Applicable Anal. 90:951–970.Crossref, Google Scholar
[32] Ji K, Liang Y (2023) Lower bounds and accelerated algorithms for bilevel optimization. J. Machine Learn. Res. 24(22):1–56.Google Scholar
[33] Ji K, Yang J, Liang Y (2021) Bilevel optimization: Convergence analysis and enhanced design. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research Series, vol. 139 (PMLR, New York), 4882–4892.Google Scholar
[34] Ji K, Yang J, Liang Y (2022) Theoretical convergence of multi-step model-agnostic meta-learning. J. Machine Learn. Res. 23(29):1–41.Google Scholar
[35] Ji K, Lee JD, Liang Y, Poor HV (2020) Convergence of meta-learning with task-specific adaptation over partial parameters. Adv. Neural Inform. Processing Systems 33:11490–11500.Google Scholar
[36] Ji K, Liu M, Liang Y, Ying L (2022) Will bilevel optimizers benefit from loops. Adv. Neural Inform. Processing Systems 35:3011–3023.Crossref, Google Scholar
[37] Kolstad CD, Lasdon LS (1990) Derivative evaluation and computational experience with large bilevel mathematical programs. J. Optim. Theory Appl. 65:485–499.Crossref, Google Scholar
[38] Kwon J, Kwon D, Wright S, Nowak R (2023) On penalty methods for nonconvex bilevel optimization and first-order stochastic approximation. Preprint, submitted September 4, https://arxiv.org/abs/2309.01753.Google Scholar
[39] Lee JD, Simchowitz M, Jordan MI, Recht B (2016) Gradient descent only converges to minimizers. Feldman V, Rakhlin A, Shamir O, eds. 29th Annual Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 49 (PMLR, New York), 1246–1257.Google Scholar
[40] Lin G-H, Xu M, Ye JJ (2014) On solving simple bilevel programs with a nonconvex lower level program. Math. Programming 144:277–305.Crossref, Google Scholar
[41] Liu H, Simonyan K, Yang Y (2019) DARTS: Differentiable architecture search. Internat. Conf. Learn. Representations (ICLR) (OpenReview.net).Google Scholar
[42] Liu R, Liu Y, Zeng S, Zhang J (2021) Towards gradient-based bilevel optimization with non-convex followers and beyond. Beygelzimer A, Dauphin Y, Liang P, Wortman Vaughan J, eds. Annual Conf. Neural Inform. Processing Systems, NeurIPS 2021, Advances in Neural Information Processing Systems, vol. 34 (Curran Associates Inc., Red Hook, NY), 8662–8675.Google Scholar
[43] Liu R, Liu X, Yuan X, Zeng S, Zhang J (2021) A value-function-based interior-point method for non-convex bi-level optimization. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research Series, vol. 139 (PMLR, New York), 6882–6892.Google Scholar
[44] Liu R, Mu P, Yuan X, Zeng S, Zhang J (2020) A generic first-order algorithmic framework for bi-level programming beyond lower-level singleton. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research (PMLR, New York), 6305–6315.Google Scholar
[45] Lorraine J, Vicol P, Duvenaud D (2020) Optimizing millions of hyperparameters by implicit differentiation. 23rd Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research (PMLR, New York), 1540–1552.Google Scholar
[46] Maclaurin D, Duvenaud D, Adams R (2015) Gradient-based hyperparameter optimization through reversible learning. Proc. 32nd Internat. Conf. Machine Learn. (JMLR.org), 2113–2122.Google Scholar
[47] Mehmood S, Ochs P (2020) Automatic differentiation of some first-order methods in parametric optimization. 23rd Internat. Conf. Artificial Intelligence and Statistics, Proceedings of Machine Learning Research (PMLR, New York), 1584–1594.Google Scholar
[48] Merchav R, Sabach S, Teboulle M (2024) Dynamic fista for convex composite bi-level optimization. Preprint, submitted July 30, https://arxiv.org/abs/2407.21221.Google Scholar
[49] Mordukhovich BS (2020) Bilevel optimization and variational analysis. Dempe S, Zemkoho A, eds. Bilevel Optimization: Advances and Next Challenges (Springer International Publishing, Cham, Switzerland), 197–226.Crossref, Google Scholar
[50] Panageas I, Piliouras G (2017) Gradient descent only converges to minimizers: Non-isolated critical points and invariant regions. 8th Innovations Theoret. Comput. Sci. Conf. (ITCS 2017) (Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Wadern, Germany).Google Scholar
[51] Pang J-S, Han S-P, Rangaraj N (1991) Minimization of locally Lipschitzian functions. SIAM J. Optim. 1(1):57–82.Crossref, Google Scholar
[52] Pedregosa F (2016) Hyperparameter optimization with approximate gradient. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 48 (PMLR, New York), 737–746.Google Scholar
[53] Pemantle R (1990) Nonconvergence to unstable points in urn models and stochastic approximations. Ann. Probab. 18(2):698–712.Crossref, Google Scholar
[54] Rajeswaran A, Finn C, Kakade SM, Levine S (2019) Meta-learning with implicit gradients. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY).Google Scholar
[55] Rommel C, Moreau T, Paillard J, Gramfort A (2022) CADDA: Class-wise automatic differentiable data augmentation for EEG signals. Internat. Conf. Learn. Representations (OpenReview.net).Google Scholar
[56] Savard G, Gauvin J (1994) The steepest descent direction for the nonlinear bilevel programming problem. Oper. Res. Lett. 15(5):265–272.Crossref, Google Scholar
[57] Shen H, Chen T (2023) On penalty-based bilevel gradient descent method. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Proc. 40th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 202 (PMLR, New York), 30992–31015.Google Scholar
[58] Shub M, Fathi A, Langevin R (1987) Global Stability of Dynamical Systems (Springer, New York).Crossref, Google Scholar
[59] Tadić VB, Doucet A (2011) Asymptotic bias of stochastic gradient search. 2011 50th IEEE Conf. Decision Control Eur. Control Conf. (IEEE, Piscataway, NJ), 722–727.Google Scholar
[60] van den Dries L, Miller C (1996) Geometric categories and o-minimal structures. Duke Math. J. 84(2):497–540.Crossref, Google Scholar
[61] Vicente LN, Calamai PH (1994) Bilevel and multilevel programming: A bibliography review. J. Global Optim. 5(3):291–306.Crossref, Google Scholar
[62] Ye JJ, Zhu DL (1995) Optimality conditions for bilevel programming problems. Optimization 33(1):9–27.Crossref, Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Metrics

Information

Received:February 12, 2025
Accepted:February 08, 2026
Published Online:April 08, 2026

Cite as

Jérôme Bolte, Quoc-Tung Le, Edouard Pauwels, Samuel Vaiter (2026) Bilevel Gradient Methods and the Morse Parametric Qualification Condition. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2025.0914

Keywords

Acknowledgments

The authors thank S. Dempe and A. Zemkoho for their useful comments. Jérôme Bolte and Edouard Pauwels thank la Fondation Toulouse School of Economics Partenariats (TSE-P).

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Bilevel Gradient Methods and the Morse Parametric Qualification Condition

References

Articles In Advance

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News