Solving Bilevel Optimization via Sequential Minimax Optimization

Published Online:https://doi.org/10.1287/moor.2024.0521

References

  • [1] Allende GB, Still G (2013) Solving bilevel programs with the KKT-approach. Math. Programming 138(1):309–332.CrossrefGoogle Scholar
  • [2] Bard JF (2013) Practical Bilevel Optimization: Algorithms and Applications, vol. 30 (Springer Science & Business Media, New York).Google Scholar
  • [3] Bennett KP, Kunapuli G, Hu J, Pang J-S (2008) Bilevel optimization and machine learning. IEEE World Congress Comput. Intelligence (Springer, Berlin), 25–47.Google Scholar
  • [4] Bertinetto L, Henriques JF, Torr P, Vedaldi A (2018) Meta-learning with differentiable closed-form solvers. Internat. Conf. Learning Representations (OpenReview, Alameda, CA).Google Scholar
  • [5] Chang C-C, Lin C-J (2011) LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Tech. 2(3):1–27.CrossrefGoogle Scholar
  • [6] Chen L, Xu J, Zhang J (2023) Bilevel optimization without lower-level strong convexity from the hyper-objective perspective. Preprint, submitted January 2, https://arxiv.org/abs/2301.00712.Google Scholar
  • [7] Chen T, Sun Y, Xiao Q, Yin W (2022) A single-timescale method for stochastic bilevel optimization. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 2466–2488.Google Scholar
  • [8] Clarke FH (1990) Optimization and Nonsmooth Analysis (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [9] Colson B, Marcotte P, Savard G (2007) An overview of bilevel optimization. Ann. Oper. Res. 153(1):235–256.CrossrefGoogle Scholar
  • [10] Crockett C, Fessler JA (2022) Bilevel methods for image reconstruction. Foundations Trends Signal Processing 15(2–3):121–289.CrossrefGoogle Scholar
  • [11] Dai Y-H, Zhang L (2020) Optimality conditions for constrained minimax optimization. Preprint, submitted April 21, https://arxiv.org/abs/2004.09730.Google Scholar
  • [12] Dai Y-H, Wang J, Zhang L (2024) Optimality conditions and numerical algorithms for a class of linearly constrained minimax optimization problems. SIAM J. Optim. 34(3):2883–2916.CrossrefGoogle Scholar
  • [13] Dempe S (2002) Foundations of Bilevel Programming (Springer Science & Business Media, New York).Google Scholar
  • [14] Dempe S, Zemkoho A (2013) The bilevel programming problem: Reformulations, constraint qualifications and optimality conditions. Math. Programming 138(1):447–473.CrossrefGoogle Scholar
  • [15] Dempe S, Zemkoho A (2020) Bilevel optimization. Springer Optimization and Its Applications, vol. 161 (Springer, Berlin).Google Scholar
  • [16] Dempe S, Kalashnikov V, Pérez-Valdés GA, Kalashnykova N (2015) Bilevel programming problems. Energy Systems, vol. 10 (Springer, Berlin).Google Scholar
  • [17] Feurer M, Hutter F (2019) Hyperparameter optimization. Automated Machine Learning (Springer, Cham, Switzerland), 3–33.CrossrefGoogle Scholar
  • [18] Franceschi L, Donini M, Frasconi P, Pontil M (2017) Forward and reverse gradient-based hyperparameter optimization. Internat. Conf. Machine Learn. (PMLR, New York), 1165–1173.Google Scholar
  • [19] Franceschi L, Frasconi P, Salzo S, Grazzi R, Pontil M (2018) Bilevel programming for hyperparameter optimization and meta-learning. Internat. Conf. Machine Learn. (PMLR, New York), 1568–1577.Google Scholar
  • [20] Grant M, Boyd S (2014) CVX: Matlab software for disciplined convex programming, version 2.1, https://cvxr.com/cvx/citing/.Google Scholar
  • [21] Grazzi R, Franceschi L, Pontil M, Salzo S (2020) On the iteration complexity of hypergradient computation. Internat. Conf. Machine Learn. (PMLR, New York), 3748–3758.Google Scholar
  • [22] Guo Z, Hu Q, Zhang L, Yang T (2021) Randomized stochastic variance-reduced methods for multi-task stochastic bilevel optimization. Preprint, submitted May 5, https://arxiv.org/abs/2105.02266.Google Scholar
  • [23] Hansen P, Jaumard B, Savard G (1992) New branch-and-bound rules for linear bilevel programming. SIAM J. Sci. Statist. Comput. 13(5):1194–1217.CrossrefGoogle Scholar
  • [24] Hastie T, Tibshirani R, Friedman JH, Friedman JH (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, vol. 2 (Springer, Berlin).CrossrefGoogle Scholar
  • [25] Hong M, Wai H-T, Wang Z, Yang Z (2023) A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic. SIAM J. Optim. 33(1):147–180.CrossrefGoogle Scholar
  • [26] Hu X, Xiao N, Liu X, Toh K-C (2023) An improved unconstrained approach for bilevel optimization. SIAM J. Optim. 33(4):2801–2829.CrossrefGoogle Scholar
  • [27] Huang F, Li J, Gao S (2021) BiAdam: Fast adaptive bilevel optimization methods. Preprint, submitted June 21, https://arxiv.org/abs/2106.11396.Google Scholar
  • [28] Huang M, Chen X, Ji K, Ma S, Lai L (2025) Efficiently escaping saddle points in bilevel optimization. J. Machine Learn. Res. 26(1):1–61.Google Scholar
  • [29] Ishizuka Y, Aiyoshi E (1992) Double penalty method for bilevel optimization problems. Ann. Oper. Res. 34(1):73–88.CrossrefGoogle Scholar
  • [30] Ji K, Yang J, Liang Y (2021) Bilevel optimization: Convergence analysis and enhanced design. Internat. Conf. Machine Learn., vol. 139 (PMLR, New York), 4882–4892.Google Scholar
  • [31] Ji K, Lee JD, Liang Y, Poor HV (2020) Convergence of meta-learning with task-specific adaptation over partial parameters. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 11490–11500.Google Scholar
  • [32] Khanduri P, Zeng S, Hong M, Wai H-T, Wang Z, Yang Z (2021) A near-optimal algorithm for stochastic bilevel optimization via double-momentum. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates Inc., Red Hook, NY), 30271–30283.Google Scholar
  • [33] Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Adv. Neural Inform. Processing Systems, vol. 12 (MIT Press, Cambridge, MA), 1008–1014.Google Scholar
  • [34] Kong W, Monteiro RD (2021) An accelerated inexact proximal point method for solving nonconvex-concave min-max problems. SIAM J. Optim. 31(4):2558–2585.CrossrefGoogle Scholar
  • [35] Kovalev D, Gasnikov A (2022) The first optimal algorithm for smooth and strongly-convex-strongly-concave minimax optimization. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY), 14691–14703.Google Scholar
  • [36] Kwon J, Kwon D, Wright S, Nowak RD (2023) A fully first-order method for stochastic bilevel optimization. Internat. Conf. Machine Learn. (PMLR, New York), 18083–18113.Google Scholar
  • [37] Li J, Gu B, Huang H (2022) A fully single loop algorithm for bilevel optimization without hessian inverse. Proc. AAAI Conf. Artificial Intelligence 36(7):7426–7434.Google Scholar
  • [38] Li Y, Lin G-H, Zhang J, Zhu X (2023) A novel approach for bilevel programs based on Wolfe duality. Preprint, submitted February 14, https://arxiv.org/abs/2302.06838.Google Scholar
  • [39] Lin Q, Lu Z, Xiao L (2015) An accelerated randomized proximal coordinate gradient method and its application to regularized empirical risk minimization. SIAM J. Optim. 25(4):2244–2273.CrossrefGoogle Scholar
  • [40] Liu H, Simonyan K, Yang Y (2018) Darts: Differentiable architecture search. Internat. Conf. Learn. Representations, vol. 176 (Elsevier, Amsterdam), 42–48.Google Scholar
  • [41] Liu R, Gao J, Zhang J, Meng D, Lin Z (2021) Investigating bi-level optimization for learning and vision from a unified perspective: A survey and beyond. IEEE Trans. Pattern Anal. Machine Intelligence 44(12):10045–10067.CrossrefGoogle Scholar
  • [42] Liu B, Ye M, Wright S, Stone P, Liu Q (2022) Bome! Bilevel optimization made easy: A simple first-order approach. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY), 17248–17262.Google Scholar
  • [43] Lopez-Paz D, Ranzato M (2017) Gradient episodic memory for continual learning. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 30 (Curran Associates Inc., Red Hook, NY), 6470–6479.Google Scholar
  • [44] Lu Z, Mei S (2024) A first-order method for nonconvex-strongly-concave constrained minimax optimization. Preprint, submitted January 5, 2023, https://arxiv.org/abs/2301.02060.Google Scholar
  • [45] Lu Z, Mei S (2024) First-order penalty methods for bilevel optimization. SIAM J. Optim. 34(2):1937–1969.CrossrefGoogle Scholar
  • [46] Lu Z, Mei S (2025) A first-order augmented Lagrangian method for constrained minimax optimization. Math. Programming 213:1063–1104.CrossrefGoogle Scholar
  • [47] Lu Z, Zhou Z (2023) Iteration-complexity of first-order augmented Lagrangian methods for convex conic programming. SIAM J. Optim. 33(2):1159–1190.CrossrefGoogle Scholar
  • [48] Luo Z-Q, Pang J-S, Ralph D (1996) Mathematical Programs with Equilibrium Constraints (Cambridge University Press, Cambridge, UK). CrossrefGoogle Scholar
  • [49] Ma X, Yao W, Ye JJ, Zhang J (2021) Combined approach with second-order optimality conditions for bilevel programming problems. Preprint, submitted July 31, https://arxiv.org/abs/2108.00179.Google Scholar
  • [50] Maclaurin D, Duvenaud D, Adams R (2015) Gradient-based hyperparameter optimization through reversible learning. Internat. Conf. Machine Learn. (PMLR, New York), 2113–2122.Google Scholar
  • [51] Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2018) Towards deep learning models resistant to adversarial attacks. Internat. Conf. Learn. Representations (MIT Press, Cambridge, MA).Google Scholar
  • [52] Mirrlees JA (1999) The theory of moral hazard and unobservable behaviour: Part I. Rev. Econom. Stud. 66(1):3–21.CrossrefGoogle Scholar
  • [53] Nesterov YE (2004) Introductory Lectures on Convex Optimization: A Basic Course (Kluwer Academic Publishers, MA).CrossrefGoogle Scholar
  • [54] Okuno T, Takeda A, Kawana A, Watanabe M (2021) On ℓp-hyperparameter learning via bilevel nonsmooth optimization. J. Machine Learn. Res. 22(1):11093–11139.Google Scholar
  • [55] Outrata J, Kocvara M, Zowe J (1998) Nonsmooth Approach to Optimization Problems with Equilibrium Constraints: Theory, Applications and Numerical Results, vol. 28 (Springer Science & Business Media, New York).CrossrefGoogle Scholar
  • [56] Pedregosa F (2016) Hyperparameter optimization with approximate gradient. Internat. Conf. Machine Learn. (PMLR, New York), 737–746.Google Scholar
  • [57] Rajeswaran A, Finn C, Kakade SM, Levine S (2019) Meta-learning with implicit gradients. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY), 113–124.Google Scholar
  • [58] Shen H, Chen T (2023) On penalty-based bilevel gradient descent method. Internat. Conf. Machine Learn. (PMLR, New York), 30992–31015.Google Scholar
  • [59] Shen X, Yu L (2013) CU splitting early termination based on weighted SVM. EURASIP J. Image Video Processing 2013(1):4.CrossrefGoogle Scholar
  • [60] Shi C, Lu J, Zhang G (2005) An extended Kuhn–Tucker approach for linear bilevel programming. Appl. Math. Comput. 162(1):51–63.CrossrefGoogle Scholar
  • [61] Shimizu K, Ishizuka Y, Bard JF (2012) Nondifferentiable and Two-Level Mathematical Programming (Springer Science & Business Media, New York).Google Scholar
  • [62] Sinha A, Malo P, Deb K (2017) A review on bilevel optimization: From classical to evolutionary approaches and applications. IEEE Trans. Evolutionary Comput. 22(2):276–295.CrossrefGoogle Scholar
  • [63] Sow D, Ji K, Guan Z, Liang Y (2022) A primal-dual approach to bilevel optimization with multiple inner minima. Preprint, submitted March 1, https://arxiv.org/abs/2203.01123.Google Scholar
  • [64] Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2013) Intriguing properties of neural networks. Preprint, submitted December 21, https://arxiv.org/abs/1312.6199.Google Scholar
  • [65] Tseng P (2008) On Accelerated Proximal Gradient Methods for Convex-Concave Optimization (MIT Press, Cambridge, MA), 1–20.Google Scholar
  • [66] Vicente LN, Calamai PH (1994) Bilevel and multilevel programming: A bibliography review. J. Global Optim. 5(3):291–306.CrossrefGoogle Scholar
  • [67] Von Stackelberg H (2010) Market Structure and Equilibrium (Springer Science & Business Media, New York).Google Scholar
  • [68] Wang X, Pan R, Pi R, Zhang T (2023) Effective bilevel optimization via minimax reformulation. Preprint, submitted May 22, https://arxiv.org/abs/2305.13153.Google Scholar
  • [69] Ward D, Borwein JM (1987) Nonsmooth calculus in finite dimensions. SIAM J. Control Optim. 25(5):1312–1340.CrossrefGoogle Scholar
  • [70] Xanthopoulos P, Razzaghi T (2014) A weighted support vector machine method for control chart pattern recognition. Computers Indust. Engrg. 70:134–149.CrossrefGoogle Scholar
  • [71] Yang J, Ji K, Liang Y (2021) Provably faster algorithms for bilevel optimization. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates, Inc., Red Hook, NY), 13670–13682.Google Scholar
  • [72] Yang X, Song Q, Wang Y (2007) A weighted support vector machine for data classification. Internat. J. Pattern Recognition Artificial Intellience 21(5):961–976.CrossrefGoogle Scholar
  • [73] Yao W, Yin H, Zeng S, Zhang J (2024) Overcoming lower-level constraints in bilevel optimization: A novel approach with regularized gap functions. Preprint, submitted June 4, https://arxiv.org/abs/2406.01992.Google Scholar
  • [74] Yao W, Yu C, Zeng S, Zhang J (2024) Constrained bi-level optimization: Proximal Lagrangian value function approach and Hessian-free algorithm. Preprint, submitted January 29, https://arxiv.org/abs/2401.16164.Google Scholar
  • [75] Ye JJ (2020) Constraint qualifications and optimality conditions in bilevel optimization. Bilevel Optimization (Springer, Berlin), 227–251.CrossrefGoogle Scholar
  • [76] Ye JJ, Yuan X, Zeng S, Zhang J (2023) Difference of convex algorithms for bilevel programs with applications in hyperparameter selection. Math. Programming 198(2):1583–1616.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.