Greedy Adversarial Equilibrium: An Efficient Alternative to Nonconvex–Nonconcave Min-Max Optimization

Oren Mangoubi
Oren Mangoubi
[email protected]
https://orcid.org/0000-0002-5228-1313
Worcester Polytechnic Institute, Worcester, Massachusetts 01609
Search for more papers by this author
,
Nisheeth K. Vishnoi
Corresponding Author
Nisheeth K. Vishnoi
[email protected]
https://orcid.org/0000-0002-0255-1119
Yale University, New Haven, Connecticut 06511
Search for more papers by this author

Worcester Polytechnic Institute, Worcester, Massachusetts 01609

Search for more papers by this author

Nisheeth K. Vishnoi

Corresponding Author

Nisheeth K. Vishnoi

[email protected]

https://orcid.org/0000-0002-0255-1119

Yale University, New Haven, Connecticut 06511

Search for more papers by this author

Published Online:22 Dec 2025https://doi.org/10.1287/moor.2023.0262

References

[1] Abernethy J, Lai KA, Wibisono A (2021) Last-iterate convergence rates for min-max optimization: Convergence of Hamiltonian gradient descent and consensus optimization. Feldman V, Ligett K, Sabato S, eds. Proc. 32nd Internat. Conf. Algorithmic Learn. Theory, Proceedings of Machine Learning Research, vol. 132 (PMLR, New York), 3–47.Google Scholar
[2] Adolphs L, Daneshmand H, Lucchi A, Hofmann T (2019) Local saddle point optimization: A curvature exploitation approach. Chaudhuri K, Sugiyama M, eds. Proc. Twenty-Second Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 89 (PMLR, New York), 486–495.Google Scholar
[3] Agarwal N, Allen-Zhu Z, Bullins B, Hazan E, Ma T (2017) Finding approximate local minima faster than gradient descent. Proc. 49th Annual ACM SIGACT Sympos. Theory Comput. (ACM, New York), 1195–1199.Google Scholar
[4] Allen-Zhu Z (2018) Natasha 2: Faster non-convex optimization than SGD. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 2675–2686.Google Scholar
[5] Arjevani Y, Carmon Y, Duchi JC, Foster DJ, Sekhari A, Sridharan K (2020) Second-order information in non-convex stochastic optimization: Power and limitations. Abernethy J, Agarwal S, eds. Proc. Thirty Third Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 125 (PMLR, New York), 242–299.Google Scholar
[6] Blum A, Rivest RL (1989) Training a 3-node neural network is NP-complete. Touretzky D, ed. Advances in Neural Information Processing Systems, vol. 1 (Morgan-Kaufmann, Burlington, MA), 494–501.Google Scholar
[7] Boucheron S, Lugosi G, Massart P (2013) Concentration Inequalities: A Nonasymptotic Theory of Independence (Oxford University Press, Oxford, UK).Crossref, Google Scholar
[8] Canetti R (2000) Security and composition of multiparty cryptographic protocols. J. Cryptology 13(1):143–202.Crossref, Google Scholar
[9] Carmon Y, Duchi JC, Hinder O, Sidford A (2018) Accelerated methods for nonconvex optimization. SIAM J. Optim. 28(2):1751–1772.Crossref, Google Scholar
[10] Curtis FE, Robinson DP (2019) Exploiting negative curvature in deterministic and stochastic optimization. Math. Programming 176(1):69–94.Crossref, Google Scholar
[11] Danskin JM (1967) The Theory of Max-Min and Its Application to Weapons Allocation Problems, vol. 5 (Springer Science & Business Media, Berlin).Crossref, Google Scholar
[12] Daskalakis C, Panageas I (2018) The limit points of (optimistic) gradient descent in min-max optimization. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 9236–9246.Google Scholar
[13] Daskalakis C, Goldberg PW, Papadimitriou CH (2009) The complexity of computing a Nash equilibrium. SIAM J. Comput. 39(1):195–259.Crossref, Google Scholar
[14] Daskalakis C, Skoulakis S, Zampetakis M (2021) The complexity of constrained min-max optimization. Proc. 53rd Annual ACM SIGACT Sympos. Theory Comput. (Association for Computing Machinery, New York), 1466–1478.Google Scholar
[15] Fang C, Li CJ, Lin Z, Zhang T (2018) Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 687–697.Google Scholar
[16] Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Proc. 16th Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 385–394.Google Scholar
[17] Forsgren A, Gill PE, Murray W (1995) Computing modified Newton directions using a partial Cholesky factorization. SIAM J. Sci. Comput. 16(1):139–150.Crossref, Google Scholar
[18] Ge R, Huang F, Jin C, Yuan Y (2015) Escaping from saddle points—Online stochastic gradient for tensor decomposition. Grünwald P, Hazan E, Kale S, eds. Proc. 28th Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 40 (PMLR, New York), 797–842.Google Scholar
[19] Gidel G, Berard H, Vignoud G, Vincent P, Lacoste-Julien S (2019) A variational inequality perspective on generative adversarial networks. Internat. Conf. Learn. Representations (ICLR 2019) (New Orleans).Google Scholar
[20] Goldwasser S, Micali S (1984) Probabilistic encryption. J. Comput. System Sci. 28(2):270–299.Crossref, Google Scholar
[21] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Proc. 28th Internat. Conf. Neural Inform. Processing Systems, vol. 2 (MIT Press, Cambridge, MA), 2672–2680.Google Scholar
[22] Gould NI, Lucidi S, Roma M, Toint PL (2000) Exploiting negative curvature directions in linesearch methods for unconstrained optimization. Optim. Methods Software 14(1–2):75–98.Crossref, Google Scholar
[23] Guruswami V, Smith A (2010) Codes for computationally simple channels: Explicit constructions with optimal rate. 2010 IEEE 51st Annual Sympos. Foundations Comput. Sci. (Las Vegas), 723–732.Google Scholar
[24] Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Proc. 31st Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6626–6637.Google Scholar
[25] Jin C, Netrapalli P, Jordan M (2020) What is local optimality in nonconvex-nonconcave minimax optimization? Hal D III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 4880–4889.Google Scholar
[26] Jin C, Netrapalli P, Ge R, Kakade SM, Jordan MI (2021) On nonconvex optimization for machine learning: Gradients, stochasticity, and saddle points. J. ACM 68(2):1–29.Crossref, Google Scholar
[27] Keswani V, Mangoubi O, Sachdeva S, Vishnoi NK (2022) A convergent and dimension-independent min-max optimization algorithm. Chaudhuri K, Jegelka S, Song L, Szepesvari C, Niu G, Sabato S, eds. Proc. 39th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 162 (PMLR, New York).Google Scholar
[28] Lipton RJ (1994) A new approach to information theory. Enjalbert P, Mayr EW, Wagner KW, eds. Annual Sympos. Theoret. Aspects Comput. Sci. (Springer, Berlin, Heidelberg), 699–708.Google Scholar
[29] Liu M, Rafique H, Lin Q, Yang T (2021) First-order convergence theory for weakly-convex-weakly-concave min-max problems. J. Machine Learn. Res. 22(169):1–34.Google Scholar
[30] Liu M, Li Z, Wang X, Yi J, Yang T (2018) Adaptive negative curvature descent with applications in non-convex optimization. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 4853–4862.Google Scholar
[31] Lu S, Razaviyayn M, Yang B, Huang K, Hong M (2020) Finding second-order stationary points efficiently in smooth nonconvex linearly constrained optimization problems. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 2811–2822.Google Scholar
[32] Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2018) Towards deep learning models resistant to adversarial attacks. Internat. Conf. Learn. Representations (OpenReview.net).Google Scholar
[33] Mangoubi O, Vishnoi NK (2021) Greedy adversarial equilibrium: An efficient alternative to nonconvex-nonconcave min-max optimization. Proc. 53rd Annual ACM SIGACT Sympos. Theory Comput. (Association for Computing Machinery, New York), 896–909.Google Scholar
[34] Manurangsi P, Reichman D (2018) The computational complexity of training ReLU(s). Preprint, submitted October 9, https://arxiv.org/abs/1810.04207.Google Scholar
[35] Mao X, Li Q, Xie H, Lau RY, Wang Z, Paul Smolley S (2017) Least squares generative adversarial networks. Proc. IEEE Internat. Conf. Comput. Vision, 2794–2802.Google Scholar
[36] Micali S, Peikert C, Sudan M, Wilson DA (2005) Optimal error correction against computationally bounded noise. Theory Cryptography Conf. (Springer, Berlin, Heidelberg), 1–16.Crossref, Google Scholar
[37] Moré JJ, Sorensen DC (1979) On the use of directions of negative curvature in a modified Newton method. Math. Programming 16(1):1–20.Crossref, Google Scholar
[38] Murty KG, Kabadi SN (1987) Some NP-complete problems in quadratic and nonlinear programming. Math. Programming 39(2):117–129.Google Scholar
[39] Nesterov Y, Polyak BT (2006) Cubic regularization of Newton method and its global performance. Math. Programming 108(1):177–205.Crossref, Google Scholar
[40] Nouiehed M, Sanjabi M, Huang T, Lee JD, Razaviyayn M (2019) Solving a class of non-convex min-max games using iterative first order methods. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 14934–14942.Google Scholar
[41] Rafique H, Liu M, Lin Q, Yang T (2022) Weakly-convex–concave min–max optimization: Provable algorithms and applications in machine learning. Optim. Methods Software 37(3):1087–1121.Google Scholar
[42] Reddi S, Zaheer M, Sra S, Poczos B, Bach F, Salakhutdinov R, Smola A (2018) A generic approach for escaping saddle points. Storkey A, Perez-Cruz F, eds. Proc. Twenty-First Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 84 (PMLR, New York), 1233–1242.Google Scholar
[43] Sion M, Wolfe P (1957) On a game without a value. Contributions to the Theory of Games, vol. 3 (Princeton University Press, Princeton, NJ), 299–306.Google Scholar
[44] Thekumparampil KK, Jain P, Netrapalli P, Oh S (2019) Efficient algorithms for smooth minimax optimization. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY), 12680–12691.Google Scholar
[45] Wang Y, Zhang G, Ba J (2020) On solving minimax optimization locally: A follow-the-ridge approach. Internat. Conf. Learn. Representations (ICLR, Appleton, WI).Google Scholar
[46] Zhang Y, Liang P, Charikar M (2017) A hitting time analysis of stochastic gradient Langevin dynamics. Proc. 2017 Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 65 (PMLR, New York), 1980–2022.Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:August 22, 2023
Accepted:October 05, 2025
Published Online:December 22, 2025

Cite as

Oren Mangoubi, Nisheeth K. Vishnoi (2025) Greedy Adversarial Equilibrium: An Efficient Alternative to Nonconvex–Nonconcave Min-Max Optimization. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2023.0262

Keywords

Acknowledgments

The authors thank the anonymous referees for their insightful comments and suggestions. An extended abstract of this manuscript appeared in Symposium on Theory of Computing 2021 (Mangoubi and Vishnoi [33]). This manuscript includes complete proofs, expanded explanations, and additional supporting results that were omitted in the 14-page conference version because of space limitations.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Greedy Adversarial Equilibrium: An Efficient Alternative to Nonconvex–Nonconcave Min-Max Optimization

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News