Greedy Adversarial Equilibrium: An Efficient Alternative to Nonconvex–Nonconcave Min-Max Optimization

Published Online:https://doi.org/10.1287/moor.2023.0262

References

  • [1] Abernethy J, Lai KA, Wibisono A (2021) Last-iterate convergence rates for min-max optimization: Convergence of Hamiltonian gradient descent and consensus optimization. Feldman V, Ligett K, Sabato S, eds. Proc. 32nd Internat. Conf. Algorithmic Learn. Theory, Proceedings of Machine Learning Research, vol. 132 (PMLR, New York), 3–47.Google Scholar
  • [2] Adolphs L, Daneshmand H, Lucchi A, Hofmann T (2019) Local saddle point optimization: A curvature exploitation approach. Chaudhuri K, Sugiyama M, eds. Proc. Twenty-Second Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 89 (PMLR, New York), 486–495.Google Scholar
  • [3] Agarwal N, Allen-Zhu Z, Bullins B, Hazan E, Ma T (2017) Finding approximate local minima faster than gradient descent. Proc. 49th Annual ACM SIGACT Sympos. Theory Comput. (ACM, New York), 1195–1199.Google Scholar
  • [4] Allen-Zhu Z (2018) Natasha 2: Faster non-convex optimization than SGD. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 2675–2686.Google Scholar
  • [5] Arjevani Y, Carmon Y, Duchi JC, Foster DJ, Sekhari A, Sridharan K (2020) Second-order information in non-convex stochastic optimization: Power and limitations. Abernethy J, Agarwal S, eds. Proc. Thirty Third Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 125 (PMLR, New York), 242–299.Google Scholar
  • [6] Blum A, Rivest RL (1989) Training a 3-node neural network is NP-complete. Touretzky D, ed. Advances in Neural Information Processing Systems, vol. 1 (Morgan-Kaufmann, Burlington, MA), 494–501.Google Scholar
  • [7] Boucheron S, Lugosi G, Massart P (2013) Concentration Inequalities: A Nonasymptotic Theory of Independence (Oxford University Press, Oxford, UK).CrossrefGoogle Scholar
  • [8] Canetti R (2000) Security and composition of multiparty cryptographic protocols. J. Cryptology 13(1):143–202.CrossrefGoogle Scholar
  • [9] Carmon Y, Duchi JC, Hinder O, Sidford A (2018) Accelerated methods for nonconvex optimization. SIAM J. Optim. 28(2):1751–1772.CrossrefGoogle Scholar
  • [10] Curtis FE, Robinson DP (2019) Exploiting negative curvature in deterministic and stochastic optimization. Math. Programming 176(1):69–94.CrossrefGoogle Scholar
  • [11] Danskin JM (1967) The Theory of Max-Min and Its Application to Weapons Allocation Problems, vol. 5 (Springer Science & Business Media, Berlin).CrossrefGoogle Scholar
  • [12] Daskalakis C, Panageas I (2018) The limit points of (optimistic) gradient descent in min-max optimization. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 9236–9246.Google Scholar
  • [13] Daskalakis C, Goldberg PW, Papadimitriou CH (2009) The complexity of computing a Nash equilibrium. SIAM J. Comput. 39(1):195–259.CrossrefGoogle Scholar
  • [14] Daskalakis C, Skoulakis S, Zampetakis M (2021) The complexity of constrained min-max optimization. Proc. 53rd Annual ACM SIGACT Sympos. Theory Comput. (Association for Computing Machinery, New York), 1466–1478.Google Scholar
  • [15] Fang C, Li CJ, Lin Z, Zhang T (2018) Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 687–697.Google Scholar
  • [16] Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Proc. 16th Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 385–394.Google Scholar
  • [17] Forsgren A, Gill PE, Murray W (1995) Computing modified Newton directions using a partial Cholesky factorization. SIAM J. Sci. Comput. 16(1):139–150.CrossrefGoogle Scholar
  • [18] Ge R, Huang F, Jin C, Yuan Y (2015) Escaping from saddle points—Online stochastic gradient for tensor decomposition. Grünwald P, Hazan E, Kale S, eds. Proc. 28th Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 40 (PMLR, New York), 797–842.Google Scholar
  • [19] Gidel G, Berard H, Vignoud G, Vincent P, Lacoste-Julien S (2019) A variational inequality perspective on generative adversarial networks. Internat. Conf. Learn. Representations (ICLR 2019) (New Orleans).Google Scholar
  • [20] Goldwasser S, Micali S (1984) Probabilistic encryption. J. Comput. System Sci. 28(2):270–299.CrossrefGoogle Scholar
  • [21] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Proc. 28th Internat. Conf. Neural Inform. Processing Systems, vol. 2 (MIT Press, Cambridge, MA), 2672–2680.Google Scholar
  • [22] Gould NI, Lucidi S, Roma M, Toint PL (2000) Exploiting negative curvature directions in linesearch methods for unconstrained optimization. Optim. Methods Software 14(1–2):75–98.CrossrefGoogle Scholar
  • [23] Guruswami V, Smith A (2010) Codes for computationally simple channels: Explicit constructions with optimal rate. 2010 IEEE 51st Annual Sympos. Foundations Comput. Sci. (Las Vegas), 723–732.Google Scholar
  • [24] Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Proc. 31st Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6626–6637.Google Scholar
  • [25] Jin C, Netrapalli P, Jordan M (2020) What is local optimality in nonconvex-nonconcave minimax optimization? Hal D III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 4880–4889.Google Scholar
  • [26] Jin C, Netrapalli P, Ge R, Kakade SM, Jordan MI (2021) On nonconvex optimization for machine learning: Gradients, stochasticity, and saddle points. J. ACM 68(2):1–29.CrossrefGoogle Scholar
  • [27] Keswani V, Mangoubi O, Sachdeva S, Vishnoi NK (2022) A convergent and dimension-independent min-max optimization algorithm. Chaudhuri K, Jegelka S, Song L, Szepesvari C, Niu G, Sabato S, eds. Proc. 39th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 162 (PMLR, New York).Google Scholar
  • [28] Lipton RJ (1994) A new approach to information theory. Enjalbert P, Mayr EW, Wagner KW, eds. Annual Sympos. Theoret. Aspects Comput. Sci. (Springer, Berlin, Heidelberg), 699–708.Google Scholar
  • [29] Liu M, Rafique H, Lin Q, Yang T (2021) First-order convergence theory for weakly-convex-weakly-concave min-max problems. J. Machine Learn. Res. 22(169):1–34.Google Scholar
  • [30] Liu M, Li Z, Wang X, Yi J, Yang T (2018) Adaptive negative curvature descent with applications in non-convex optimization. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 4853–4862.Google Scholar
  • [31] Lu S, Razaviyayn M, Yang B, Huang K, Hong M (2020) Finding second-order stationary points efficiently in smooth nonconvex linearly constrained optimization problems. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 2811–2822.Google Scholar
  • [32] Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2018) Towards deep learning models resistant to adversarial attacks. Internat. Conf. Learn. Representations (OpenReview.net).Google Scholar
  • [33] Mangoubi O, Vishnoi NK (2021) Greedy adversarial equilibrium: An efficient alternative to nonconvex-nonconcave min-max optimization. Proc. 53rd Annual ACM SIGACT Sympos. Theory Comput. (Association for Computing Machinery, New York), 896–909.Google Scholar
  • [34] Manurangsi P, Reichman D (2018) The computational complexity of training ReLU(s). Preprint, submitted October 9, https://arxiv.org/abs/1810.04207.Google Scholar
  • [35] Mao X, Li Q, Xie H, Lau RY, Wang Z, Paul Smolley S (2017) Least squares generative adversarial networks. Proc. IEEE Internat. Conf. Comput. Vision, 2794–2802.Google Scholar
  • [36] Micali S, Peikert C, Sudan M, Wilson DA (2005) Optimal error correction against computationally bounded noise. Theory Cryptography Conf. (Springer, Berlin, Heidelberg), 1–16.CrossrefGoogle Scholar
  • [37] Moré JJ, Sorensen DC (1979) On the use of directions of negative curvature in a modified Newton method. Math. Programming 16(1):1–20.CrossrefGoogle Scholar
  • [38] Murty KG, Kabadi SN (1987) Some NP-complete problems in quadratic and nonlinear programming. Math. Programming 39(2):117–129.Google Scholar
  • [39] Nesterov Y, Polyak BT (2006) Cubic regularization of Newton method and its global performance. Math. Programming 108(1):177–205.CrossrefGoogle Scholar
  • [40] Nouiehed M, Sanjabi M, Huang T, Lee JD, Razaviyayn M (2019) Solving a class of non-convex min-max games using iterative first order methods. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 14934–14942.Google Scholar
  • [41] Rafique H, Liu M, Lin Q, Yang T (2022) Weakly-convex–concave min–max optimization: Provable algorithms and applications in machine learning. Optim. Methods Software 37(3):1087–1121.Google Scholar
  • [42] Reddi S, Zaheer M, Sra S, Poczos B, Bach F, Salakhutdinov R, Smola A (2018) A generic approach for escaping saddle points. Storkey A, Perez-Cruz F, eds. Proc. Twenty-First Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 84 (PMLR, New York), 1233–1242.Google Scholar
  • [43] Sion M, Wolfe P (1957) On a game without a value. Contributions to the Theory of Games, vol. 3 (Princeton University Press, Princeton, NJ), 299–306.Google Scholar
  • [44] Thekumparampil KK, Jain P, Netrapalli P, Oh S (2019) Efficient algorithms for smooth minimax optimization. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY), 12680–12691.Google Scholar
  • [45] Wang Y, Zhang G, Ba J (2020) On solving minimax optimization locally: A follow-the-ridge approach. Internat. Conf. Learn. Representations (ICLR, Appleton, WI).Google Scholar
  • [46] Zhang Y, Liang P, Charikar M (2017) A hitting time analysis of stochastic gradient Langevin dynamics. Proc. 2017 Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 65 (PMLR, New York), 1980–2022.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.