Greedy Adversarial Equilibrium: An Efficient Alternative to Nonconvex–Nonconcave Min-Max Optimization
References
- [1] (2021) Last-iterate convergence rates for min-max optimization: Convergence of Hamiltonian gradient descent and consensus optimization. Feldman V, Ligett K, Sabato S, eds. Proc. 32nd Internat. Conf. Algorithmic Learn. Theory, Proceedings of Machine Learning Research, vol. 132 (PMLR, New York), 3–47.Google Scholar
- [2] (2019) Local saddle point optimization: A curvature exploitation approach. Chaudhuri K, Sugiyama M, eds. Proc. Twenty-Second Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 89 (PMLR, New York), 486–495.Google Scholar
- [3] (2017) Finding approximate local minima faster than gradient descent. Proc. 49th Annual ACM SIGACT Sympos. Theory Comput. (ACM, New York), 1195–1199.Google Scholar
- [4] (2018) Natasha 2: Faster non-convex optimization than SGD. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 2675–2686.Google Scholar
- [5] (2020) Second-order information in non-convex stochastic optimization: Power and limitations. Abernethy J, Agarwal S, eds. Proc. Thirty Third Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 125 (PMLR, New York), 242–299.Google Scholar
- [6] (1989) Training a 3-node neural network is NP-complete. Touretzky D, ed. Advances in Neural Information Processing Systems, vol. 1 (Morgan-Kaufmann, Burlington, MA), 494–501.Google Scholar
- [7] (2013) Concentration Inequalities: A Nonasymptotic Theory of Independence (Oxford University Press, Oxford, UK).Crossref, Google Scholar
- [8] (2000) Security and composition of multiparty cryptographic protocols. J. Cryptology 13(1):143–202.Crossref, Google Scholar
- [9] (2018) Accelerated methods for nonconvex optimization. SIAM J. Optim. 28(2):1751–1772.Crossref, Google Scholar
- [10] (2019) Exploiting negative curvature in deterministic and stochastic optimization. Math. Programming 176(1):69–94.Crossref, Google Scholar
- [11] (1967) The Theory of Max-Min and Its Application to Weapons Allocation Problems, vol. 5 (Springer Science & Business Media, Berlin).Crossref, Google Scholar
- [12] (2018) The limit points of (optimistic) gradient descent in min-max optimization. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 9236–9246.Google Scholar
- [13] (2009) The complexity of computing a Nash equilibrium. SIAM J. Comput. 39(1):195–259.Crossref, Google Scholar
- [14] (2021) The complexity of constrained min-max optimization. Proc. 53rd Annual ACM SIGACT Sympos. Theory Comput. (Association for Computing Machinery, New York), 1466–1478.Google Scholar
- [15] (2018) Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 687–697.Google Scholar
- [16] (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Proc. 16th Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 385–394.Google Scholar
- [17] (1995) Computing modified Newton directions using a partial Cholesky factorization. SIAM J. Sci. Comput. 16(1):139–150.Crossref, Google Scholar
- [18] (2015) Escaping from saddle points—Online stochastic gradient for tensor decomposition. Grünwald P, Hazan E, Kale S, eds. Proc. 28th Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 40 (PMLR, New York), 797–842.Google Scholar
- [19] (2019) A variational inequality perspective on generative adversarial networks. Internat. Conf. Learn. Representations (ICLR 2019) (New Orleans).Google Scholar
- [20] (1984) Probabilistic encryption. J. Comput. System Sci. 28(2):270–299.Crossref, Google Scholar
- [21] (2014) Generative adversarial nets. Proc. 28th Internat. Conf. Neural Inform. Processing Systems, vol. 2 (MIT Press, Cambridge, MA), 2672–2680.Google Scholar
- [22] (2000) Exploiting negative curvature directions in linesearch methods for unconstrained optimization. Optim. Methods Software 14(1–2):75–98.Crossref, Google Scholar
- [23] (2010) Codes for computationally simple channels: Explicit constructions with optimal rate. 2010 IEEE 51st Annual Sympos. Foundations Comput. Sci. (Las Vegas), 723–732.Google Scholar
- [24] (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Proc. 31st Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6626–6637.Google Scholar
- [25] (2020) What is local optimality in nonconvex-nonconcave minimax optimization? Hal D III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 4880–4889.Google Scholar
- [26] (2021) On nonconvex optimization for machine learning: Gradients, stochasticity, and saddle points. J. ACM 68(2):1–29.Crossref, Google Scholar
- [27] (2022) A convergent and dimension-independent min-max optimization algorithm. Chaudhuri K, Jegelka S, Song L, Szepesvari C, Niu G, Sabato S, eds. Proc. 39th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 162 (PMLR, New York).Google Scholar
- [28] (1994) A new approach to information theory. Enjalbert P, Mayr EW, Wagner KW, eds. Annual Sympos. Theoret. Aspects Comput. Sci. (Springer, Berlin, Heidelberg), 699–708.Google Scholar
- [29] (2021) First-order convergence theory for weakly-convex-weakly-concave min-max problems. J. Machine Learn. Res. 22(169):1–34.Google Scholar
- [30] (2018) Adaptive negative curvature descent with applications in non-convex optimization. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 4853–4862.Google Scholar
- [31] (2020) Finding second-order stationary points efficiently in smooth nonconvex linearly constrained optimization problems. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 2811–2822.Google Scholar
- [32] (2018) Towards deep learning models resistant to adversarial attacks. Internat. Conf. Learn. Representations (OpenReview.net).Google Scholar
- [33] (2021) Greedy adversarial equilibrium: An efficient alternative to nonconvex-nonconcave min-max optimization. Proc. 53rd Annual ACM SIGACT Sympos. Theory Comput. (Association for Computing Machinery, New York), 896–909.Google Scholar
- [34] (2018) The computational complexity of training ReLU(s). Preprint, submitted October 9, https://arxiv.org/abs/1810.04207.Google Scholar
- [35] (2017) Least squares generative adversarial networks. Proc. IEEE Internat. Conf. Comput. Vision, 2794–2802.Google Scholar
- [36] (2005) Optimal error correction against computationally bounded noise. Theory Cryptography Conf. (Springer, Berlin, Heidelberg), 1–16.Crossref, Google Scholar
- [37] (1979) On the use of directions of negative curvature in a modified Newton method. Math. Programming 16(1):1–20.Crossref, Google Scholar
- [38] (1987) Some NP-complete problems in quadratic and nonlinear programming. Math. Programming 39(2):117–129.Google Scholar
- [39] (2006) Cubic regularization of Newton method and its global performance. Math. Programming 108(1):177–205.Crossref, Google Scholar
- [40] (2019) Solving a class of non-convex min-max games using iterative first order methods. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 14934–14942.Google Scholar
- [41] (2022) Weakly-convex–concave min–max optimization: Provable algorithms and applications in machine learning. Optim. Methods Software 37(3):1087–1121.Google Scholar
- [42] (2018) A generic approach for escaping saddle points. Storkey A, Perez-Cruz F, eds. Proc. Twenty-First Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 84 (PMLR, New York), 1233–1242.Google Scholar
- [43] (1957) On a game without a value. Contributions to the Theory of Games, vol. 3 (Princeton University Press, Princeton, NJ), 299–306.Google Scholar
- [44] (2019) Efficient algorithms for smooth minimax optimization. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY), 12680–12691.Google Scholar
- [45] (2020) On solving minimax optimization locally: A follow-the-ridge approach. Internat. Conf. Learn. Representations (ICLR, Appleton, WI).Google Scholar
- [46] (2017) A hitting time analysis of stochastic gradient Langevin dynamics. Proc. 2017 Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 65 (PMLR, New York), 1980–2022.Google Scholar

