We consider the problem of training a shallow neural network with quadratic activation functions and the generalization power of such trained networks. Assuming that the samples are generated by a full-rank matrix $W^{*}$ of the hidden network node weights, we obtain the following results. We establish that all full-rank approximately stationary solutions of the risk minimization problem are also approximate global optima of the risk (both in-sample and population). As a consequence, we establish that, when trained on polynomially many samples, the gradient descent algorithm converges to the global optimum of the risk minimization problem regardless of the width of the network when it is initialized below some value $\nu^{*}$, which we compute. Furthermore, the network produced by gradient descent has a near-zero generalization error. Next, we establish that initializing the gradient descent algorithm below $\nu^{*}$ is easily achieved when the weights of the ground truth matrix $W^{*}$ are randomly generated and the matrix is sufficiently overparameterized. Finally, we identify a simple necessary and sufficient geometric condition on the size of the training set under which any global minimizer of the empirical risk necessarily has zero generalization error.
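
As a rough illustration of the setting in the abstract, the following is a minimal sketch, not the authors' method or experiments. It assumes the network output is $f(W; x) = \sum_j \langle w_j, x \rangle^2 = \|Wx\|_2^2$, a squared loss, Gaussian inputs, and noiseless labels generated by a planted matrix. The function names, dimensions, step size, and initialization scale are all hypothetical choices made for illustration; in particular, the initialization here is simply small and is not the threshold $\nu^{*}$ computed in the paper.

```python
import numpy as np


def predict(W, X):
    """Network output sum_j <w_j, x>^2 = ||W x||^2 for every row x of X."""
    return np.sum((X @ W.T) ** 2, axis=1)


def empirical_risk(W, X, y):
    """Half the mean squared error over the sample (an assumed loss scaling)."""
    return 0.5 * np.mean((predict(W, X) - y) ** 2)


def risk_gradient(W, X, y):
    """Gradient of empirical_risk in W: (2/n) * W @ sum_i r_i x_i x_i^T."""
    residual = predict(W, X) - y
    # sum_i r_i x_i x_i^T, a (d, d) matrix
    weighted_second_moment = (X.T * residual) @ X
    return (2.0 / len(y)) * W @ weighted_second_moment


def gradient_descent(W0, X, y, step_size=1e-3, num_steps=2000):
    """Plain gradient descent with a fixed step size (illustrative values)."""
    W = W0.copy()
    for _ in range(num_steps):
        W = W - step_size * risk_gradient(W, X, y)
    return W


rng = np.random.default_rng(0)
d, m_true, m_hat, n = 5, 5, 8, 200                      # hypothetical sizes
W_star = rng.standard_normal((m_true, d)) / np.sqrt(d)  # planted weights
X = rng.standard_normal((n, d))                         # Gaussian inputs (assumption)
y = predict(W_star, X)                                  # noiseless labels
W0 = 0.1 * rng.standard_normal((m_hat, d))              # small init, not the paper's nu*
W_hat = gradient_descent(W0, X, y)

print("initial risk:", empirical_risk(W0, X, y))
print("final risk:  ", empirical_risk(W_hat, X, y))
```

Under this assumed model the output depends on $W$ only through $W^{\top} W$, which is why the learner's width (`m_hat`) can differ from the planted width (`m_true`) in the sketch.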

Funding: The research of E. C. Kızıldağ is supported by Columbia University through the Distinguished Postdoctoral Fellowship in Statistics. Support from the National Science Foundation [Grant DMS-2015517] is gratefully acknowledged.

Volume 50, Issue 1

February 2025

Pages 1-781 C2

Article Information

Received: March 30, 2021
Accepted: December 10, 2023
Published Online: February 20, 2024

Cite as

David Gamarnik, Eren C. Kızıldağ, Ilias Zadik (2024) Stationary Points of a Shallow Neural Network with Quadratic Activations and the Global Optimality of the Gradient Descent Algorithm. Mathematics of Operations Research 50(1):209-251.

https://doi.org/10.1287/moor.2021.0082

Acknowledgments

The authors thank the anonymous reviewers for their very detailed feedback, which improved the presentation of this paper, and Orestis Plevrakis for providing useful remarks on the initial version of this paper. Part of this work was done when D. Gamarnik and E. C. Kızıldağ were visiting the Simons Institute for the Theory of Computing at the University of California, Berkeley in Fall 2020.
