Mean Field Analysis of Deep Neural Networks

Justin Sirignano (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-0971-1349
Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom

Konstantinos Spiliopoulos
[email protected]
Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215

Published Online: 21 Apr 2021
https://doi.org/10.1287/moor.2020.1118


Mathematics of Operations Research

Volume 47, Issue 1

February 2022

Pages 1-846, C2

Article Information

Received: June 10, 2019
Accepted: September 24, 2020
Published Online: April 21, 2021

Cite as

Justin Sirignano, Konstantinos Spiliopoulos (2021) Mean Field Analysis of Deep Neural Networks. Mathematics of Operations Research 47(1):120-152.

https://doi.org/10.1287/moor.2020.1118
