Mean Field Analysis of Deep Neural Networks

Published Online: https://doi.org/10.1287/moor.2020.1118

References

  • [1] Alipanahi B, Delong A, Weirauch M, Frey B (2015) Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotech. 33(8):831–838.
  • [2] Ambrosio L, Gigli N, Savaré G (2008) Gradient Flows: In Metric Spaces and in the Space of Probability Measures (Springer Science & Business Media, New York).
  • [3] Araújo D, Oliveira RI, Yukimura D (2019) A mean-field limit for certain deep neural networks. Preprint, submitted June 1, https://arxiv.org/abs/1906.00193.
  • [4] Arik S, Chrzanowski M, Coates A, Diamos G, Gibiansky A, Kang Y, Li X, et al. (2017) Deep voice: Real-time neural text-to-speech. Preprint, submitted February 25, https://arxiv.org/abs/1702.07825.
  • [5] Barron A (1994) Approximation and estimation bounds for artificial neural networks. Machine Learn. 14(1):115–133.
  • [6] Blanchet A, Bolte J (2016) A family of functional inequalities, Łojasiewicz inequalities and displacement convex functions. Preprint, submitted December 8, https://arxiv.org/abs/1612.02619.
  • [7] Bojarski M, Del Testa D, Dworakowski D, Firner B, Flepp B, Goyal P, Jackel L, et al. (2016) End to end learning for self-driving cars. Preprint, submitted April 25, https://arxiv.org/abs/1604.07316.
  • [8] Chizat L (2019) Sparse optimization on measures with over-parameterized gradient descent. Preprint, submitted July 24, https://arxiv.org/abs/1907.10300.
  • [9] Chizat L, Bach F (2018) A note on lazy training in supervised differentiable programming. Preprint, submitted December 19, https://arxiv.org/abs/1812.07956v1.
  • [10] Chizat L, Bach F (2018) On the global convergence of gradient descent for over-parameterized models using optimal transport. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Red Hook, NY), 3040–3050.
  • [11] Cybenko G (1989) Approximation by superposition of a sigmoidal function. Math. Control Signals Systems 2(4):303–314.
  • [12] Dawson DA (1997) Hierarchical and mean-field stepping stone models. Donnelly P, Tavaré S, eds. Progress in Population Genetics and Human Evolution, IMA Volumes in Mathematics and Its Applications, vol. 87 (Springer, New York), 287–298.
  • [13] Dawson DA (2018) Multilevel mutation-selection systems and set-valued duals. J. Math. Biol. 76(1–2):295–378.
  • [14] Dawson DA, Gärtner J (1998) Analytic aspects of multilevel large deviations. Szyszkowicz B, ed. Asymptotic Methods in Probability and Statistics (Elsevier, Amsterdam), 401–440.
  • [15] Dawson DA, Greven A (1993) Hierarchically interacting Fleming-Viot processes with selection and mutation: Multiple space-time scale analysis and quasi equilibria. Electronic J. Probab. 4:Paper 4.
  • [16] Dawson DA, Hochberg KJ (1982) Wandering random measures in the Fleming-Viot model. Ann. Probab. 10(3):554–580.
  • [17] Dawson DA, Hochberg KJ (1991) A multilevel branching model. Adv. Appl. Probab. 23(4):701–715.
  • [18] Dawson DA, Wu Y (1996) Multilevel multitype models of an information system. Athreya KB, Jagers P, eds. Classical and Modern Branching Processes, IMA Volumes in Mathematics and Its Applications, vol. 84 (Springer, New York), 57–72.
  • [19] Dawson DA, Hochberg KJ, Vinogradov V (1996) High-density limits of hierarchically structured branching-diffusing populations. Stochastic Process. Appl. 62(2):191–222.
  • [20] Du SS, Lee JD, Li H, Wang L, Zhai X (2018) Gradient descent finds global minima of deep neural networks. Preprint, submitted November 9, https://arxiv.org/abs/1811.03804.
  • [21] Du SS, Zhai X, Poczos B, Singh A (2018) Gradient descent provably optimizes over-parameterized neural networks. Preprint, submitted October 4, https://arxiv.org/abs/1810.02054.
  • [22] Ethier S, Kurtz T (1986) Markov Processes: Characterization and Convergence (John Wiley & Sons, New York).
  • [23] Funahashi K-I (1989) On the approximate realization of continuous mappings by neural networks. Neural Networks 2(3):183–192.
  • [24] Goodfellow I, Bengio Y, Courville A (2016) Deep Learning (MIT Press, Cambridge, MA).
  • [25] Gu S, Holly E, Lillicrap T, Levine S (2017) Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. IEEE Conf. Robotics Automation (IEEE, Piscataway, NJ), 3389–3396.
  • [26] Hauer D, Mazón J (2019) Kurdyka-Łojasiewicz-Simon inequality for gradient flows in metric spaces. Trans. Amer. Math. Soc. 372(7):4917–4976.
  • [27] Hornik K (1991) Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2):251–257.
  • [28] Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2(5):359–366.
  • [29] Hu K, Ren Z, Šiška D, Szpruch L (2019) Mean-field Langevin dynamics and energy landscape of neural networks. Preprint, submitted May 19, https://arxiv.org/abs/1905.07769.
  • [30] Jacot A, Gabriel F, Hongler C (2018) Neural tangent kernel: Convergence and generalization in neural networks. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Red Hook, NY), 8571–8580.
  • [31] Kolokoltsov VN (2010) Nonlinear Markov Processes and Kinetic Equations, vol. 182 (Cambridge University Press, Cambridge, UK).
  • [32] Krantz SG, Parks HR (2002) A Primer of Real Analytic Functions, 2nd ed. (Springer Science and Business Media, New York).
  • [33] Krizhevsky A (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto, Toronto, Canada.
  • [34] LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444.
  • [35] Leviathan Y, Matias Y (2018) Google Duplex: An AI system for accomplishing real-world tasks over the phone. Google AI Blog (May 8), https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html.
  • [36] Li D, Ding T, Sun R (2018) Over-parameterized deep neural networks have no strict local minima for any continuous activations. Preprint, submitted December 28, https://arxiv.org/abs/1812.11039v1.
  • [37] Ling J, Jones R, Templeton J (2016) Machine learning strategies for systems with invariance properties. J. Comput. Phys. 318(August):22–35.
  • [38] Ling J, Kurzawski A, Templeton J (2016) Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid Mech. 807:155–166.
  • [39] Liang S, Sun R, Lee J, Srikant R (2018) Adding one neuron can eliminate all bad local minima. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Red Hook, NY), 4350–4360.
  • [40] Liang S, Sun R, Li Y, Srikant R (2018) Understanding the loss surface of neural networks for binary classification. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn., vol. 35 (PMLR), 2835–2843.
  • [41] Mallat S (2016) Understanding deep convolutional neural networks. Philos. Trans. Roy. Soc. A 374(2065):20150203.
  • [42] Mei S, Misiakiewicz T, Montanari A (2019) A mean field theory of two-layers neural networks: dimension free bounds and kernel limit. Preprint, submitted February 16, https://arxiv.org/abs/1902.06015.
  • [43] Mei S, Montanari A, Nguyen P (2018) A mean field view of the landscape of two-layer neural networks. Proc. Natl. Acad. Sci. 115(33):E7665–E7671.
  • [44] Nguyen P-M (2019) Mean field limit of the learning dynamics of multilayer neural networks. Preprint, submitted February 7, https://arxiv.org/abs/1902.02880.
  • [45] Pierson H, Gashler M (2017) Deep learning in robotics: A review of recent research. Adv. Robotics 31(16):821–835.
  • [46] Rotskoff GM, Vanden-Eijnden E (2018) Neural networks as interacting particle systems: Asymptotic convexity of the loss landscape and universal scaling of the approximation error. Preprint, submitted May 2, https://arxiv.org/abs/1805.00915v1.
  • [47] Sirignano J, Cont R (2018) Universal features of price formation in financial markets: Perspectives from Deep Learning. Preprint, submitted March 19, https://arxiv.org/abs/1803.06917.
  • [48] Sirignano J, Spiliopoulos K (2017) Stochastic gradient descent in continuous time. SIAM J. Financial Math. 8(1):933–961.
  • [49] Sirignano J, Spiliopoulos K (2018) DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375(December):1339–1364.
  • [50] Sirignano J, Spiliopoulos K (2020) Mean field analysis of neural networks: A central limit theorem. Stochastic Process. Appl. 130(3):1820–1852.
  • [51] Sirignano J, Spiliopoulos K (2020) Mean field analysis of neural networks: A law of large numbers. SIAM J. Appl. Math. 80(2):725–752.
  • [52] Sirignano J, Sadhwani A, Giesecke K (2016) Deep learning for mortgage risk. Preprint, submitted July 8, https://arxiv.org/abs/1607.02470.
  • [53] Sünderhauf N, Brock O, Scheirer W, Hadsell R, Fox D, Leitner J, Upcroft B, et al. (2018) The limits and potentials of deep learning for robotics. Internat. J. Robotics Res. 37(4–5):405–420.
  • [54] Sutskever I, Vinyals O, Le Q (2014) Sequence to sequence learning with neural networks. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Red Hook, NY), 3104–3112.
  • [55] Taigman Y, Yang M, Ranzato M, Wolf L (2014) Deepface: Closing the gap to human-level performance in face verification. Proc. IEEE Conf. Comput. Vision Pattern Recognition (IEEE, Piscataway, NJ), 1701–1708.
  • [56] Wu Y, Schuster M, Chen Z, Le Q, Norouzi M, Macherey W, Krikun M, et al. (2016) Google’s neural machine translation system: Bridging the gap between human and machine translation. Preprint, submitted September 26, https://arxiv.org/abs/1609.08144.
  • [57] Zhang Y, Chan W, Jaitly N (2017) Very deep convolutional networks for end-to-end speech recognition. IEEE Internat. Conf. Acoustics, Speech, Signal Processing Proc. (IEEE, Piscataway, NJ), 4845–4849.