Deterministic and Stochastic Frank–Wolfe Recursion on Probability Spaces
Published Online:23 Sep 2025https://doi.org/10.1287/moor.2024.0584
References
- [1] (2008) Gradient Flows: In Metric Spaces and in the Space of Probability Measures (Birkhäuser, Basel).Google Scholar
- [2] (2017) Measuring the sensitivity of parameter estimates to estimation moments. Quart. J. Econom. 132(4):1553–1592.Crossref, Google Scholar
- [3] (1976) The Elements of Real Analysis (Wiley, New York).Google Scholar
- [4] (1999) Convergence of Probability Measures, Wiley Series in Probability and Statistics, 2nd ed. (Wiley, New York).Crossref, Google Scholar
- [5] (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60(2):223–311.Crossref, Google Scholar
- [6] (2017) The alternating descent conditional gradient method for sparse inverse problems. SIAM J. Optim. 27(2):616–639.Crossref, Google Scholar
- [7] (2013) Inverse problems in spaces of measures. ESAIM: Control Optim. Calculus Variations 19(1):190–218.Crossref, Google Scholar
- [8] (2015) Convex optimization: Algorithms and complexity. Foundations Trends Machine Learn. 8(3–4):231–357.Crossref, Google Scholar
- [9] (2013) Super-resolution from noisy data. J. Fourier Anal. Appl. 19(6):1229–1254.Crossref, Google Scholar
- [10] (2014) Towards a mathematical theory of super-resolution. Comm. Pure Appl. Math. 67(6):906–956.Crossref, Google Scholar
- [11] (2005) Adaptive optimization of convex functionals in Banach spaces. SIAM J. Numer. Anal. 42(5):2043–2075.Crossref, Google Scholar
- [12] (2019) A blob method for diffusion. Calculus Variations Partial Differential Equations 58(2):53.Crossref, Google Scholar
- [13] (2022) Primal dual methods for Wasserstein gradient flows. Foundations Comput. Math. 22(2):389–443.Crossref, Google Scholar
- [14] (2022) Locally robust semiparametric estimation. Econometrica 90(4):1501–1535.Crossref, Google Scholar
- [15] (2018) Double/debiased machine learning for treatment and structural parameters. Econometrics J. 21(1):C1–C68. Crossref, Google Scholar
- [16] (2022) Convergence rates of gradient methods for convex optimization in the space of measures. Open J. Math. Optim. 3:8.Google Scholar
- [17] (2022) Sparse optimization on measures with over-parameterized gradient descent. Math. Programming 194(1–2):487–532.Crossref, Google Scholar
- [18] (2019) Probability functional descent: A unifying perspective on GANs, variational inference, and reinforcement learning. Proc. 36th Internat. Conf. Machine Learn., vol. 97 (PMLR, New York), 1213–1222.Google Scholar
- [19] (1983) A maximum expected covering location model: Formulation, properties, and heuristic solution. Transportation Sci. 17(1):48–70.Link, Google Scholar
- [20] (2012) Exact reconstruction using Beurling minimal extrapolation. J. Math. Anal. Appl. 395(1):336–354.Crossref, Google Scholar
- [21] (1970) Approximate Methods in Optimization Problems (Elsevier, New York).Google Scholar
- [22] (1978) Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62(2):432–444.Crossref, Google Scholar
- [23] (2019) Probability: Theory and Examples, Cambridge Series in Statistical and Probabilistic Mathematics, vol. 49 (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [24] (2019) Sparse inverse problems over measures: Equivalence of the conditional gradient and exchange methods. SIAM J. Optim. 29(2):1329–1349. Crossref, Google Scholar
- [25] (1997) Model-Oriented Design of Experiments, Lecture Notes in Statistics, vol. 125 (Springer, New York).Crossref, Google Scholar
- [26] (2013) Support detection in super-resolution. Preprint, submitted February 16, https://arxiv.org/abs/1302.3921.Google Scholar
- [27] (1983) von Mises Calculus for Statistical Functionals, Lecture Notes in Statistics, vol. 19 (Springer, New York). Crossref, Google Scholar
- [28] (2011) Functional derivatives in statistics: Asymptotics and robustness. Lovric M, ed. International Encyclopedia of Statistical Science (Springer, Berlin, Heidelberg).Google Scholar
- [29] (2009) Unconditional quantile regressions. Econometrica 77(3):953–973.Crossref, Google Scholar
- [30] (1956) An algorithm for quadratic programming. Naval Res. Logist. Quart. 3(1–2):95–110.Crossref, Google Scholar
- [31] (2019) Bayesian posterior approximation via greedy particle optimization. Proc. AAAI Conf. Artificial Intelligence 33(1):3606–3613.Google Scholar
- [32] (2019) An inexact trust-region algorithm for constrained problems in Hilbert space and its application to the adaptive solution of optimal control problems with PDEs. Working paper, Technical University of Munich, Munich, Germany.Google Scholar
- [33] (2000) Empirical Processes in M-Estimation, vol. 6 (Cambridge University Press, Cambridge, UK).Google Scholar
- [34] (1974) The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69(346):383–393.Crossref, Google Scholar
- [35] (2022) How should volunteers be dispatched to out-of-hospital cardiac arrest cases? Queueing Systems 100(3–4):437–439.Crossref, Google Scholar
- [36] (1996) Robust Statistical Procedures, CBMS-NSF Regional Conference Series in Applied Mathematics, 2nd ed., vol. 68 (Society for Industrial and Applied Mathematics, Philadelphia).Google Scholar
- [37] (2016) Local maxima in the likelihood of Gaussian mixture models: Structural results and algorithmic consequences. Advances in Neural Information Processing Systems, 29.Google Scholar
- [38] (2021) Frank-Wolfe methods in probability space. Preprint, submitted May 11, https://arxiv.org/abs/2105.05352.Google Scholar
- [39] (1974) General equivalence theory for optimum designs (approximate theory). Ann. Statist. 2(5):849–879.Crossref, Google Scholar
- [40] (2016) Convergence rate of Frank-Wolfe for non-convex objectives. arXiv preprint arXiv:1607.00345.Google Scholar
- [41] (2016) Stein variational gradient descent: A general purpose Bayesian inference algorithm. Advances in Neural Information Processing Systems, 29 (Curran Associates Inc., Red Hook, NY).Google Scholar
- [42] (2019) Matrix Differential Calculus with Applications in Statistics and Econometrics (John Wiley & Sons, New York).Crossref, Google Scholar
- [43] (2018) A mean field view of the landscape of two-layer neural networks. Proc. Natl. Acad. Sci. USA 115(33):E7665–E7671.Crossref, Google Scholar
- [44] (2000) Tangent sets in the space of measures: With applications to variational analysis. J. Math. Anal. Appl. 249(2):539–552.Crossref, Google Scholar
- [45] (2002) Steepest descent algorithms in a space of measures. Statist. Comput. 12(2):115–123. Crossref, Google Scholar
- [46] (2018) Out-of-hospital cardiac arrest: Current concepts. Lancet 391(10124):970–979.Crossref, Google Scholar
- [47] (2004) Introductory Lectures on Convex Optimization A Basic Course, Applied Optimization, 1st ed., vol. 87 (Springer, New York).Crossref, Google Scholar
- [48] (2018) Lectures on Convex Optimization, Springer Optimization and Its Applications, vol. 137 (Springer, Cham, Switzerland).Crossref, Google Scholar
- [49] (2000) Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, Wiley Series in Probability and Statistics, 2nd ed. (Wiley, Chichester, UK).Crossref, Google Scholar
- [50] (2023) The Frank-Wolfe algorithm: A short introduction. Jahresbericht der Deutschen Mathematiker-Vereinigung 126(1):3–35.Google Scholar
- [51] (1983) Optimal Design of Experiments (Wiley, New York).Google Scholar
- [52] (2004) Cumulative residual entropy: A new measure of information. IEEE Trans. Inform. Theory 50(6):1220–1228.Crossref, Google Scholar
- [53] (2016) Stochastic Frank-Wolfe methods for nonconvex optimization. 2016 54th Annual Allerton Conf. Comm. Control Comput. (Allerton) (IEEE), 1244–1251.Google Scholar
- [54] (2015) Optimal Transport for Applied Mathematicians, Progress in Nonlinear Differential Equations and Their Applications, vol. 87 (Birkhäuser, Cham, Switzerland).Crossref, Google Scholar
- [55] (2015) Self-normalization for time series: A review of recent developments. J. Amer. Statist. Assoc. 110(512):1797–1817.Crossref, Google Scholar
- [56] (2009) Lectures on Stochastic Programming: Modeling and Theory (Society for Industrial and Applied Mathematics, Philadelphia).Crossref, Google Scholar
- [57] (2017) Adaptive multilevel trust-region methods for time-dependent PDE-constrained optimization. Portugaliae Mathematica 74(1):37–67.Crossref, Google Scholar
- [58] (2025) Modeling the impact of community first responders. Management Sci. 71(2):992–1008.Link, Google Scholar
- [59] (2024) Learning Gaussian mixtures using the Wasserstein–Fisher–Rao gradient flow. Ann. Statist. 52(4):1774–1795.Crossref, Google Scholar

