Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization
Published Online: 17 Dec 2021. https://doi.org/10.1287/moor.2021.1176

