Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization

Viet Anh Nguyen
https://orcid.org/0000-0002-9607-7891
Department of Management Science and Engineering, Stanford University, Stanford, California 94305

Soroosh Shafieezadeh-Abadeh
https://orcid.org/0000-0001-9095-2686
Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Daniel Kuhn
Risk Analytics and Optimization Chair, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

Peyman Mohajerin Esfahani
Delft Center for Systems and Control, Delft University of Technology, 2628 CD Delft, Netherlands

Published Online: 17 Dec 2021. DOI: https://doi.org/10.1287/moor.2021.1176

Mathematics of Operations Research

Volume 48, Issue 1

February 2023

Pages 1-602, C2

Article Information

Received: November 08, 2019
Accepted: January 26, 2021
Published Online: December 17, 2021

Cite as

Viet Anh Nguyen, Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, Peyman Mohajerin Esfahani (2021) Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization. Mathematics of Operations Research 48(1):1-37.

https://doi.org/10.1287/moor.2021.1176

Acknowledgments

The authors are grateful to Erick Delage for valuable comments on an earlier version of this paper.
