Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization

Published Online: https://doi.org/10.1287/moor.2021.1176

References

  • [1] Aubin J-P, Frankowska H (1990) Set-Valued Analysis (Birkhäuser).
  • [2] Aviv Y (2003) A time-series framework for supply chain inventory management. Oper. Res. 51(2):210–227.
  • [3] Beck A, Eldar YC (2007) Regularization in regression with bounded noise: A Chebyshev center approach. SIAM J. Matrix Anal. Appl. 29(2):606–625.
  • [4] Beck A, Ben-Tal A, Eldar YC (2006) Robust mean-squared error estimation of multiple signals in linear systems affected by model and noise uncertainties. Math. Programming 107:155–187.
  • [5] Beck A, Eldar YC, Ben-Tal A (2007) Mean-squared error estimation for linear systems with block circulant uncertainty. SIAM J. Matrix Anal. Appl. 29(3):712–730.
  • [6] Berge C (1963) Topological Spaces: Including a Treatment of Multi-Valued Functions, Vector Spaces, and Convexity (Courier Corporation).
  • [7] Bernstein DS (2009) Matrix Mathematics: Theory, Facts, and Formulas (Princeton University Press).
  • [8] Bertsekas D (2009) Convex Optimization Theory (Athena Scientific).
  • [9] Bhatia R, Jain T, Lim Y (2018) Strong convexity of sandwiched entropies and related optimization problems. Rev. Math. Phys. 30(9).
  • [10] Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press).
  • [11] Cambanis S, Huang S, Simons G (1981) On the theory of elliptically contoured distributions. J. Multivariate Anal. 11(3):368–385.
  • [12] Chang SG, Yu B, Vetterli M (2000) Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Processing 9(9):1532–1546.
  • [13] Chatfield C (2016) The Analysis of Time Series: An Introduction (CRC Press).
  • [14] Clarkson KL (2010) Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Trans. Algorithms 6(4):1–30.
  • [15] Cover TM, Thomas JA (2006) Elements of Information Theory (Wiley-Interscience).
  • [16] Cuturi M (2013) Sinkhorn distances: Lightspeed computation of optimal transport. Adv. Neural Inform. Processing Systems 26:2292–2300.
  • [17] Demyanov VF, Rubinov AM (1970) Approximate Methods in Optimization Problems (American Elsevier Publishing).
  • [18] Diggavi SN, Cover TM (2001) The worst additive noise under a covariance constraint. IEEE Trans. Inform. Theory 47(7):3072–3081.
  • [19] Dowson DC, Landau BV (1982) The Fréchet distance between multivariate normal distributions. J. Multivariate Anal. 12(3):450–455.
  • [20] Dunn JC (1979) Rates of convergence for conditional gradient algorithms near singular and nonsingular extremals. SIAM J. Control Optim. 17(2):187–211.
  • [21] Dunn JC (1980) Convergence rates for conditional gradient sequences generated by implicit step length rules. SIAM J. Control Optim. 18(5):473–487.
  • [22] Dunn JC, Harshbarger S (1978) Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62(2):432–444.
  • [23] Eldar YC (2006) Robust competitive estimation with signal and noise covariance uncertainties. IEEE Trans. Inform. Theory 52(10):4532–4547.
  • [24] Eldar YC, Merhav N (2004) A competitive minimax approach to robust estimation of random parameters. IEEE Trans. Signal Processing 52(7):1931–1946.
  • [25] Eldar YC, Beck A, Teboulle M (2008) A minimax Chebyshev estimator for bounded error estimation. IEEE Trans. Signal Processing 56(4):1388–1397.
  • [26] Eldar YC, Ben-Tal A, Nemirovski A (2004) Linear minimax regret estimation of deterministic parameters with bounded data uncertainties. IEEE Trans. Signal Processing 52(8):2177–2188.
  • [27] Eldar YC, Ben-Tal A, Nemirovski A (2004) Robust mean-squared error estimation in the presence of model uncertainties. IEEE Trans. Signal Processing 53(1):168–181.
  • [28] Fang K-T, Kotz S, Ng KW (1990) Symmetric Multivariate and Related Distributions (Chapman & Hall).
  • [29] Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res. Logist. 3(1–2):95–110.
  • [30] Freund RM, Grigas P (2016) New analysis and results for the Frank–Wolfe method. Math. Programming 155:199–230.
  • [31] Garber D, Hazan E (2015) Faster rates for the Frank–Wolfe method over strongly-convex sets. Internat. Conf. Machine Learn., 541–549.
  • [32] Gelbrich M (1990) On a formula for the L² Wasserstein metric between measures on Euclidean and Hilbert spaces. Mathematische Nachrichten 147(1):185–203.
  • [33] Givens CR, Shortt RM (1984) A class of Wasserstein metrics for probability distributions. Michigan Math. J. 31(2):231–240.
  • [34] Golnaraghi F, Kuo B (2017) Automatic Control Systems (McGraw-Hill Education).
  • [35] Guo D, Wu Y, Shitz SS, Verdu S (2011) Estimation in Gaussian noise: Properties of the minimum mean-square error. IEEE Trans. Inform. Theory 57(4):2371–2385.
  • [36] Hamilton J (1994) Time Series Analysis (Princeton University Press).
  • [37] Hespanha JP (2009) Linear Systems Theory (Princeton University Press).
  • [38] Hult H, Lindskog F (2002) Multivariate extremes, aggregation and dependence in elliptical distributions. Adv. Appl. Probab. 34(3):587–608.
  • [39] Jaggi M (2013) Revisiting Frank–Wolfe: Projection-free sparse convex optimization. Proc. 30th Internat. Conf. Machine Learn., 427–435.
  • [40] Jondeau E, Poon S, Rockinger M (2007) Financial Modeling under Non-Gaussian Distributions (Springer).
  • [41] Journée M, Nesterov Y, Richtárik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J. Machine Learn. Res. 11:517–553.
  • [42] Juditsky A, Nemirovski A (2018) Lectures on Statistical Inferences via Convex Optimization.
  • [43] Juditsky A, Nemirovski A (2018) Near-optimality of linear recovery in Gaussian observation scheme under ‖·‖₂²-loss. Ann. Statist. 46(4):1603–1629.
  • [44] Kay SM (1993) Fundamentals of Statistical Signal Processing: Estimation Theory (Prentice Hall).
  • [45] Kelker D (1970) Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhya Ser. A. 32(4):419–430.
  • [46] Knott M, Smith CS (1984) On the optimal mapping of distributions. J. Optim. Theory Appl. 43:39–49.
  • [47] Levitin ES, Polyak BT (1966) Constrained minimization methods. USSR Comput. Math. Math. Phys. 6(5):1–50.
  • [48] Levy BC, Nikoukhah R (2004) Robust least-squares estimation with a relative entropy constraint. IEEE Trans. Inform. Theory 50(1):89–104.
  • [49] Levy BC, Nikoukhah R (2012) Robust state space filtering under incremental model perturbations subject to a relative entropy tolerance. IEEE Trans. Automatic Control 58(3):682–695.
  • [50] Löfberg J (2004) YALMIP: A toolbox for modeling and optimization in MATLAB. IEEE Internat. Conf. Robotics Automation, 284–289.
  • [51] MacKay D (2003) Information Theory, Inference and Learning Algorithms (Cambridge University Press).
  • [52] Malagò L, Montrucchio L, Pistone G (2018) Wasserstein Riemannian geometry of Gaussian densities. Inform. Geometry 1:137–179.
  • [53] Murphy K (2012) Machine Learning: A Probabilistic Perspective (MIT Press).
  • [54] Nesterov Y (2018) Complexity bounds for primal-dual methods minimizing the model of objective function. Math. Programming 171:311–330.
  • [55] Nguyen VA, Kuhn D, Mohajerin Esfahani P (2021) Distributionally robust inverse covariance estimation: The Wasserstein shrinkage estimator. Oper. Res., ePub ahead of print July 23, https://doi.org/10.1287/opre.2020.2076.
  • [56] Ogata K (2009) Modern Control Engineering (Pearson).
  • [57] Olkin I, Pukelsheim F (1982) The distance between two random vectors with given dispersion matrices. Linear Algebra Appl. 48:257–263.
  • [58] Oppenheim AV, Verghese GC (2015) Signals, Systems and Inference (Pearson).
  • [59] Pedregosa F, Negiar G, Askari A, Jaggi M (2020) Linearly convergent Frank-Wolfe with backtracking line-search. Internat. Conf. Artificial Intelligence Statist., 1–10.
  • [60] Peyré G, Cuturi M (2019) Computational optimal transport. Foundations and Trends® in Machine Learning, vol. 11, 355–607. https://www.nowpublishers.com/article/Details/MAL-073.
  • [61] Posekany A, Felsenstein K, Sykacek P (2011) Biological assessment of robust noise models in microarray data analysis. Bioinformatics 27(6):807–814.
  • [62] Rockafellar RT, Wets RJ-B (2010) Variational Analysis (Springer).
  • [63] Rubel O, Naik PA (2017) Robust dynamic estimation. Marketing Sci. 36(3):453–467.
  • [64] Ruttimann UE, Unser M, Rawlings RR, Rio D, Ramsey NF, Mattay VS, Hommer DW, Frank JA, Weinberger DR (1998) Statistical analysis of functional MRI data in the wavelet domain. IEEE Trans. Medical Imaging 17(2):142–154.
  • [65] Schmitt BA (1992) Perturbation bounds for matrix square roots and Pythagorean sums. Linear Algebra Appl. 174:215–227.
  • [66] Shafieezadeh-Abadeh S, Mohajerin Esfahani P, Kuhn D (2019) Regularization via mass transportation. J. Machine Learn. Res. 20(103):1–68.
  • [67] Shafieezadeh-Abadeh S, Nguyen V, Kuhn D, Mohajerin Esfahani P (2018) Wasserstein distributionally robust Kalman filtering. Adv. Neural Inform. Processing Systems 31:8483–8492.
  • [68] Sion M (1958) On general minimax theorems. Pacific J. Math. 8(1):171–176.
  • [69] Solomon J, Goes FD, Peyré G, Cuturi M, Butscher A, Nguyen A, Du T, Guibas L (2015) Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains. ACM Trans. Graphics 34(4):1–11.
  • [70] Sriram S, Kalwani MU (2007) Optimal advertising and promotion budgets in dynamic markets with brand equity as a mediating variable. Management Sci. 53(1):46–60.
  • [71] Stock J, Watson M (2015) Introduction to Econometrics (Prentice Hall).
  • [72] Taşkesen B, Shafieezadeh Abadeh S, Kuhn D (2021) Semi-discrete optimal transport: Hardness, regularization and numerical solution. Preprint, submitted March 10, https://arxiv.org/abs/2103.06263.
  • [73] van Lint H, Djukic T (2012) Applications of Kalman filtering in traffic management and control. Mirchandani P, Cole Smith J, eds. New Directions in Informatics, Optimization, Logistics, and Production (INFORMS, Hanover, PA), 59–91.
  • [74] Wooldridge J (2010) Econometric Analysis of Cross Section and Panel Data (MIT Press).
  • [75] Zhou J (1995) On the existence of equilibrium for abstract economies. J. Math. Anal. Appl. 193(3):839–858.
  • [76] Zorzi M (2016) Robust Kalman filtering under model perturbations. IEEE Trans. Automatic Control 62(6):2902–2907.
  • [77] Zorzi M (2017) On the robustness of the Bayes and Wiener estimators under model uncertainty. Automatica 83:133–140.