Nash Equilibria, Regularization, and Computation in Optimal Transport-Based Distributionally Robust Optimization

Published Online: https://doi.org/10.1287/opre.2023.0138

References

  • Aolaritei L, Shafiee S, Dörfler F (2022) Wasserstein distributionally robust estimation in high dimensions: Performance analysis and optimal hyperparameter tuning. Preprint, submitted June 27, https://doi.org/10.48550/arXiv.2206.13269.
  • Aolaritei L, Fochesato M, Lygeros J, Dörfler F (2023) Wasserstein tube MPC with exact uncertainty propagation. Conf. Decision Control (IEEE, Piscataway, NJ), 2036–2041.
  • Aolaritei L, Lanzetti N, Chen H, Dörfler F (2025) Distributional uncertainty propagation via optimal transport. IEEE Trans. Automatic Control 70(9):6080–6095.
  • Attouch H, Azé D (1993) Approximation and regularization of arbitrary functions in Hilbert spaces by the Lasry-Lions method. Ann. L’Institut Henri Poincaré 10(3):289–312.
  • Attouch H, Wets RJ-B (1989) Epigraphical analysis. Ann. L’Institut Henri Poincaré 6:73–100.
  • Banach S (1938) Über homogene Polynome in (L2). Studia Math. 7(1):36–44.
  • Bartl D, Drapeau S, Obłój J, Wiesel J (2021) Sensitivity analysis of Wasserstein distributionally robust optimization problems. Proc. Roy. Soc. A 477:20210176.
  • Bauschke HH, Combettes PL (2011) Convex Analysis and Monotone Operator Theory in Hilbert Spaces (Springer, Berlin).
  • Bergemann D, Schlag KH (2008) Pricing without priors. J. Eur. Econom. Assoc. 6(2–3):560–569.
  • Billingsley P (2013) Convergence of Probability Measures (John Wiley, Hoboken, NJ).
  • Blanchet J, Murthy K (2019) Quantifying distributional model risk via optimal transport. Math. Oper. Res. 44(2):565–600.
  • Blanchet J, Kang Y, Murthy K (2019a) Robust Wasserstein profile inference and applications to machine learning. J. Appl. Probab. 56(3):830–857.
  • Blanchet J, Glynn PW, Yan J, Zhou Z (2019b) Multivariate distributionally robust convex regression under absolute error loss. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 11817–11826.
  • Blanchet J, Murthy K, Si N (2022a) Confidence regions in Wasserstein distributionally robust estimation. Biometrika 109(2):295–315.
  • Blanchet J, Murthy K, Zhang F (2022b) Optimal transport-based distributionally robust optimization: Structural properties and iterative schemes. Math. Oper. Res. 47(2):1500–1529.
  • Boskos D, Cortes J, Martinez S (2021) Data-driven ambiguity sets with probabilistic guarantees for dynamic processes. IEEE Trans. Automatic Control 66(7):2991–3006.
  • Bougeard M, Penot J-P, Pommellet A (1991) Towards minimal assumptions for the infimal convolution regularization. J. Approx. Theory 64(3):245–270.
  • Bredies K, Kunisch K, Pock T (2010) Total generalized variation. SIAM J. Imaging Sci. 3(3):492–526.
  • Chen R, Paschalidis I (2018) A robust learning approach for regression models based on distributionally robust optimization. J. Machine Learn. Res. 19(1):517–564.
  • Chen R, Paschalidis I (2019) Selecting optimal decisions via distributionally robust nearest-neighbor regression. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 749–759.
  • Chen Z, Kuhn D, Wiesemann W (2024) Data-driven chance constrained programs over Wasserstein balls. Oper. Res. 72(1):410–424.
  • Coulson J, Lygeros J, Dörfler F (2021) Distributionally robust chance constrained data-enabled predictive control. IEEE Trans. Automatic Control 67(7):3289–3304.
  • Cover T, Thomas J (2006) Elements of Information Theory (John Wiley, Hoboken, NJ).
  • De la Fuente A (2000) Mathematical Methods and Models for Economists (Cambridge University Press, Cambridge, UK).
  • Duchi JC, Namkoong H (2021) Learning models with uniform performance via distributionally robust optimization. Ann. Statist. 49(3):1378–1406.
  • Duchi JC, Glynn PW, Namkoong H (2021) Statistics of robust optimization: A generalized empirical likelihood approach. Math. Oper. Res. 46(3):946–969.
  • Farnia F, Tse D (2016) A minimax approach to supervised learning. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 4240–4248.
  • Finlay C, Oberman AM (2021) Scaleable input gradient regularization for adversarial robustness. Machine Learn. Appl. 3:100017.
  • Gao R (2023) Finite-sample guarantees for Wasserstein distributionally robust optimization: Breaking the curse of dimensionality. Oper. Res. 71(6):2291–2306.
  • Gao R, Kleywegt A (2023) Distributionally robust stochastic optimization with Wasserstein distance. Math. Oper. Res. 48(2):603–655.
  • Gao R, Chen X, Kleywegt AJ (2024) Wasserstein distributionally robust optimization and variation regularization. Oper. Res. 72(3):1177–1191.
  • Gao R, Xie L, Xie Y, Xu H (2018) Robust hypothesis testing using Wasserstein uncertainty sets. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 7902–7912.
  • Goodfellow IJ, Shlens J, Szegedy C (2014) Explaining and harnessing adversarial examples. Preprint, submitted December 20, https://doi.org/10.48550/arXiv.1412.6572.
  • Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A (2017) Improved training of Wasserstein GANs. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 5769–5779.
  • Hein M, Andriushchenko M (2017) Formal guarantees on the robustness of a classifier against adversarial manipulation. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 2263–2273.
  • Hiriart-Urruty J-B (1980) Extension of Lipschitz functions. J. Math. Anal. Appl. 77(2):539–554.
  • Ho-Nguyen N, Wright SJ (2023) Adversarial classification via distributional robustness with Wasserstein ambiguity. Math. Programming 198(2):1411–1447.
  • Ho-Nguyen N, Kılınç-Karzan F, Küçükyavuz S, Lee D (2022) Distributionally robust chance-constrained programs with right-hand side uncertainty under Wasserstein ambiguity. Math. Programming 196(1–2):641–672.
  • Ho-Nguyen N, Kılınç-Karzan F, Küçükyavuz S, Lee D (2023) Strong formulations for distributionally robust chance-constrained programs with left-hand side uncertainty under Wasserstein ambiguity. INFORMS J. Optim. 5(2):211–232.
  • Hu Y, Ongie G, Ramani S, Jacob M (2014) Generalized higher degree total variation (HDTV) regularization. IEEE Trans. Image Processing 23(6):2423–2435.
  • Jakubovitz D, Giryes R (2018) Improving DNN robustness to adversarial attacks using Jacobian regularization. Ferrari V, Hebert M, Sminchisescu C, Weiss Y, eds. Eur. Conf. Comput. Vision (Springer, Berlin), 514–529.
  • Koçyiğit Ç, Rujeerapaiboon N, Kuhn D (2022) Robust multidimensional pricing: Separation without regret. Math. Programming 196(1–2):841–874.
  • Koçyiğit Ç, Iyengar G, Kuhn D, Wiesemann W (2020) Distributionally robust mechanism design. Management Sci. 66(1):159–189.
  • Kuhn D, Shafiee S, Wiesemann W (2025) Distributionally robust optimization. Acta Numerica 34:579–804.
  • Kuhn D, Mohajerin Esfahani P, Nguyen VA, Shafieezadeh-Abadeh S (2019) Wasserstein distributionally robust optimization: Theory and applications in machine learning. Operations Research & Management Science in the Age of Analytics (INFORMS, Catonsville, MD), 130–166.
  • Kwon Y, Kim W, Won J-H, Paik MC (2020) Principled learning method for Wasserstein distributionally robust optimization with local perturbations. Daumé III H, Singh A, eds. Internat. Conf. Machine Learn. (PMLR, New York), 5567–5576.
  • Lam H (2019) Recovering best statistical guarantees via the empirical divergence-based distributionally robust optimization. Oper. Res. 67(4):1090–1105.
  • LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc. IEEE 86(11):2278–2324.
  • Ledoit O, Wolf M (2004) A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88(2):365–411.
  • Lee J, Raginsky M (2018) Minimax statistical learning with Wasserstein distances. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 2687–2696.
  • Lefkimmiatis S, Unser M (2013) Poisson image reconstruction with Hessian Schatten-norm regularization. IEEE Trans. Image Processing 22(11):4314–4327.
  • Lefkimmiatis S, Ward JP, Unser M (2013) Hessian Schatten-norm regularization for linear inverse problems. IEEE Trans. Image Processing 22(5):1873–1888.
  • Lehmann EL, Casella G (2006) Theory of Point Estimation (Springer, Berlin).
  • Levy BC, Nikoukhah R (2004) Robust least-squares estimation with a relative entropy constraint. IEEE Trans. Inform. Theory 50(1):89–104.
  • Levy BC, Nikoukhah R (2012) Robust state space filtering under incremental model perturbations subject to a relative entropy tolerance. IEEE Trans. Automatic Control 58(3):682–695.
  • Lyu C, Huang K, Liang H-N (2015) A unified gradient regularization family for adversarial examples. Internat. Conf. Data Mining (IEEE, Piscataway, NJ), 301–309.
  • Mądry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2018) Towards deep learning models resistant to adversarial attacks. Internat. Conf. Learn. Representations (MIT Press, Cambridge, MA).
  • Malitsky Y (2020) Golden ratio algorithms for variational inequalities. Math. Programming 184(1):383–410.
  • Mohajerin Esfahani P, Kuhn D (2018) Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Programming 171(1–2):115–166.
  • Mohajerin Esfahani P, Shafieezadeh-Abadeh S, Hanasusanto GA, Kuhn D (2018) Data-driven inverse optimization with imperfect information. Math. Programming 167(1):191–234.
  • Nagarajan V, Kolter JZ (2017) Gradient descent GAN optimization is locally stable. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 5591–5600.
  • Nguyen VA, Kuhn D, Mohajerin Esfahani P (2022) Distributionally robust inverse covariance estimation: The Wasserstein shrinkage estimator. Oper. Res. 70(1):490–515.
  • Nguyen VA, Shafieezadeh-Abadeh S, Kuhn D, Mohajerin Esfahani P (2023) Bridging Bayesian and minimax mean square error estimation via Wasserstein distributionally robust optimization. Math. Oper. Res. 48(1):1–37.
  • Ororbia AG II, Kifer D, Giles CL (2017) Unifying adversarial training algorithms with data gradient regularization. Neural Comput. 29(4):867–887.
  • Papernot N, McDaniel P, Goodfellow I (2016a) Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. Preprint, submitted May 24, https://arxiv.org/abs/1605.07277.
  • Papernot N, McDaniel P, Wu X, Jha S, Swami A (2016b) Distillation as a defense to adversarial perturbations against deep neural networks. IEEE Sympos. Security Privacy (IEEE, Piscataway, NJ), 582–597.
  • Parikh N, Boyd S (2014) Proximal algorithms. Foundations Trends Optim. 1(3):127–239.
  • Penot J-P (1998) Proximal mappings. J. Approx. Theory 94(2):203–221.
  • Polyak B (1987) Introduction to Optimization (Optimization Software, Inc., Dallas).
  • Rockafellar RT, Wets RJ-B (2009) Variational Analysis (Springer, Berlin).
  • Roth K, Lucchi A, Nowozin S, Hofmann T (2017) Stabilizing training of generative adversarial networks through regularization. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 2018–2028.
  • Rujeerapaiboon N, Kuhn D, Wiesemann W (2016) Robust growth-optimal portfolios. Management Sci. 62(7):2090–2109.
  • Shafieezadeh-Abadeh S, Kuhn D, Mohajerin Esfahani P (2019) Regularization via mass transportation. J. Machine Learn. Res. 20(103):1–68.
  • Shafieezadeh-Abadeh S, Mohajerin Esfahani P, Kuhn D (2015) Distributionally robust logistic regression. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 1576–1584.
  • Shafieezadeh-Abadeh S, Nguyen VA, Kuhn D, Mohajerin Esfahani P (2018) Wasserstein distributionally robust Kalman filtering. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 8474–8483.
  • Shapiro A, Dentcheva D, Ruszczyński A (2014) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).
  • Shen H, Jiang R (2022) Chance-constrained set covering with Wasserstein ambiguity. Math. Programming 198:621–674.
  • Sinha A, Namkoong H, Duchi J (2018) Certifying some distributional robustness with principled adversarial training. Internat. Conf. Learn. Representations (ICLR, Appleton, WI).
  • Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow IJ, Fergus R (2013) Intriguing properties of neural networks. Preprint, submitted December 21, https://doi.org/10.48550/arXiv.1312.6199.
  • Taşkesen B, Shafieezadeh-Abadeh S, Kuhn D (2023) Semi-discrete optimal transport: Hardness, regularization and numerical solution. Math. Programming 199(1):1033–1106.
  • Tu Z, Zhang J, Tao D (2019) Theoretical analysis of adversarial learning: A minimax approach. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 12280–12290.
  • Van Parys BP, Mohajerin Esfahani P, Kuhn D (2021) From data to decisions: Distributionally robust optimization is optimal. Management Sci. 67(6):3387–3402.
  • Varga D, Csiszárik A, Zombori Z (2018) Gradient regularization improves accuracy of discriminative models. Schedae Informaticae 27:31–45.
  • Villani C (2008) Optimal Transport: Old and New (Springer, Berlin).
  • Volpi R, Namkoong H, Sener O, Duchi J, Murino V, Savarese S (2018) Generalizing to unseen domains via adversarial data augmentation. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 5339–5349.
  • Wang Y, Ma X, Bailey J, Yi J, Zhou B, Gu Q (2019) On the convergence and robustness of adversarial training. Chaudhuri K, Salakhutdinov R, eds. Internat. Conf. Machine Learn. (PMLR, New York), 6586–6595.
  • Xie W (2021) On distributionally robust chance constrained programs with Wasserstein distance. Math. Programming 186(1):115–155.
  • Yang I (2020) Wasserstein distributionally robust stochastic control: A data-driven approach. IEEE Trans. Automatic Control 66(8):3863–3870.
  • Yue M-C, Kuhn D, Wiesemann W (2022) On linear optimization over Wasserstein balls. Math. Programming 195(1–2):1107–1122.
  • Zhang L, Yang J, Gao R (2024) A short and general duality proof for Wasserstein distributionally robust optimization. Oper. Res. 73(4):2146–2155.
  • Zhao C, Guan Y (2018) Data-driven risk-averse stochastic optimization with Wasserstein metric. Oper. Res. Lett. 46(2):262–267.
  • Zhen J, Kuhn D, Wiesemann W (2023) A unified theory of robust and distributionally robust optimization via the primal-worst-equals-dual-best principle. Oper. Res. 73(2):862–878.
  • Zorzi M (2016) Robust Kalman filtering under model perturbations. IEEE Trans. Automatic Control 62(6):2902–2907.
  • Zorzi M (2017) On the robustness of the Bayes and Wiener estimators under model uncertainty. Automatica 83:133–140.