Nash Equilibria, Regularization, and Computation in Optimal Transport-Based Distributionally Robust Optimization

Soroosh Shafiee (Corresponding Author)
https://orcid.org/0000-0001-9095-2686
Operations Research and Information Engineering, Cornell University, Ithaca, New York 14850

Liviu Aolaritei
https://orcid.org/0000-0001-6710-3723
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, California 94720

Florian Dörfler
https://orcid.org/0000-0002-9649-5305
Automatic Control Lab, Eidgenössische Technische Hochschule Zürich, 8092 Zürich, Switzerland

Daniel Kuhn
https://orcid.org/0000-0003-2697-8886
Risk Analytics and Optimization Chair, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

Published Online: 19 Dec 2025
https://doi.org/10.1287/opre.2023.0138

References

Aolaritei L, Shafiee S, Dörfler F (2022) Wasserstein distributionally robust estimation in high dimensions: Performance analysis and optimal hyperparameter tuning. Preprint, submitted June 27, https://doi.org/10.48550/arXiv.2206.13269.
Aolaritei L, Fochesato M, Lygeros J, Dörfler F (2023) Wasserstein tube MPC with exact uncertainty propagation. Conf. Decision Control (IEEE, Piscataway, NJ), 2036–2041.
Aolaritei L, Lanzetti N, Chen H, Dörfler F (2025) Distributional uncertainty propagation via optimal transport. IEEE Trans. Automatic Control 70(9):6080–6095.
Attouch H, Azé D (1993) Approximation and regularization of arbitrary functions in Hilbert spaces by the Lasry-Lions method. Ann. L’Institut Henri Poincaré 10(3):289–312.
Attouch H, Wets RJ-B (1989) Epigraphical analysis. Ann. L’Institut Henri Poincaré 6:73–100.
Banach S (1938) Über homogene Polynome in (L2). Studia Math. 7(1):36–44.
Bartl D, Drapeau S, Obłój J, Wiesel J (2021) Sensitivity analysis of Wasserstein distributionally robust optimization problems. Proc. Roy. Soc. A 477:20210176.
Bauschke HH, Combettes PL (2011) Convex Analysis and Monotone Operator Theory in Hilbert Spaces (Springer, Berlin).
Bergemann D, Schlag KH (2008) Pricing without priors. J. Eur. Econom. Assoc. 6(2–3):560–569.
Billingsley P (2013) Convergence of Probability Measures (John Wiley, Hoboken, NJ).
Blanchet J, Murthy K (2019) Quantifying distributional model risk via optimal transport. Math. Oper. Res. 44(2):565–600.
Blanchet J, Kang Y, Murthy K (2019a) Robust Wasserstein profile inference and applications to machine learning. J. Appl. Probab. 56(3):830–857.
Blanchet J, Glynn PW, Yan J, Zhou Z (2019b) Multivariate distributionally robust convex regression under absolute error loss. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 11817–11826.
Blanchet J, Murthy K, Si N (2022a) Confidence regions in Wasserstein distributionally robust estimation. Biometrika 109(2):295–315.
Blanchet J, Murthy K, Zhang F (2022b) Optimal transport-based distributionally robust optimization: Structural properties and iterative schemes. Math. Oper. Res. 47(2):1500–1529.
Boskos D, Cortes J, Martinez S (2021) Data-driven ambiguity sets with probabilistic guarantees for dynamic processes. IEEE Trans. Automatic Control 66(7):2991–3006.
Bougeard M, Penot J-P, Pommellet A (1991) Towards minimal assumptions for the infimal convolution regularization. J. Approx. Theory 64(3):245–270.
Bredies K, Kunisch K, Pock T (2010) Total generalized variation. SIAM J. Imaging Sci. 3(3):492–526.
Chen R, Paschalidis I (2018) A robust learning approach for regression models based on distributionally robust optimization. J. Machine Learn. Res. 19(1):517–564.
Chen R, Paschalidis I (2019) Selecting optimal decisions via distributionally robust nearest-neighbor regression. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 749–759.
Chen Z, Kuhn D, Wiesemann W (2024) Data-driven chance constrained programs over Wasserstein balls. Oper. Res. 72(1):410–424.
Coulson J, Lygeros J, Dörfler F (2021) Distributionally robust chance constrained data-enabled predictive control. IEEE Trans. Automatic Control 67(7):3289–3304.
Cover T, Thomas J (2006) Elements of Information Theory (John Wiley, Hoboken, NJ).
De la Fuente A (2000) Mathematical Methods and Models for Economists (Cambridge University Press, Cambridge, UK).
Duchi JC, Namkoong H (2021) Learning models with uniform performance via distributionally robust optimization. Ann. Statist. 49(3):1378–1406.
Duchi JC, Glynn PW, Namkoong H (2021) Statistics of robust optimization: A generalized empirical likelihood approach. Math. Oper. Res. 46(3):946–969.
Farnia F, Tse D (2016) A minimax approach to supervised learning. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 4240–4248.
Finlay C, Oberman AM (2021) Scaleable input gradient regularization for adversarial robustness. Machine Learn. Appl. 3:100017.
Gao R (2023) Finite-sample guarantees for Wasserstein distributionally robust optimization: Breaking the curse of dimensionality. Oper. Res. 71(6):2291–2306.
Gao R, Kleywegt A (2023) Distributionally robust stochastic optimization with Wasserstein distance. Math. Oper. Res. 48(2):603–655.
Gao R, Chen X, Kleywegt AJ (2024) Wasserstein distributionally robust optimization and variation regularization. Oper. Res. 72(3):1177–1191.
Gao R, Xie L, Xie Y, Xu H (2018) Robust hypothesis testing using Wasserstein uncertainty sets. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 7902–7912.
Goodfellow IJ, Shlens J, Szegedy C (2014) Explaining and harnessing adversarial examples. Preprint, submitted December 20, https://doi.org/10.48550/arXiv.1412.6572.
Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A (2017) Improved training of Wasserstein GANs. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 5769–5779.
Hein M, Andriushchenko M (2017) Formal guarantees on the robustness of a classifier against adversarial manipulation. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 2263–2273.
Hiriart-Urruty J-B (1980) Extension of Lipschitz functions. J. Math. Anal. Appl. 77(2):539–554.
Ho-Nguyen N, Wright SJ (2023) Adversarial classification via distributional robustness with Wasserstein ambiguity. Math. Programming 198(2):1411–1447.
Ho-Nguyen N, Kılınç-Karzan F, Küçükyavuz S, Lee D (2022) Distributionally robust chance-constrained programs with right-hand side uncertainty under Wasserstein ambiguity. Math. Programming 196(1–2):641–672.
Ho-Nguyen N, Kılınç-Karzan F, Küçükyavuz S, Lee D (2023) Strong formulations for distributionally robust chance-constrained programs with left-hand side uncertainty under Wasserstein ambiguity. INFORMS J. Optim. 5(2):211–232.
Hu Y, Ongie G, Ramani S, Jacob M (2014) Generalized higher degree total variation (HDTV) regularization. IEEE Trans. Image Processing 23(6):2423–2435.
Jakubovitz D, Giryes R (2018) Improving DNN robustness to adversarial attacks using Jacobian regularization. Ferrari V, Hebert M, Sminchisescu C, Weiss Y, eds. Eur. Conf. Comput. Vision (Springer, Berlin), 514–529.
Koçyiğit Ç, Rujeerapaiboon N, Kuhn D (2022) Robust multidimensional pricing: Separation without regret. Math. Programming 196(1–2):841–874.
Koçyiğit Ç, Iyengar G, Kuhn D, Wiesemann W (2020) Distributionally robust mechanism design. Management Sci. 66(1):159–189.
Kuhn D, Shafiee S, Wiesemann W (2025) Distributionally robust optimization. Acta Numerica 34:579–804.
Kuhn D, Mohajerin Esfahani P, Nguyen VA, Shafieezadeh-Abadeh S (2019) Wasserstein distributionally robust optimization: Theory and applications in machine learning. Operations Research & Management Science in the Age of Analytics (INFORMS, Catonsville, MD), 130–166.
Kwon Y, Kim W, Won J-H, Paik MC (2020) Principled learning method for Wasserstein distributionally robust optimization with local perturbations. Daumé III H, Singh A, eds. Internat. Conf. Machine Learn. (PMLR, New York), 5567–5576.
Lam H (2019) Recovering best statistical guarantees via the empirical divergence-based distributionally robust optimization. Oper. Res. 67(4):1090–1105.
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc. IEEE 86(11):2278–2324.
Ledoit O, Wolf M (2004) A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88(2):365–411.
Lee J, Raginsky M (2018) Minimax statistical learning with Wasserstein distances. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 2687–2696.
Lefkimmiatis S, Unser M (2013) Poisson image reconstruction with Hessian Schatten-norm regularization. IEEE Trans. Image Processing 22(11):4314–4327.
Lefkimmiatis S, Ward JP, Unser M (2013) Hessian Schatten-norm regularization for linear inverse problems. IEEE Trans. Image Processing 22(5):1873–1888.
Lehmann EL, Casella G (2006) Theory of Point Estimation (Springer, Berlin).
Levy BC, Nikoukhah R (2004) Robust least-squares estimation with a relative entropy constraint. IEEE Trans. Inform. Theory 50(1):89–104.
Levy BC, Nikoukhah R (2012) Robust state space filtering under incremental model perturbations subject to a relative entropy tolerance. IEEE Trans. Automatic Control 58(3):682–695.
Lyu C, Huang K, Liang H-N (2015) A unified gradient regularization family for adversarial examples. Internat. Conf. Data Mining (IEEE, Piscataway, NJ), 301–309.
Mądry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2018) Towards deep learning models resistant to adversarial attacks. Internat. Conf. Learn. Representations (MIT Press, Cambridge, MA).
Malitsky Y (2020) Golden ratio algorithms for variational inequalities. Math. Programming 184(1):383–410.
Mohajerin Esfahani P, Kuhn D (2018) Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Programming 171(1–2):115–166.
Mohajerin Esfahani P, Shafieezadeh-Abadeh S, Hanasusanto GA, Kuhn D (2018) Data-driven inverse optimization with imperfect information. Math. Programming 167(1):191–234.
Nagarajan V, Kolter JZ (2017) Gradient descent GAN optimization is locally stable. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 5591–5600.
Nguyen VA, Kuhn D, Mohajerin Esfahani P (2022) Distributionally robust inverse covariance estimation: The Wasserstein shrinkage estimator. Oper. Res. 70(1):490–515.
Nguyen VA, Shafieezadeh-Abadeh S, Kuhn D, Mohajerin Esfahani P (2023) Bridging Bayesian and minimax mean square error estimation via Wasserstein distributionally robust optimization. Math. Oper. Res. 48(1):1–37.
Ororbia AG II, Kifer D, Giles CL (2017) Unifying adversarial training algorithms with data gradient regularization. Neural Comput. 29(4):867–887.
Papernot N, McDaniel P, Goodfellow I (2016a) Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. Preprint, submitted May 24, https://doi.org/10.48550/arXiv.1605.07277.
Papernot N, McDaniel P, Wu X, Jha S, Swami A (2016b) Distillation as a defense to adversarial perturbations against deep neural networks. IEEE Sympos. Security Privacy (IEEE, Piscataway, NJ), 582–597.
Parikh N, Boyd S (2014) Proximal algorithms. Foundations Trends Optim. 1(3):127–239.
Penot J-P (1998) Proximal mappings. J. Approx. Theory 94(2):203–221.
Polyak B (1987) Introduction to Optimization (Optimization Software, Inc., Dallas).
Rockafellar RT, Wets RJ-B (2009) Variational Analysis (Springer, Berlin).
Roth K, Lucchi A, Nowozin S, Hofmann T (2017) Stabilizing training of generative adversarial networks through regularization. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 2018–2028.
Rujeerapaiboon N, Kuhn D, Wiesemann W (2016) Robust growth-optimal portfolios. Management Sci. 62(7):2090–2109.
Shafieezadeh-Abadeh S, Kuhn D, Mohajerin Esfahani P (2019) Regularization via mass transportation. J. Machine Learn. Res. 20(103):1–68.
Shafieezadeh-Abadeh S, Mohajerin Esfahani P, Kuhn D (2015) Distributionally robust logistic regression. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 1576–1584.
Shafieezadeh-Abadeh S, Nguyen VA, Kuhn D, Mohajerin Esfahani P (2018) Wasserstein distributionally robust Kalman filtering. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 8474–8483.
Shapiro A, Dentcheva D, Ruszczyński A (2014) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).
Shen H, Jiang R (2022) Chance-constrained set covering with Wasserstein ambiguity. Math. Programming 198:621–674.
Sinha A, Namkoong H, Duchi J (2018) Certifying some distributional robustness with principled adversarial training. Internat. Conf. Learn. Representations (ICLR, Appleton, WI).
Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow IJ, Fergus R (2013) Intriguing properties of neural networks. Preprint, submitted December 21, https://doi.org/10.48550/arXiv.1312.6199.
Taşkesen B, Shafieezadeh-Abadeh S, Kuhn D (2023) Semi-discrete optimal transport: Hardness, regularization and numerical solution. Math. Programming 199(1):1033–1106.
Tu Z, Zhang J, Tao D (2019) Theoretical analysis of adversarial learning: A minimax approach. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 12280–12290.
Van Parys BP, Mohajerin Esfahani P, Kuhn D (2021) From data to decisions: Distributionally robust optimization is optimal. Management Sci. 67(6):3387–3402.
Varga D, Csiszárik A, Zombori Z (2018) Gradient regularization improves accuracy of discriminative models. Schedae Informaticae 27:31–45.
Villani C (2008) Optimal Transport: Old and New (Springer, Berlin).
Volpi R, Namkoong H, Sener O, Duchi J, Murino V, Savarese S (2018) Generalizing to unseen domains via adversarial data augmentation. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 5339–5349.
Wang Y, Ma X, Bailey J, Yi J, Zhou B, Gu Q (2019) On the convergence and robustness of adversarial training. Chaudhuri K, Salakhutdinov R, eds. Internat. Conf. Machine Learn. (PMLR, New York), 6586–6595.
Xie W (2021) On distributionally robust chance constrained programs with Wasserstein distance. Math. Programming 186(1):115–155.
Yang I (2020) Wasserstein distributionally robust stochastic control: A data-driven approach. IEEE Trans. Automatic Control 66(8):3863–3870.
Yue M-C, Kuhn D, Wiesemann W (2022) On linear optimization over Wasserstein balls. Math. Programming 195(1–2):1107–1122.
Zhang L, Yang J, Gao R (2024) A short and general duality proof for Wasserstein distributionally robust optimization. Oper. Res. 73(4):2146–2155.
Zhao C, Guan Y (2018) Data-driven risk-averse stochastic optimization with Wasserstein metric. Oper. Res. Lett. 46(2):262–267.
Zhen J, Kuhn D, Wiesemann W (2023) A unified theory of robust and distributionally robust optimization via the primal-worst-equals-dual-best principle. Oper. Res. 73(2):862–878.
Zorzi M (2016) Robust Kalman filtering under model perturbations. IEEE Trans. Automatic Control 62(6):2902–2907.
Zorzi M (2017) On the robustness of the Bayes and Wiener estimators under model uncertainty. Automatica 83:133–140.

Volume 74, Issue 3

May-June 2026

Pages v-x, 1153-1728, iii-iv

Article Information

Received: March 22, 2023
Accepted: September 30, 2025
Published Online: December 19, 2025

Cite as

Soroosh Shafiee, Liviu Aolaritei, Florian Dörfler, Daniel Kuhn (2025) Nash Equilibria, Regularization, and Computation in Optimal Transport-Based Distributionally Robust Optimization. Operations Research 74(3):1689-1709.

https://doi.org/10.1287/opre.2023.0138

Acknowledgments

The authors express sincere gratitude to the reviewers for their invaluable feedback and insightful comments on the work. The code and data to support the numerical experiments in this paper are available both in the electronic companion and at https://github.com/sorooshafiee/regularization_with_OT.
