Sinkhorn Distributionally Robust Optimization

