Distributionally Robust Losses for Latent Covariate Mixtures

John Duchi
Departments of Electrical Engineering and Statistics, Stanford University, Stanford, California 94305

Tatsunori Hashimoto
Department of Computer Science, Stanford University, Stanford, California 94305

Hongseok Namkoong (Corresponding Author)
https://orcid.org/0000-0002-5708-4044
Decision, Risk, and Operations Division, Columbia Business School, New York, New York 10027

Published Online: 2 Sep 2022
https://doi.org/10.1287/opre.2022.2363

Volume 71, Issue 2

March-April 2023

Pages iii-vi, 397-790, C2-C3

Article Information

Received: July 03, 2020
Accepted: July 11, 2022
Published Online: September 02, 2022

Cite as

John Duchi, Tatsunori Hashimoto, Hongseok Namkoong (2022) Distributionally Robust Losses for Latent Covariate Mixtures. Operations Research 71(2):649-664.

https://doi.org/10.1287/opre.2022.2363
