Distributionally Robust Losses for Latent Covariate Mixtures

Published online: https://doi.org/10.1287/opre.2022.2363

References

  • Adebayo JA (2016) FairML: Toolbox for diagnosing bias in predictive modeling. Unpublished master’s thesis, Massachusetts Institute of Technology, Cambridge, MA.
  • Agirre E, Alfonseca E, Hall K, Kravalova J, Pasca M, Soroa A (2009) A study on similarity and relatedness using distributional and WordNet-based approaches. Proc. North Amer. Chapter Assoc. Comput. Linguistics (Association for Computational Linguistics, Stroudsburg, PA).
  • Amodei D, Ananthanarayanan S, Anubhai R, Bai J, Battenberg E, Case C, Casper J, Catanzaro B, Cheng Q, Chen G (2016) Deep speech 2: End-to-end speech recognition in English and Mandarin. Proc. 33rd Internat. Conf. Machine Learn. (ACM, New York), 173–182.
  • Asuncion A, Newman DJ (2007) UCI machine learning repository. http://www.ics.uci.edu/mlearn/MLRepository.html.
  • Barocas S, Selbst AD (2016) Big data’s disparate impact. California Law Rev. 104(3):671–732.
  • Ben-David S, Blitzer J, Crammer K, Pereira F (2007) Analysis of representations for domain adaptation. Adv. Neural Inform. Processing Systems 20:137–144.
  • Ben-Tal A, Ghaoui LE, Nemirovski A (2009) Robust Optimization (Princeton University Press, Princeton, NJ).
  • Ben-Tal A, den Hertog D, Waegenaere AD, Melenberg B, Rennen G (2013) Robust solutions of optimization problems affected by uncertain probabilities. Management Sci. 59(2):341–357.
  • Berlinet A, Thomas-Agnan C (2004) Reproducing Kernel Hilbert Spaces in Probability and Statistics (Kluwer Academic Publishers, Amsterdam).
  • Bertsimas D, Gupta V, Kallus N (2018) Data-driven robust optimization. Math. Programming Ser. A 167(2):235–292.
  • Bickel S, Brückner M, Scheffer T (2007) Discriminative learning for differing training and test distributions. Proc. 24th Internat. Conf. Machine Learn. (ACM, New York).
  • Blanchet J, Kang Y, Murthy K (2019) Robust Wasserstein profile inference and applications to machine learning. J. Appl. Probab. 56(3):830–857.
  • Blanchet J, Kang Y, Zhang F, Murthy K (2017) Data-driven optimal transport cost selection for distributionally robust optimization. Preprint, submitted May 19, https://arxiv.org/abs/1705.07152.
  • Blodgett SL, Green L, O’Connor B (2016) Demographic dialectal variation in social media: A case study of African-American English. Proc. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA), 1119–1130.
  • Bühlmann P, Meinshausen N (2016) Magging: Maximin aggregation for inhomogeneous large-scale data. Proc. IEEE 104(1):126–135.
  • Cer D, Diab M, Agirre E, Lopez-Gazpio I, Specia L (2017) SemEval-2017 task 1: Semantic textual similarity multilingual and cross-lingual focused evaluation. Proc. 10th Internat. Workshop Semantic Evaluation.
  • Chen J, Kallus N, Mao X, Svacha G, Udell M (2019) Fairness under unawareness: Assessing disparity when protected class is unobserved. Proc. Conf. Fairness Accountability Transparency (ACM, New York), 339–348.
  • Cheng C, Asi H, Duchi J (2022) How many labelers do you have? A closer look at gold-standard labels. Preprint, submitted June 24, https://arxiv.org/abs/2206.12041.
  • Chouldechova A (2017) A study of bias in recidivism prediction instruments. Big Data 5(2):153–163.
  • Consumer Financial Protection Bureau (2014) Using publicly available information to proxy for unidentified race and ethnicity: A methodology and assessment. https://www.consumerfinance.gov/data-research/research-reports/usingpublicly-available-information-to-proxy-for-unidentified-race-and-ethnicity/.
  • Cortez P, Cerdeira A, Almeida F, Matos T, Reis J (2009) Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems 47(4):547–553.
  • Cristianini N, Shawe-Taylor J (2004) Kernel Methods for Pattern Analysis (Cambridge University Press).
  • Duchi JC, Namkoong H (2021) Learning models with uniform performance via distributionally robust optimization. Ann. Statist. 49(3):1378–1406.
  • Duchi JC, Glynn PW, Namkoong H (2021) Statistics of robust optimization: A generalized empirical likelihood approach. Math. Oper. Res. 46(3):946–969.
  • Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2012) Fairness through awareness. Innovations Theoretical Comput. Sci., 214–226.
  • Esfahani PM, Kuhn D (2018) Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Programming Ser. A 171(1–2):115–166.
  • Fournier N, Guillin A (2015) On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Related Fields 162(3–4):707–738.
  • Gao R, Kleywegt AJ (2016) Distributionally robust stochastic optimization with Wasserstein distance. Preprint, submitted April 8, https://arxiv.org/abs/1604.02199.
  • Gao R, Chen X, Kleywegt A (2017) Wasserstein distributional robustness and regularization in statistical learning. Preprint, submitted December 17, https://arxiv.org/abs/1712.06050.
  • Gong M, Zhang K, Liu T, Tao D, Glymour C, Schölkopf B (2016) Domain adaptation with conditional transferable components. Proc. 33rd Internat. Conf. Machine Learn. (ACM, New York), 2839–2848.
  • Gretton A, Smola A, Huang J, Schmittfull M, Borgwardt K, Schölkopf B (2009) Covariate shift by kernel mean matching. Quiñonero-Candela J, Sugiyama M, Schwaighofer A, Lawrence ND, eds. Dataset Shift in Machine Learning (MIT Press, Cambridge, MA), 131–160.
  • Grother PJ, Quinn GW, Phillips PJ (2010) Report on the evaluation of 2D still-image face recognition algorithms. NIST Interagency/Internal Report 7709, National Institute of Standards and Technology, Gaithersburg, MD.
  • Hardt M, Price E, Srebro N (2016) Equality of opportunity in supervised learning. Adv. Neural Inform. Processing Systems 29.
  • Hashimoto T, Srivastava M, Namkoong H, Liang P (2018) Fairness without demographics in repeated loss minimization. Proc. 35th Internat. Conf. Machine Learn. (ACM, New York).
  • Hébert-Johnson Ú, Kim MP, Reingold O, Rothblum GN (2017) Calibration for the (computationally identifiable) masses. Preprint, submitted November 22, https://arxiv.org/abs/1711.08513.
  • Heinze-Deml C, Meinshausen N (2017) Grouping-by-ID: Guarding against adversarial domain shifts.
  • Hovy D, Søgaard A (2015) Tagging performance correlates with author age. Proc. 53rd Annual Meeting Assoc. Comput. Linguistics (Short Papers) (Association for Computational Linguistics, Stroudsburg, PA), vol. 2, 483–488.
  • Hu W, Niu G, Sato I, Sugiyama M (2018) Does distributionally robust supervised learning give robust classifiers? Proc. 35th Internat. Conf. Machine Learn. (ACM, New York).
  • Huang J, Gretton A, Borgwardt KM, Schölkopf B, Smola AJ (2007) Correcting sample selection bias by unlabeled data. Adv. Neural Inform. Processing Systems 20:601–608.
  • Imbens G, Rubin D (2015) Causal Inference for Statistics, Social, and Biomedical Sciences (Cambridge University Press, New York).
  • Kearns M, Neel S, Roth A, Wu ZS (2018) Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. Preprint, submitted November 14, 2017, https://arxiv.org/abs/1711.05144.
  • Kilbertus N, Carulla MR, Parascandolo G, Hardt M, Janzing D, Schölkopf B (2017) Avoiding discrimination through causal reasoning. Adv. Neural Inform. Processing Systems 30:656–666.
  • Kuhn D, Esfahani PM, Nguyen VA, Shafieezadeh-Abadeh S (2019) Wasserstein distributionally robust optimization: Theory and applications in machine learning. Operations Research & Management Science in the Age of Analytics (INFORMS), 130–166.
  • Lam H, Qian H (2019) Combating conservativeness in data-driven optimization under uncertainty: A solution path approach. Preprint, submitted September 13, https://arxiv.org/abs/1909.06477.
  • Lam H, Zhou E (2015) Quantifying input uncertainty in stochastic optimization. Proc. 2015 Winter Simulation Conf. (IEEE, Piscataway, NJ).
  • Lee J, Raginsky M (2017) Minimax statistical learning and domain adaptation with Wasserstein distances. Preprint, submitted May 22, https://arxiv.org/abs/1705.07815.
  • Liu A, Ziebart B (2014) Robust classification under sample selection bias. Adv. Neural Inform. Processing Systems 27:37–45.
  • Liu A, Ziebart B (2017) Robust covariate shift prediction with general losses and feature views. Preprint, submitted December 28, https://arxiv.org/abs/1712.10043.
  • Marcus MP, Santorini B, Marcinkiewicz MA (1994) Building a large annotated corpus of English: The Penn Treebank. Comput. Linguistics 19(2):313–330.
  • Meinshausen N, Bühlmann P (2015) Maximin effects in inhomogeneous large-scale data. Ann. Statist. 43(4):1801–1830.
  • Miyato T, Maeda S-i, Koyama M, Nakae K, Ishii S (2015) Distributional smoothing with virtual adversarial training. Preprint, submitted July 2, https://arxiv.org/abs/1507.00677.
  • Namkoong H, Duchi JC (2017) Variance regularization with convex objectives. Adv. Neural Inform. Processing Systems 30:2975–2984.
  • Pennington J, Socher R, Manning CD (2014) GloVe: Global vectors for word representation. Proc. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA).
  • Peters J, Bühlmann P, Meinshausen N (2016) Causal inference by using invariant prediction: Identification and confidence intervals. J. Roy. Statist. Soc. B 78(5):947–1012.
  • Rajpurkar P, Jia R, Liang P (2018) Know what you don’t know: Unanswerable questions for SQuAD. Proc. Annual Meeting Assoc. Comput. Linguistics (Association for Computational Linguistics, Stroudsburg, PA).
  • Rockafellar RT, Uryasev S (2000) Optimization of conditional value-at-risk. J. Risk 2(3):21–42.
  • Rothenhäusler D, Meinshausen N, Bühlmann P (2016) Confidence intervals for maximin effects in inhomogeneous large-scale data. Statistical Analysis for High-Dimensional Data (Springer), 255–277.
  • Rothenhäusler D, Bühlmann P, Meinshausen N, Peters J (2018) Anchor regression: Heterogeneous data meets causality. Preprint, submitted January 18, https://arxiv.org/abs/1801.06229.
  • Sapiezynski P, Kassarnig V, Wilson C (2017) Academic performance prediction in a gender-imbalanced environment. Proc. 11th ACM Conf. Recommender Systems (ACM, New York), vol. 1, 48–51.
  • Shafieezadeh-Abadeh S, Esfahani PM, Kuhn D (2015) Distributionally robust logistic regression. Adv. Neural Inform. Processing Systems 28:1576–1584.
  • Shankar S, Halpern Y, Breck E, Atwood J, Wilson J, Sculley D (2017) No classification without representation: Assessing geodiversity issues in open data sets for the developing world. Preprint, submitted November 22, https://arxiv.org/abs/1711.08536.
  • Shapiro A (2017) Distributionally robust stochastic programming. SIAM J. Optim. 27(4):2258–2275.
  • Shapiro A, Dentcheva D, Ruszczyński A (2009) Lectures on Stochastic Programming: Modeling and Theory (SIAM and Mathematical Programming Society).
  • Shimodaira H (2000) Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Statist. Planning Inference 90(2):227–244.
  • Sinha A, Namkoong H, Duchi J (2018) Certifying some distributional robustness with principled adversarial training. Proc. Sixth Internat. Conf. Learn. Representations.
  • Staib M, Jegelka S (2019) Distributionally robust optimization and generalization in kernel methods. Adv. Neural Inform. Processing Systems 32:9134–9144.
  • Storkey AJ, Sugiyama M (2006) Mixture regression for covariate shift. Adv. Neural Inform. Processing Systems 19:1337–1344.
  • Sugiyama M, Krauledat M, Müller K-R (2007) Covariate shift adaptation by importance weighted cross validation. J. Machine Learn. Res. 8(35):985–1005.
  • Tatman R (2017) Gender and dialect bias in YouTube’s automatic captions. Proc. First Workshop Ethics Natural Language Processing, vol. 1 (Association for Computational Linguistics, Stroudsburg, PA), 53–59.
  • van Erven T, Harremoës P (2014) Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inform. Theory 60(7):3797–3820.
  • Wen J, Yu C-N, Greiner R (2014) Robust learning under uncertain test distributions: Relating covariate shift to model misspecification. Proc. 31st Internat. Conf. Machine Learn. (ACM, New York), 631–639.