Distributionally Robust Losses for Latent Covariate Mixtures

Published Online: https://doi.org/10.1287/opre.2022.2363

While modern large-scale data sets often consist of heterogeneous subpopulations (for example, multiple demographic groups or multiple text corpora), the standard practice of minimizing average loss fails to guarantee uniformly low losses across all subpopulations. We propose a convex procedure that controls the worst-case performance over all subpopulations of a given size. Our procedure comes with finite-sample (nonparametric) convergence guarantees on the worst-off subpopulation. Empirically, we observe on lexical similarity, wine quality, and recidivism prediction tasks that our worst-case procedure learns models that perform well on unseen subpopulations.
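
To make the gap between average loss and worst-case subpopulation loss concrete, the following is a minimal sketch, not the paper's procedure. It assumes per-example losses are already computed and reads "subpopulations of a given size" as any subset containing at least a fraction `alpha` of the sample. The function name `worst_case_subpopulation_loss` and the loss values are hypothetical. The paper itself defines subpopulations through latent covariate mixtures, which this simplified version does not model.

```python
import numpy as np


def worst_case_subpopulation_loss(losses, alpha):
    """Largest average loss over any subset holding at least an
    alpha fraction of the examples (illustrative simplification)."""
    losses = np.asarray(losses, dtype=float)
    if losses.size == 0:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Smallest subset size allowed by the proportion constraint.
    k = int(np.ceil(alpha * losses.size))
    # Adding lower-loss examples can only reduce the average, so the
    # worst subset is exactly the k highest-loss examples.
    worst_k = np.sort(losses)[::-1][:k]
    return worst_k.mean()


# Hypothetical per-example losses: most are small, two are large.
example_losses = [0.1, 0.2, 0.1, 0.3, 2.0, 1.0, 0.2, 0.1]
average_loss = float(np.mean(example_losses))
worst_quarter = worst_case_subpopulation_loss(example_losses, alpha=0.25)
print(average_loss, worst_quarter)
```

With `alpha=1.0` the quantity reduces to the ordinary average loss, and smaller values of `alpha` put more weight on the hardest examples. This is why a model with a low average loss can still have a high loss on some subpopulation.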

Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2363.
