Diagnosing Model Performance Under Distribution Shift

Published Online: https://doi.org/10.1287/opre.2023.0217

Prediction models can perform poorly when deployed to target distributions different from the training distribution. To understand these operational failure modes, we develop a method, which we call distribution shift decomposition (DISDE), to attribute a drop in performance to different types of distribution shifts. Our approach decomposes the performance drop into terms for (1) an increase in harder but frequently seen examples from training, (2) changes in the relationship between features and outcomes, and (3) poor performance on examples infrequent or unseen during training. Empirically, we demonstrate how our method can (1) inform potential modeling improvements across distribution shifts for employment prediction on tabular census data and (2) help to explain why certain domain adaptation methods fail to improve model performance for satellite image classification.
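The abstract names the three terms but does not give a formula, so the following is only a minimal sketch of how such a decomposition might be computed, not the authors' implementation. It assumes a discrete feature space, known per-cell expected losses, and an auxiliary "shared" feature distribution weighted proportionally to p(x)q(x)/(p(x)+q(x)). That weighting, the function names (`shared_distribution`, `decompose_shift`), and all numbers are illustrative assumptions, not taken from the text above.

```python
# Illustrative sketch only: a three-term telescoping decomposition of a
# performance drop on a toy discrete feature space. The shared-distribution
# weighting and all names and numbers here are assumptions for illustration.

def shared_distribution(p, q):
    """Weights proportional to p*q/(p+q), normalized to sum to 1.

    Cells that are rare or absent under either distribution get little
    or no weight, so the result concentrates on the common support.
    """
    raw = [pi * qi / (pi + qi) if (pi + qi) > 0 else 0.0
           for pi, qi in zip(p, q)]
    total = sum(raw)
    if total == 0:
        raise ValueError("training and target have no overlapping support")
    return [r / total for r in raw]


def expected(weights, values):
    """Expectation of per-cell values under the given cell weights."""
    return sum(w * v for w, v in zip(weights, values))


def decompose_shift(p, q, loss_p, loss_q):
    """Split E_q[loss_q] - E_p[loss_p] into three terms.

    p, q           : training / target probabilities per feature cell
    loss_p, loss_q : expected loss per cell under the training / target
                     feature-outcome relationship
    """
    s = shared_distribution(p, q)
    # (1) more weight on harder examples that training already covers
    harder_seen = expected(s, loss_p) - expected(p, loss_p)
    # (2) change in the feature-outcome relationship, on the shared support
    relationship = expected(s, loss_q) - expected(s, loss_p)
    # (3) examples infrequent or unseen during training
    unseen = expected(q, loss_q) - expected(s, loss_q)
    return harder_seen, relationship, unseen


# Hypothetical toy data: four feature cells, the last unseen in training.
p = [0.6, 0.3, 0.1, 0.0]
q = [0.2, 0.3, 0.2, 0.3]
loss_p = [0.1, 0.2, 0.4, 0.0]  # last entry is unused: p and s are 0 there
loss_q = [0.1, 0.3, 0.4, 0.7]

terms = decompose_shift(p, q, loss_p, loss_q)
total_drop = expected(q, loss_q) - expected(p, loss_p)
print("terms:", terms)
print("sum of terms:", sum(terms), "total drop:", total_drop)
```

Because the three terms telescope, they sum to the total performance drop by construction. In practice the per-cell losses and distribution weights would have to be estimated from data rather than given, which this sketch does not attempt.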

Funding: T. (T.) Cai was supported by the National Science Foundation Graduate Research Fellowship [Grant DGE-2036197]. H. Namkoong was partially supported by the Amazon Research Award.

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2023.0217.
