The Sense and Non-Sense of Holdout Sample Validation in the Presence of Endogeneity

Published Online:https://doi.org/10.1287/mksc.1110.0666

Market response models based on field-generated data need to address potential endogeneity in the regressors to obtain consistent parameter estimates. Another requirement is that market response models predict well in a holdout sample. With both requirements combined, it may seem reasonable to subject an endogeneity-corrected model to a holdout prediction task, and this is quite common in the academic marketing literature. One may be inclined to expect that the consistent parameter estimates obtained via instrumental variables (IV) estimation predict better than the biased ordinary least squares (OLS) estimates. This paper shows that this expectation is incorrect. That is, if the holdout sample is similar to the estimation sample so that the regressors are endogenous in both samples, holdout sample validation favors regression estimates that are not corrected for endogeneity (i.e., OLS) over estimates that are corrected for endogeneity (i.e., IV estimation). We also discuss ways in which holdout samples may be used sensibly in the presence of endogeneity. A key takeaway is that if consistent parameter estimates are the primary model objective, the model should be validated with an exogenous (rather than endogenous) holdout sample.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.