Cross-Validating Regression Models in Marketing Research

Published Online: https://doi.org/10.1287/mksc.12.4.415

This paper develops a formal test on prediction errors for the cross-validation of regression models under the simple random splitting framework. Analytic and simulation results relate the statistical power of the test to the allocation of sample observations between estimation and validation subsets. The results indicate that splitting the data into halves is suboptimal: more observations should be used for estimation than for validation. Furthermore, the proportion of the sample optimally devoted to validation is small for very limited samples (N < 20), increases to about 40% for medium-sized samples, and decreases again for large samples (N > 60). Although the 50/50 split is suboptimal, it is not greatly so in a wide variety of circumstances.
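To make the simple random splitting framework concrete, the following Python sketch splits a sample at random into estimation and validation subsets, fits an ordinary least squares model on the estimation subset, and computes prediction errors on the validation subset. It is only an illustration under stated assumptions: the function name `split_and_validate`, the synthetic data, and the use of mean squared prediction error as a summary are all hypothetical choices, and the sketch does not implement the formal test developed in the paper, which the abstract does not specify. The default validation fraction of 0.4 simply echoes the "about 40%" figure reported for medium-sized samples.

```python
import numpy as np


def split_and_validate(X, y, validation_fraction=0.4, seed=0):
    """Randomly split (X, y), fit OLS on the estimation subset,
    and summarize prediction errors on the validation subset.

    Hypothetical helper for illustration; not the paper's test.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_val = int(round(validation_fraction * n))

    # Simple random split: shuffle the indices, then cut once.
    perm = rng.permutation(n)
    val_idx, est_idx = perm[:n_val], perm[n_val:]

    # Add an intercept column and fit by least squares
    # using only the estimation observations.
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1[est_idx], y[est_idx], rcond=None)

    # Prediction errors on the held-out validation observations.
    errors = y[val_idx] - X1[val_idx] @ beta
    return {
        "n_estimation": len(est_idx),
        "n_validation": len(val_idx),
        "beta": beta,
        "mspe": float(np.mean(errors ** 2)),
    }


# Synthetic data (assumed for the example): N = 50, two predictors.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + data_rng.normal(scale=0.5, size=50)

result = split_and_validate(X, y, validation_fraction=0.4)
print(result["n_estimation"], result["n_validation"], result["mspe"])
```

Changing `validation_fraction` (for example, to 0.5 for the halves split) allows an informal comparison of how the allocation affects the validation summary on a given sample.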
