Cross-Validating Regression Models in Marketing Research
Abstract
In this paper, a formal test on prediction errors is developed for the cross-validation of regression models under the simple random splitting framework. Analytic as well as simulation results relate the statistical power of the test to the allocation of sample observations to estimation and validation subsets. The results indicate that splitting the data into halves is suboptimal: more observations should be used for estimation than for validation. Furthermore, the proportion of the sample optimally devoted to validation is small for very small samples (N < 20), increases to about 40% for medium-sized samples, and decreases again for large samples (N > 60). Although the 50/50 split is suboptimal, the loss in power is modest in a wide variety of circumstances.
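The simple random splitting framework referred to above can be illustrated with a minimal sketch. It is not the formal test developed in the paper: it only shows the mechanics of allocating observations to estimation and validation subsets and computing prediction errors on the held-out part. The function name `split_validate`, the parameter `validation_fraction`, and the use of ordinary least squares with an intercept are assumptions made for illustration.

```python
import numpy as np


def split_validate(X, y, validation_fraction=0.4, seed=0):
    """Randomly split the sample, fit OLS on the estimation subset, and
    return (validation MSE, n_estimation, n_validation).

    Hypothetical helper for illustration only; it does not implement
    the paper's test statistic.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Number of observations held out for validation.
    n_val = int(round(validation_fraction * n))
    if not 0 < n_val < n:
        raise ValueError("validation_fraction must leave both subsets non-empty")

    # Simple random split: shuffle the indices, then cut once.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    val_idx, est_idx = perm[:n_val], perm[n_val:]

    # Design matrix with an intercept column (an assumption of this sketch).
    design = np.column_stack([np.ones(n), X])
    if len(est_idx) < design.shape[1]:
        raise ValueError("too few estimation observations for the model")

    # Estimate coefficients on the estimation subset only.
    beta, *_ = np.linalg.lstsq(design[est_idx], y[est_idx], rcond=None)

    # Prediction errors on the validation subset.
    errors = y[val_idx] - design[val_idx] @ beta
    return float(np.mean(errors ** 2)), n - n_val, n_val
```

Calling `split_validate(X, y, validation_fraction=0.5)` and then again with `0.4` on the same data gives a rough, informal feel for how the allocation changes the held-out error; the power comparison reported in the abstract rests on the paper's own analytic and simulation results, not on this sketch.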

