Targeting Prospective Customers: Robustness of Machine-Learning Methods to Typical Data Challenges

Duncan Simester (Corresponding Author)
https://orcid.org/0000-0003-2758-0116
Marketing, MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Artem Timoshenko
https://orcid.org/0000-0002-5431-2136
Marketing, MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Spyros I. Zoumpoulis
https://orcid.org/0000-0002-4662-7787
Decision Sciences, INSEAD, 77300 Fontainebleau, France

Published Online: 15 Nov 2019
https://doi.org/10.1287/mnsc.2019.3308

Volume 66, Issue 6

June 2020

Pages 2291-2799, iii-iv

Received: October 11, 2016
Accepted: January 4, 2019
Published Online: November 15, 2019

Cite as

Duncan Simester, Artem Timoshenko, Spyros I. Zoumpoulis (2019) Targeting Prospective Customers: Robustness of Machine-Learning Methods to Typical Data Challenges. Management Science 66(6):2495-2522.

https://doi.org/10.1287/mnsc.2019.3308
