Targeting Prospective Customers: Robustness of Machine-Learning Methods to Typical Data Challenges

Published Online: https://doi.org/10.1287/mnsc.2019.3308

References

  • Abhishek V, Hosanagar K, Fader PS (2015) Aggregation bias in sponsored search data: The curse and the cure. Marketing Sci. 34(1):59–77.
  • Aigner DJ, Goldfeld SM (1974) Estimation and prediction from aggregate data when aggregates are measured more accurately than their components. Econometrica 42(January):113–134.
  • Aigner DJ, Goldfeld SM (1973) Simulation and aggregation: A reconsideration. Rev. Econom. Statist. 55(February):114–118.
  • Alaiz-Rodríguez R, Japkowicz N (2008) Assessing the impact of changing environments on classifier performance. Bergler S, ed. Adv. Artificial Intelligence (AI 2008), Lecture Notes in Computer Science, vol. 5032 (Springer, Berlin), 13–24.
  • Andrews RL, Currim IS, Leeflang PSH (2011) A comparison of sales response predictions from demand models applied to store-level versus panel data. J. Bus. Econom. Statist. 29(2):319–326.
  • Auer P, Warmuth MK (1998) Tracking the best disjunction. Machine Learn. 32(2):127–150.
  • Baldi P, Brunak S (2001) Bioinformatics: The Machine Learning Approach (MIT Press, Cambridge, MA).
  • Batuwita R, Palade V (2010) Efficient resampling methods for training support vector machines with imbalanced datasets. 2010 Internat. Joint Conf. Neural Networks (IJCNN) (IEEE, Piscataway, NJ), 1–8.
  • Bellet A, Habrard A, Sebban M (2013) A survey on metric learning for feature vectors and structured data. Working paper, University of Southern California, Los Angeles.
  • Biau G, Gyorfi L (2005) On the asymptotic properties of a nonparametric L1-test statistic of homogeneity. IEEE Trans. Inform. Theory 51(11):3965–3973.
  • Bickel S, Scheffer T (2007) Dirichlet-enhanced spam filtering based on biased samples. Advances in Neural Information Processing Systems, vol. 19 (Curran Associates, Red Hook, NY), 161–168.
  • Borgwardt KM, Gretton A, Rasch MJ, Kriegel H-P, Schölkopf B, Smola AJ (2006) Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 22(14):e49–e57.
  • Busse MR, Simester DI, Zettelmeyer F (2010) “The best price you’ll ever get”: The 2005 employee discount pricing promotions in the U.S. automobile industry. Marketing Sci. 29(2):268–290.
  • Caputo B, Sim K, Furesjo F, Smola A (2002) Appearance-based object recognition using SVMs: Which kernel should I use? Proc. NIPS Workshop Stat. Methods Comput. Experiments Visual Processing Comput. Vision (Whistler, British Columbia) vol. 2002.
  • Castillo G, Gama J, Breda AM (2003) Adaptive Bayes for a student modeling prediction task based on learning styles. Brusilovsky P, Corbett A, de Rosis F, eds. Internat. Conf. User Modeling (Springer, Berlin), 328–332.
  • Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W (2017) Double/debiased/Neyman machine learning of treatment effects. Amer. Econom. Rev. 107(5):261–265.
  • Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, Robins J (2018) Double/debiased machine learning for treatment and causal parameters. Econom. J. 21(1):C1–C68.
  • Delen D, Walker G, Kadam A (2005) Predicting breast cancer survivability: A comparison of three data mining methods. Artificial Intelligence Medicine 34(2):113–127.
  • Dietterich TG, Widmer G, Kubat M, eds. (1998) Special issue on context sensitivity and concept drift. Machine Learn. 32(2).
  • Direct Marketing Association (2005) Statistical Fact Book (Direct Marketing Association, New York).
  • Edwards JB, Orcutt GH (1969) Should estimation prior to aggregation be the rule? Rev. Econom. Statist. 51(November):409–420.
  • Eitrich T, Lang B (2006) Efficient optimization of support vector machine learning parameters for unbalanced datasets. J. Comput. Appl. Math. 196(2):425–436.
  • Erdem T, Keane MP (1996) Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Sci. 15(1):1–20.
  • Foekens EW, Leeflang PSH, Wittink DR (1994) A comparison and an exploration of the forecasting accuracy of a loglinear model at different levels of aggregation. Internat. J. Forecast. 10(2):245–261.
  • Frénay B, Verleysen M (2014) Classification in the presence of label noise: A survey. IEEE Trans. Neural Networks Learn. Systems 25(5):845–869.
  • Friedman JH, Rafsky LC (1979) Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. Ann. Statist. 7(4):697–717.
  • Friedman J, Hastie T, Tibshirani R (2001) The Elements of Statistical Learning, Springer Series in Statistics, vol. 1 (Springer, New York).
  • Gama J, Žliobaitė I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput. Surv. 46(4):1–37.
  • Gretton A, Borgwardt KM, Rasch MJ, Schölkopf B, Smola A (2012) A kernel two-sample test. J. Machine Learn. Res. 13(March):723–773.
  • Grunfeld Y, Griliches Z (1960) Is aggregation necessarily bad? Rev. Econom. Statist. 42(February):1–13.
  • Gupta S, Chintagunta P, Kaul A, Wittink DR (1996) Do household scanner data provide representative inferences from brand choices: A comparison with store data. J. Marketing Res. 33(4):383–398.
  • Hall P, Tajvidi N (2002) Permutation tests for equality of distributions in high-dimensional settings. Biometrika 89(2):359–374.
  • Hand DJ (2006a) Classifier technology and the illusion of progress. Statist. Sci. 21(1):1–14.
  • Hand DJ (2006b) Rejoinder: Classifier technology and the illusion of progress. Statist. Sci. 21(1):30–34.
  • Harries MB, Sammut C, Horn K (1998) Extracting hidden context. Machine Learn. 32(2):101–126.
  • Harries M, Horn K (1995) Detecting concept drift in financial time series prediction using symbolic machine learning. AI-Conference (World Scientific Publishing, Singapore), 91–98.
  • He H, Ma Y, eds. (2013) Imbalanced Learning: Foundations, Algorithms, and Applications (John Wiley & Sons, New York).
  • Helmbold DP, Long PM (1991) Tracking drifting concepts using random examples. Warmuth MK, Valiant LG, eds. Proc. 4th Annual Workshop Comput. Learn. Theory (Morgan Kaufmann Publishers Inc., Burlington, MA), 13–23.
  • Helmbold DP, Long PM (1994) Tracking drifting concepts by minimizing disagreements. Machine Learn. 14(1):27–45.
  • Herbster M, Warmuth MK (1998) Tracking the best expert. Machine Learn. 32(2):151–178.
  • Hitsch G, Misra S (2018) Heterogeneous treatment effects and optimal targeting policy. Working paper, University of Chicago, Chicago.
  • Indyk P, Motwani R (1998) Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. 30th Annual ACM Sympos. Theory Comput. (Association for Computing Machinery, New York), 604–613.
  • Jain A, Zongker D (1997) Feature selection: Evaluation, application, and small sample performance. IEEE Trans. Pattern Anal. Machine Intelligence 19(2):153–158.
  • Kim G, Chae BK, Olson DL (2013) A support vector machine (SVM) approach to imbalanced datasets of customer responses: Comparison with other customer response models. Service Bus. 7(1):167–182.
  • Klinkenberg R (2003) Predicting phases in business cycles under concept drift. Proc. LWA, 3–10.
  • Krishna A (1992) The normative impact of consumer price expectations for multiple brands on consumer purchase behavior. Marketing Sci. 11(3):266–286.
  • Krishna A (1994) The impact of dealing patterns on purchase behavior. Marketing Sci. 13(4):351–373.
  • Kuh A, Petsche T, Rivest RL (1992) Incrementally learning time-varying half-planes. Moody JE, Hanson SJ, Lippmann RP, eds. Advances in Neural Information Processing Systems, vol. 4 (MIT Press, Cambridge, MA), 920–927.
  • Kuh A, Petsche T, Rivest RL (1991) Learning time-varying concepts. Lippmann RP, Moody JE, Touretzky DS, eds. Advances in Neural Information Processing Systems, vol. 3 (Curran Associates, Red Hook, NY), 183–189.
  • Kukar M (2003) Drifting concepts as hidden factors in clinical studies. Dojat M, Keravnou ET, Barahona P, eds. Conf. Artificial Intelligence Medicine Europe (Springer, Berlin), 355–364.
  • Lane T, Brodley CE (1998) Approaches to online learning and concept drift for user identification in computer security. Agrawal R, Stolorz P, eds. KDD’98 Proc. 4th Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 259–263.
  • Li Y, Swersky K, Zemel R (2015) Generative moment matching networks. Bach F, Blei D, eds. Proc. 32nd Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 1718–1727.
  • McCarty JA, Hastak M (2007) Segmentation approaches in data-mining: A comparison of RFM, CHAID, and logistic regression. J. Bus. Res. 60(6):656–662.
  • Moreno-Torres JG, Raeder T, Alaiz-Rodríguez R, Chawla NV, Herrera F (2012) A unifying view on dataset shift in classification. Pattern Recognition 45(1):521–530.
  • Movellan JR, Mineiro P (1998) Robust sensor fusion: Analysis and application to audio visual speech recognition. Machine Learn. 32(2):85–100.
  • Naik PA, Mantrala MK, Sawyer AG (1998) Planning media schedules in the presence of dynamic advertising quality. Marketing Sci. 17(3):214–235.
  • Olson DL, Cao Q, Gu C, Lee D (2009) Comparison of customer response models. Service Bus. 3(2):117–130.
  • Pechenizkiy M, Bakker J, Žliobaitė I, Ivannikov A, Kärkkäinen T (2010) Online mass flow prediction in CFB boilers with explicit detection of sudden concept drift. SIGKDD Explorations 11(2):109–116.
  • Penny KI, Chesney T (2006) Imputation methods to deal with missing values when data mining trauma injury data. Luzar-Stiffler V, Dobric VH, eds. 28th Internat. Conf. Inform. Tech. Interfaces (IEEE, Piscataway, NJ), 213–218.
  • Pesaran MH, Pierse RG, Kumar MS (1989) Econometric analysis of aggregation in the context of linear prediction models. Econometrica 57(4):861–888.
  • Schlimmer JC, Granger RH (1986) Incremental learning from noisy data. Machine Learn. 1(3):317–354.
  • Sejdinovic D, Sriperumbudur B, Gretton A, Fukumizu K (2013) Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist. 41(5):2263–2291.
  • Shimodaira H (2000) Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Statist. Planning Inference 90(2):227–244.
  • Simester D, Timoshenko A, Zoumpoulis SI (2019) Efficiently evaluating targeting policies: Improving upon champion vs. challenger experiments. Management Sci. Forthcoming.
  • Smirnov NV (1939) On the estimation of the discrepancy between empirical curves of distribution for two independent samples. Bull. Math. Univ. Moscou 2(2):3–14.
  • Stock JH, Watson MW (2005) An empirical comparison of methods for forecasting using many predictors. Manuscript, Princeton University, Princeton, NJ.
  • Sugiyama M, Kawanabe M (2012) Machine Learning in Non-Stationary Environments (MIT Press, Cambridge, MA).
  • Sugiyama M, Krauledat M, Müller K-R (2007) Covariate shift adaptation by importance weighted cross validation. J. Machine Learn. Res. 8(May):985–1005.
  • Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction, vol. 1, no. 1 (MIT Press, Cambridge, MA).
  • Tapak L, Mahjub H, Hamidi O, Poorolajal J (2013) Real-data comparison of data mining methods in prediction of diabetes in Iran. Healthcare Inform. Res. 19(3):177–185.
  • Thompson PA, Noordewier T (1992) Estimating the effects of consumer incentive programs on domestic automobile sales. J. Bus. Econom. Statist. 10(4):409–417.
  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B. 58(1):267–288.
  • Tong L, Erdmann C, Daldalian M, Li J, Esposito T (2016) Comparison of predictive modeling approaches for 30-day all-cause non-elective readmission risk. BMC Medical Res. Methodology 16(1):26.
  • Tsymbal A, Pechenizkiy M, Cunningham P, Puuronen S (2006) Handling local concept drift with dynamic integration of classifiers: Domain of antibiotic resistance in nosocomial infections. Lee DJ, Nutter B, Antani S, Mitra S, Archibald J, eds. 19th IEEE Internat. Sympos. Comput.-Based Medical Systems (CBMS 2006) (IEEE, Piscataway, NJ), 679–684.
  • Turhan B (2012) On the dataset shift problem in software engineering prediction models. Empirical Software Engrg. 17(1–2):62–74.
  • Ueki K, Sugiyama M, Ihara Y (2010) Perceived age estimation under lighting condition change by covariate shift adaption. Proc. 22nd Internat. Conf. Comput. Linguistics (IEEE, Piscataway, NJ), 897–904.
  • Vicente R, Kinouchi O, Caticha N (1998) Statistical mechanics of online learning of drifting concepts: A variational approach. Machine Learn. 32(2):179–201.
  • Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. Proc. 9th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 226–235.
  • Widmer G, Kubat M (1998) Guest editors’ introduction. Machine Learn. 32(2):83–84.
  • Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Machine Learn. 23(1):69–101.
  • Wolpaw JR, Birbaumer N, McFarland DJ, Pfurtscheller G, Vaughan TM (2002) Brain–computer interfaces for communication and control. Clinical Neurophysiology 113(6):767–791.
  • Wu J, Vadera S, Dayson K, Burridge D, Clough I (2010) A comparison of data mining methods in microfinance. 2nd IEEE Internat. Conf. Inform. Financial Engrg. (ICIFE) (IEEE, Piscataway, NJ), 499–502.
  • Xiang S, Nie F, Zhang C (2008) Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognition 41(12):3600–3612.
  • Yamazaki K, Kawanabe M, Watanabe S, Sugiyama M, Müller K-R (2007) Asymptotic Bayesian generalization error when training and test distributions are different. Ghahramani Z, ed. Proc. 24th Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 1079–1086.
  • Yang L, Jin R (2006) Distance metric learning: A comprehensive survey. Manuscript, Michigan State University, East Lansing.
  • Zadrozny B (2004) Learning and evaluating classifiers under sample selection bias. Proc. 21st Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 114.