Cost-Restricted Feature Selection for Data Acquisition

Published Online: https://doi.org/10.1287/mnsc.2022.4551

References

  • Agresti A (2002) Categorical Data Analysis (John Wiley & Sons, Hoboken, NJ).
  • Aiken LS, West SG, Reno RR (1991) Multiple Regression: Testing and Interpreting Interactions (Sage, Thousand Oaks, CA).
  • Berndt ER (1991) The Practice of Econometrics (Addison-Wesley, New York).
  • Bhattacharyya S (1999) Direct marketing performance modeling using genetic algorithms. INFORMS J. Comput. 11(3):248–257.
  • Bolón-Canedo V, Porto-Díaz I, Sánchez-Maroño N, Alonso-Betanzos A (2014) A framework for cost-based feature selection. Pattern Recognition 47(7):2481–2489.
  • Breiman L (2001) Random forests. Machine Learn. 45(1):5–32.
  • Bult JR, Wansbeek T (1995) Optimal selection for direct mail. Marketing Sci. 14(4):378–394.
  • Chen H, Chiang RHL, Storey VC (2012) Business intelligence and analytics: From big data to big impact. Management Inform. Systems Quart. 36(4):1165–1188.
  • Davenport TH (2006) Competing on analytics. Harvard Bus. Rev. 84(1):99–107.
  • Deng K, Zheng Y, Bourke C, Scott S, Masciale J (2013) New algorithms for budgeted learning. Machine Learn. 90(1):59–90.
  • DirectMail.com (2020) Mailing list pricing. Accessed July 10, 2020, https://www.directmail.com/mailinglists/mailing-list-pricing.
  • Federal Trade Commission (2014) Data brokers: A call for transparency and accountability. Accessed July 10, 2020, http://www.ftc.gov/system/files/documents/reports/data-brokers-call-transparency-accountability-report-federal-trade-commission-may-2014/140527databrokerreport.pdf.
  • Gaines BR, Kim J, Zhou H (2018) Algorithms for fitting the constrained Lasso. J. Computational Graphical Statist. 27(4):861–871.
  • Genuer R, Poggi JM, Tuleau-Malot C (2010) Variable selection using random forests. Pattern Recognition Lett. 31(14):2225–2236.
  • Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J. Machine Learn. Res. 3:1157–1182.
  • Hastie T, Tibshirani R, Friedman J (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, New York).
  • Hong S-H (2013) Measuring the effect of Napster on recorded music sales: Difference-in-differences estimates under compositional changes. J. Appl. Econometrics 28(2):297–324.
  • Jaccard J, Turrisi R (2003) Interaction Effects in Multiple Regression (Sage, Thousand Oaks, CA).
  • James GM, Paulson C, Rusmevichientong P (2020) Penalized and constrained regression: An application to high-dimensional website advertising. J. Amer. Statist. Assoc. 115(529):107–122.
  • Jensen R, Shen Q (2007) Fuzzy-rough sets assisted attribute selection. IEEE Trans. Fuzzy Systems 15(1):73–89.
  • Kim YS, Street WN, Russell GJ, Menczer F (2005) Customer targeting: A neural network approach guided by genetic algorithms. Management Sci. 51(2):264–276.
  • Liaw A, Wiener M (2002) Classification and regression by random forest. R News 2(3):18–22.
  • Meier L, Van De Geer S, Bühlmann P (2008) The group Lasso for logistic regression. J. Roy. Statist. Soc. Ser. B Statist. Methodology 70(1):53–71.
  • Min F, He H, Qian Y, Zhu W (2011) Test-cost-sensitive attribute reduction. Inform. Sci. 181(22):4928–4942.
  • Molnar C (2020) Interpretable machine learning. Accessed September 18, 2020, https://christophm.github.io/interpretable-ml-book/simple.html.
  • Moro S, Laureano R, Cortez P (2011) Using data mining for bank direct marketing: An application of the CRISP-DM methodology. Novais P, Machado J, Analide C, Abelha A, eds. Proc. Eur. Simulation Model. Conf. (EUROSIS, Ostend, Belgium), 117–121.
  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, et al. (2011) Scikit-learn: Machine learning in Python. J. Machine Learn. Res. 12(85):2825–2830.
  • Ratanamahatana CA, Gunopulos D (2003) Feature selection for the naive Bayesian classifier using decision trees. Appl. Artificial Intelligence 17(5–6):475–487.
  • Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1(5):206–215.
  • Saar-Tsechansky M, Melville P, Provost F (2009) Active feature-value acquisition. Management Sci. 55(4):664–684.
  • Sakar CO, Polat SO, Katircioglu M, Kastro Y (2019) Real-time prediction of online shoppers’ purchasing intention using multilayer perceptron and LSTM recurrent neural networks. Neural Comput. Appl. 31:6893–6908.
  • Steel E (2013) Companies scramble for consumer data. Financial Times Online (June 12), https://www.ft.com/content/f0b6edc0-d342-11e2-b3ff-00144feab7de.
  • Steel E, Locke C, Cadman E, Freese B (2013) How much is your personal data worth? Financial Times Online (June 12), https://ig.ft.com/how-much-is-your-personal-data-worth/.
  • Sugumaran V, Muralidharan V, Ramachandran KI (2007) Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing. Mech. Systems Signal Processing 21(2):930–942.
  • Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. B 58(1):267–288.
  • Tibshirani R, Taylor J (2011) The solution path of the generalized Lasso. Ann. Statist. 39(3):1335–1371.
  • Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused Lasso. J. Roy. Statist. Soc. Ser. B Statist. Methodology 67(1):91–108.
  • Tillmanns S, Hofstede FT, Krafft M, Goetz O (2017) How to separate the wheat from the chaff: Improved variable selection for new customer acquisition. J. Marketing 81(2):99–113.
  • Wang X, Yang J, Teng X, Xia W, Jensen R (2007) Feature selection based on rough sets and particle swarm optimization. Pattern Recognition Lett. 28(4):459–471.
  • Wedel M, Kannan PK (2016) Marketing analytics for data-rich environments. J. Marketing 80(6):97–121.
  • Yu G, Witten D, Bien J (2020) Controlling costs: Feature selection on a budget. Preprint, submitted October 8, 2019, https://arxiv.org/abs/1910.03627.
  • Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B Statist. Methodology 68(1):49–67.
  • Zhang Y, Gong DW, Cheng J (2017a) Multi-objective particle swarm optimization approach for cost-based feature selection in classification. IEEE/ACM Trans. Comput. Biol. Bioinformatics 14(1):64–75.
  • Zhang Y, Song XF, Gong DW (2017b) A return-cost-based binary firefly algorithm for feature selection. Inform. Sci. 418:561–574.
  • Zhu X, Wu X (2005) Cost-constrained data acquisition for intelligent data preparation. IEEE Trans. Knowledge Data Engrg. 17(11):1542–1556.
  • Ziarko W (1993) Variable precision rough set model. J. Comput. System Sci. 46(1):39–59.
  • Zou H (2006) The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101(476):1418–1429.
  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B Statist. Methodology 67(2):301–320.