Positive-versus-Negative Classification for Model Aggregation in Predictive Data Mining

Published Online: https://doi.org/10.1287/ijoc.1120.0540

References

  • Ali KM, Pazzani MJ (1996) Error reduction through learning multiple descriptions. Machine Learn. 24:173–202.
  • Bay SD, Kibler D, Pazzani MJ, Smyth P (2000) The UCI KDD archive of large data sets for data mining research and experimentation. Assoc. Comput. Machinery Special Interest Group on Knowledge Discovery in Databases 2(2):81–85.
  • Bishop CM (1995) Neural Networks for Pattern Recognition (Clarendon Press, Oxford, UK).
  • Breiman L (1996) Bagging predictors. Machine Learn. 24:123–140.
  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and Regression Trees (Wadsworth and Brooks, Pacific Grove, CA).
  • Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans. Inform. Theory 13(1):21–27.
  • Dietterich TG, Kong EB (1995) Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Technical report, Department of Computer Science, Oregon State University, Corvallis, OR. http://web.engr.oregonstate.edu/~tgd/publications/index.html.
  • Fan C, Muller M, Rezucha I (1962) Development of sampling plans by using sequential (item by item) selection techniques and digital computers. J. Amer. Statist. Assoc. 57(298):387–402.
  • Fawcett T (2001) Using rulesets to maximise ROC performance. Proc. IEEE Internat. Conf. Data Mining (ICDM-2001) (IEEE Computer Society, San Jose, CA), 131–138.
  • Fawcett T (2004) ROC graphs: Notes and practical considerations for researchers. Technical report, HP Laboratories, Palo Alto, CA. Accessed November 2012, http://home.comcast.net/~tom.fawcett/public_html/papers/ROC101.pdf.
  • Fawcett T (2006) An introduction to ROC analysis. Pattern Recognition Lett. 27(8):861–874.
  • Freund Y, Schapire R (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55(1):119–139.
  • Friedman JH (1997) On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery 1(1):55–77.
  • Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput. 4:1–58.
  • Giudici P (2003) Applied Data Mining: Statistical Methods for Business and Industry (John Wiley & Sons, Chichester, UK).
  • Giudici P, Figini S (2009) Applied Data Mining for Business and Industry, 2nd ed. (John Wiley & Sons, Chichester, UK).
  • Goodman A, Kamath C, Kumar V (2008) Data analysis in the 21st century. Statist. Anal. Data Mining 1(1):1–3.
  • Hand DJ (1997) Construction and Assessment of Classification Rules (John Wiley & Sons, Chichester, UK).
  • Hand DJ (1998) Data mining: statistics and more? Amer. Statistician 52(2):112–118.
  • Hand DJ, Till RJ (2001) A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learn. 45:171–186.
  • Hand DJ, Mannila H, Smyth P (2001) Principles of Data Mining (MIT Press, Cambridge, MA).
  • Hansen LK, Salamon P (1990) Neural network ensembles. IEEE Trans. Pattern Anal. Machine Intelligence 12(10):993–1001.
  • Hettich S, Bay SD (1999) The UCI KDD archive (http://kdd.ics.uci.edu). Department of Information and Computer Science, University of California, Irvine, CA.
  • Ho T (1998) The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Machine Intelligence 20(8):832–844.
  • Jones T (1962) A note on sampling from tape files. Comm. ACM 5(6):343.
  • Kittler J (1998) Combining classifiers: A theoretical framework. Pattern Anal. Appl. 1:18–27.
  • Kohavi R, Wolpert DH (1996) Bias plus variance decomposition for zero-one loss functions. Saitta L, ed. Machine Learn.: Proc. Thirteenth Internat. Conf. (Morgan Kaufmann, San Francisco), 275–283.
  • Kohavi R, Mason RJ, Zheng Z (2004) Lessons and challenges from mining retail e-commerce data. Machine Learn. 57:83–113.
  • Kong E, Dietterich T (1995) Error-correcting output coding corrects bias and variance. Proc. Twelfth Internat. Conf. Machine Learn. (Morgan Kaufmann, San Francisco), 313–321.
  • Krogh A, Vedelsby J (1995) Neural network ensembles, cross validation and active learning. Tesauro G, Touretzky DS, Leen TK, eds. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), 231–238.
  • Kuncheva L (2004) Combining Pattern Classifiers: Methods and Algorithms (John Wiley & Sons, Hoboken, NJ).
  • Kwok SW, Carter C (1990) Multiple decision trees. Shachter RD, Levitt TS, Kanal LN, Lemmer JF, eds. Uncertainty in Artificial Intelligence, Vol. 4 (Elsevier Science Publishers, North-Holland, Amsterdam), 327–335.
  • Laskov P, Düssel P, Schäfer C, Rieck K (2005) Learning intrusion detection: Supervised or unsupervised? ICAP: Internat. Conf. Image Anal. Processing, Cagliari, Italy.
  • Lee W, Stolfo SJ (2000) A framework for constructing features and models for intrusion detection systems. ACM Trans. Inform. System Security 3(4):227–261.
  • Lutu PEN (2010) Data set selection for aggregate model implementation in predictive data mining. Ph.D. thesis, Department of Computer Science, University of Pretoria, South Africa.
  • Lutu PEN (2011a) Using confusion graphs and confusion matrices to design ensemble base models for classification. Cuzzocrea A, Dayal U, eds. Proc. 13th Internat. Conf. Data Warehousing and Knowledge Discovery, DaWaK 2011, Toulouse, France (Lecture Notes in Computer Science, Vol. 6862, LNCS, Berlin), 301–315.
  • Lutu PEN (2011b) Empirical comparison of four classifier fusion strategies for positive-versus-negative ensembles. Sewcheran K, Osman H, eds. Proc. SAICSIT 2011, Cape Town, South Africa (ACM International Conference Series, New York), Vol. 978, 302–305.
  • Lutu PEN, Engelbrecht AP (2010) A decision rule-based method for feature selection in predictive data mining. Expert Systems Appl. 37(1):602–609.
  • Lutu PEN, Engelbrecht AP (2012) Using OVA modeling to improve classification performance for large data sets. Expert Systems Appl. 39(4):4358–4376.
  • Lutu PEN, Engelbrecht AP (2013) Base model combination algorithm for resolving tied predictions for k-nearest neighbor OVA ensemble models. INFORMS J. Comput. 25(3):517–526.
  • Olken F (1993) Random sampling from databases. Ph.D. thesis, Department of Computer Science, University of California at Berkeley, Berkeley, CA.
  • Olken F, Rotem D (1995) Random sampling from databases: A survey. Statist. Comput. 5(1):25–42.
  • Ooi CH, Chetty M, Teng SW (2007) Differential prioritization in feature selection and classifier aggregation for multiclass microarray data sets. Data Mining and Knowledge Discovery 14:329–366.
  • Osei-Bryson K-M, Kah MO, Kah JML (2008) Selecting predictive models for inclusion in an ensemble. The 18th Triennial Conf. Internat. Federation of Oper. Res. Soc. (IFORS 2008), Sandton, Johannesburg.
  • Provost F, Domingos P (2001) Well trained PETS: Improving probability estimation trees. Working Paper IS-00-04, Stern School of Business, New York University, New York.
  • Provost F, Fawcett T (2001) Robust classification for imprecise environments. Machine Learn. 42:203–231.
  • Quinlan JR (1993) C4.5: Programs for Machine Learning (Morgan Kaufmann, San Francisco).
  • Quinlan JR (2004) An informal tutorial, Rulequest research. Accessed October 28, 2005, http://www.rulequest.com.
  • Rao PSRS (2000) Sampling Methodologies with Applications (CRC/Chapman and Hall, Boca Raton, FL).
  • Rifkin R, Klautau A (2004) In defense of one-vs-all classification. J. Machine Learn. Res. 5:101–141.
  • Shin SW, Lee CH (2006) Using attack-specific feature subsets for network intrusion detection. Proc. 19th Australian Conf. Artificial Intelligence, Hobart, Australia.
  • Smyth P (2001) Data mining at the interface of computer science and statistics. Grossman RL, Kamath C, Kegelmeyer P, Kumar V, Namburu RR, eds. Data Mining for Scientific and Engineering Applications (Kluwer Academic Publishers, Dordrecht, Netherlands).
  • Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, et al. (2007) Top 10 algorithms in data mining. Knowledge Inform. Systems 14(1):1–37.