Performance Comparison of Machine Learning Platforms

Published Online: https://doi.org/10.1287/ijoc.2018.0825

References

  • Alimoglu F, Alpaydin E (1996) Methods of combining multiple classifiers based on different representations for pen-based handwritten digit recognition. Proc. Fifth Turkish Artificial Intelligence Artificial Neural Networks Symposium (TAINN 96) (TÜBITAK, Ankara, Turkey).
  • Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769):503–511.
  • Alpaydin E (1999) Combined 5 × 2 cv F test for comparing supervised classification learning algorithms. Neural Comput. 11(8):1885–1892.
  • Bhatt RB, Sharma G, Dhall A, Chaudhury S (2009) Efficient skin region segmentation using low complexity fuzzy decision tree model. 2009 Annual IEEE India Conference (Curran Associates, Red Hook, NY), 1–4.
  • Blackard JA, Dean DJ (1999) Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Comput. Electronics Agriculture 24(3):131–151.
  • Blake CL, Merz CJ (1998) UCI Repository of Machine Learning Databases (University of California, Irvine, School of Information and Computer Science, Irvine, CA).
  • Bock RK, Chilingarian A, Gaug M, Hakl F, Hengstebeck T, Jiřina M, Klaschka J, et al. (2004) Methods for multidimensional event classification: A case study using images from a Cherenkov gamma-ray telescope. Nuclear Instruments Methods Phys. Res. Sect. A: Accelerators, Spectrometers, Detectors Associated Equipment 516(2–3):511–528.
  • Breiman L (2001) Random forests. Machine Learn. 45(1):5–32.
  • Bridge JP, Holden SB, Paulson LC (2014) Machine learning for first-order theorem proving. J. Automated Reasoning 53(2):141–172.
  • Brown I, Mues C (2012) An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Systems Appl. 39(3):3446–3453.
  • Candanedo LM, Feldheim V (2016) Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models. Energy Buildings 112:28–39.
  • Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. Proc. 23rd Internat. Conf. Machine Learning (ACM, New York), 161–168.
  • Cattral R, Oppacher F, Deugo D (2002) Evolutionary data mining with automatic rule generalization. Recent Advances in Computers. Comput. Comm. 1(1):296–300.
  • Corder GW, Foreman DI (2014) Nonparametric Statistics: A Step-by-Step Approach (John Wiley & Sons, Hoboken, NJ).
  • Cortes C, Vapnik V (1995) Support-vector networks. Machine Learn. 20(3):273–297.
  • Cox DR, Snell EJ (1989) Analysis of Binary Data, vol. 32 (CRC Press, Boca Raton, FL).
  • Davenport TH, Patil DJ (2012) Data scientist. Harvard Bus. Rev. 90:70–76.
  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J. Machine Learn. Res. 7:1–30.
  • Dietterich TG (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 10(7):1895–1923.
  • Dietterich TG, Jain AN, Lathrop RH, Lozano-Perez T (1994) A comparison of dynamic reposing and tangent distance for drug activity prediction. Adv. Neural Inform. Processing Systems 6:216–223.
  • Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J. Amer. Statist. Assoc. 97(457):77–87.
  • Dunn OJ (1961) Multiple comparisons among means. J. Amer. Statist. Assoc. 56:52–64.
  • Eugster MJ, Hothorn T, Leisch F (2016) Domain-based benchmark experiments: Exploratory and inferential analysis. Austrian J. Statist. 41(1):5–26.
  • Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J. Machine Learn. Res. 15(1):3133–3181.
  • Fisher RA (1959) Statistical Methods and Scientific Inference, 2nd ed. (Hafner Publishing Co., New York).
  • Frey PW, Slate DJ (1991) Letter recognition using Holland-style adaptive classifiers. Machine Learn. 6(2):161–182.
  • Friedman JH (2001) Greedy function approximation: A gradient boosting machine. Ann. Statist. 29(5):1189–1232.
  • Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Amer. Statist. Assoc. 32(200):675–701.
  • Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Statist. 11(1):86–92.
  • García S, Fernández A, Benítez AD, Herrera F (2007) Statistical comparisons by means of non-parametric tests: A case study on genetic based machine learning. Accessed May 10, 2016, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.601.9106.
  • García S, Molina D, Lozano M, Herrera F (2009) A study on the use of non-parametric tests for analyzing the evolutionary algorithms’ behaviour: A case study on the CEC’2005 special session on real parameter optimization. J. Heuristics 15(6):617–644.
  • García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inform. Sci. 180(10):2044–2064.
  • Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, et al. (1999) Molecular classification of cancer: Class discovery and class prediction by gene expression. Science 286:531–537.
  • Guvenir HA, Acar B, Demiroz G, Cekin A (1997) A supervised machine learning algorithm for arrhythmia analysis. Comput. Cardiol. 1997:433–436.
  • Higgins JJ (2003) Introduction to modern nonparametric statistics. Accessed May 10, 2016, https://www.amazon.com/Introduction-Modern-Nonparametric-Statistics-Higgins/dp/0534387756.
  • Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Machine Intelligence 20(8):832–844.
  • Hollander M, Wolfe DA, Chicken E (2013) Nonparametric Statistical Methods (John Wiley & Sons, Hoboken, NJ).
  • Holm S (1979) A simple sequentially rejective multiple test procedure. Scandinavian J. Statist. 6(1):65–70.
  • Horton P, Nakai K (1996) A probabilistic classification system for predicting the cellular localization sites of proteins. Proc. Internat. Conf. Intelligent Systems Molecular Biol. 4:109–115.
  • Hosmer DW Jr, Lemeshow S, Sturdivant RX (2013) Applied Logistic Regression, vol. 398 (John Wiley & Sons, Hoboken, NJ).
  • Hothorn T, Leisch F, Zeileis A, Hornik K (2005) The design and analysis of benchmark experiments. J. Comput. Graphical Statist. 14(3):675–699.
  • IBM SPSS Modeler (2016) Accessed May 10, 2016, https://www-01.ibm.com/software/analytics/spss/products/modeler/.
  • Ihaka R, Gentleman R (1996) R: A language for data analysis and graphics. J. Comput. Graphical Statist. 5(3):299–314.
  • Iman RL, Davenport JM (1980) Approximations of the critical region of the Friedman statistic. Comm. Statist. Theory Methods 9(6):571–595.
  • Kaluža B, Mirchevska V, Dovgan E, Luštrek M, Gams M (2010) An agent-based approach to care in independent living. De Ruyter B, Wichert R, Keyson DV, Markopoulos P, Streitz N, Divitini M, Georgatas N, Mana Gomez A, eds. Ambient Intelligence (Springer Berlin, Heidelberg), 177–186.
  • Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, et al. (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7(6):673–679.
  • King RD, Feng C, Sutherland A (1995) Statlog: Comparison of classification algorithms on large real-world problems. Appl. Artificial Intelligence Internat. J. 9(3):289–333.
  • Kleinbaum DG, Klein M (2010) Logistic Regression: A Self-Learning Text (Springer Verlag, New York).
  • Kohavi R (1996) Scaling up the accuracy of Naive-Bayes classifiers: A decision-tree hybrid. Simoudis E, Han J, Fayyad U, eds. Proc. 2nd Internat. Conf. Knowledge Discovery Data Mining (AAAI Press, Palo Alto, CA), 202–207.
  • Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans. Software Engrg. 34(4):485–496.
  • Lessmann S, Baesens B, Seow HV, Thomas LC (2015) Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. Eur. J. Oper. Res. 247(1):124–136.
  • Lewis DD (1998) Naive (Bayes) at forty: The independence assumption in information retrieval. European Conference on Machine Learning (Springer, Berlin, Heidelberg), 4–15.
  • Lim TS, Loh WY, Shih YS (2000) A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learn. 40(3):203–228.
  • McCormick K, Abbott D, Brown MS, Khabaza T, Mutchler SR (2013) IBM SPSS Modeler Cookbook (Packt Publishing Ltd., Birmingham, UK).
  • Meng X, Bradley J, Yavuz B, Sparks E, Venkataraman S, Liu D, et al. (2016) MLlib: Machine learning in Apache Spark. J. Machine Learn. Res. 17(34):1–7.
  • Michie D, Spiegelhalter DJ, Taylor CC (1994) Machine learning, neural and statistical classification. Accessed May 10, 2016, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.355.
  • Microsoft Azure ML (2016) Accessed May 10, 2016, https://azure.microsoft.com/en-us/documentation/services/machine-learning/; https://studio.azureml.net/.
  • Miller RG Jr (1997) Beyond ANOVA: Basics of Applied Statistics (CRC Press, Boca Raton, FL).
  • Moro S, Cortez P, Rita P (2014) A data-driven approach to predict the success of bank telemarketing. Decision Support Systems 62:22–31.
  • Mund S (2015) Microsoft Azure Machine Learning (Packt Publishing Ltd., Birmingham, UK). Accessed May 10, 2016, https://www.packtpub.com/big-data-and-business-intelligence/microsoft-azure-machine-learning.
  • Nemenyi P (1962) Distribution-free multiple comparisons. Biometrics 18(2):263.
  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. (2011) Scikit-learn: Machine learning in Python. J. Machine Learn. Res. 12:2825–2830.
  • Pentreath N (2015) Machine Learning with Spark (Packt Publishing Ltd., Birmingham, UK).
  • Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, Kim JYH, et al. (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415(6870):436–442.
  • Quinlan JR (1986) Induction of decision trees. Machine Learn. 1(1):81–106.
  • Rokach L, Maimon O (2014) Data Mining with Decision Trees: Theory and Applications (World Scientific, Hackensack, NJ).
  • Rumelhart DE, Hinton GE, Williams RJ (1988) Learning representations by back-propagating errors. Nature 323:533–536.
  • Rumelhart DE, Durbin R, Golden R, Chauvin Y (1995) Backpropagation: The basic theory. Chauvin Y, Rumelhart DE, eds. Backpropagation: Theory, Architectures, and Applications (Lawrence Erlbaum Associates, Hillsdale, NJ), 1–34.
  • Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517.
  • SAS Institute, Inc. (2015) Getting Started with SAS® Enterprise Miner™ 14.1 (SAS Institute, Inc., Cary, NC).
  • Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, et al. (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209.
  • Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S (2005) A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 21(5):631–643.
  • Stoica I, Zaharia M (2016) Introducing Databricks Community Edition: Apache Spark for all. Accessed May 10, 2016, https://databricks.com/blog/2016/02/17/introducing-databricks-community-edition-apache-spark-for-all.html.
  • Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J. Roy. Statist. Soc. Ser. B 36:111–147.
  • Suykens JA, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process. Lett. 9(3):293–300.
  • Wahono RS, Herman NS, Ahmad S (2014) A comparison framework of classification models for software defect prediction. Adv. Sci. Lett. 20(10–11):1945–1950.
  • Weis M, Rumpf T, Gerhards R, Plümer L (2009) Comparison of different classification algorithms for weed detection from images based on shape parameters. Bornimer Agrartechn. Ber. 69:53–64.
  • White T (2012) Hadoop: The Definitive Guide (O'Reilly Media, Inc., Sebastopol, CA).
  • Yeh IC, Lien CH (2009) The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems Appl. 36(2):2473–2480.