Performance Comparison of Machine Learning Platforms

Asim Roy
http://orcid.org/0000-0002-8530-0242
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Shiban Qureshi
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Kartikeya Pande
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Divitha Nair
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Kartik Gairola
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Pooja Jain
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Suraj Singh
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Kirti Sharma
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Akshay Jagadale
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Yi-Yang Lin
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Shashank Sharma
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Ramya Gotety
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Yuexin Zhang
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Ji Tang

Tejas Mehta
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Hemanth Sindhanuru
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Nonso Okafor
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Santak Das
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Chidambara N. Gopal
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Srinivasa B. Rudraraju
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Avinash V. Kakarlapudi
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Published Online: 30 Jan 2019
https://doi.org/10.1287/ijoc.2018.0825
