Performance Comparison of Machine Learning Platforms
Published Online: 30 Jan 2019
https://doi.org/10.1287/ijoc.2018.0825

