Toward Efficient Ensemble Learning with Structure Constraints: Convergent Algorithms and Applications

Published Online: https://doi.org/10.1287/ijoc.2022.1224

References

  • Bagirov A, Clausen C, Kohler M (2010) An l2 boosting algorithm for estimation of a regression function. IEEE Trans. Inform. Theory 56(3):1417–1429.
  • Barron A, Cohen A, Dahmen W, DeVore R (2008) Approximation and learning by greedy algorithms. Ann. Statist. 36(1):64–94.
  • Bartlett P, Traskin M (2007) AdaBoost is consistent. J. Machine Learn. Res. 8:2347–2368.
  • Bickel P, Ritov Y, Zakai A (2006) Some theory for generalized boosting algorithms. J. Machine Learn. Res. 7:705–732.
  • Blanchard G, Krämer N (2016) Convergence rates for kernel conjugate gradient for random design regression. Anal. Appl. (Singapore) 14(6):763–794.
  • Breiman L (1996) Bagging predictors. Machine Learn. 24(2):123–140.
  • Bühlmann P, Hothorn T (2007) Boosting algorithms: Regularization, prediction and model fitting. Statist. Sci. 22(4):477–505.
  • Caponnetto A, De Vito E (2007) Optimal rates for the regularized least squares algorithm. Foundations Comput. Math. 7(3):331–368.
  • Chang N, Sheng O (2008) Decision-tree-based knowledge discovery: Single- vs. multi-decision-tree induction. INFORMS J. Comput. 20(1):46–54.
  • Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. Proc. 22nd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 785–794.
  • Chen D, Wu Q, Ying Y, Zhou D (2004) Support vector machine soft margin classifiers: Error analysis. J. Machine Learn. Res. 5:1143–1175.
  • Cranor L, LaMacchia B (1998) Spam! Comm. ACM 41(8):74–83.
  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J. Machine Learn. Res. 7(1):1–30.
  • Deodhar M, Ghosh J, Saar-Tsechansky M, Keshari V (2017) Active learning with multiple localized regression models. INFORMS J. Comput. 29(3):503–522.
  • Dua D, Graff C (2019) UCI Machine Learning Repository (School of Information and Computer Science, University of California, Irvine, CA). https://archive.ics.uci.edu/ml/datasets/Spambase.
  • Ehrlinger J, Ishwaran H (2012) Characterizing l2 boosting. Ann. Statist. 40(2):1074–1101.
  • Esuli A, Sebastiani F (2006) SentiWordNet: A publicly available lexical resource for opinion mining. Proc. 5th Internat. Conf. on Language Resources and Evaluation, vol. 6, 417–422.
  • Evgeniou T, Pontil M, Poggio T (2000) Regularization networks and support vector machines. Adv. Comput. Math. 13(1):1–50.
  • Freund Y (1995) Boosting a weak learning algorithm by majority. Inform. Comput. 121(2):256–285.
  • Freund R, Grigas P (2016) New analysis and results for the Frank-Wolfe method. Math. Programming 155(1–2):199–230.
  • Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55(1):119–139.
  • Friedman J (2001) Greedy function approximation: A gradient boosting machine. Ann. Statist. 29(5):1189–1232.
  • Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: A statistical view of boosting - rejoinder. Ann. Statist. 28(2):400–407.
  • Gabay D, Mercier B (1976) A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 2(1):17–40.
  • Grushka-Cockayne Y, Jose V, Lichtendahl K (2017) Ensembles of overfit and overconfident forecasts. Management Sci. 63(4):1110–1130.
  • Györfi L, Kohler M, Krzyżak A, Walk H (2002) A Distribution-Free Theory of Nonparametric Regression (Springer, New York).
  • Hajj N, Rizk Y, Awad M (2019) A subjectivity classification framework for sports articles using improved cortical algorithms. Neural Comput. Appl. 31(11):8069–8085.
  • Hinton G, Osindero S, Teh Y (2006) A fast learning algorithm for deep belief nets. Neural Comput. 18(7):1527–1554.
  • Jaggi M (2013) Revisiting Frank-Wolfe: Projection-free sparse convex optimization. Proc. 30th Internat. Conf. on Machine Learn., vol. 28.
  • Jiang W (2004) Process consistency for AdaBoost. Ann. Statist. 32(1):13–29.
  • Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, et al. (2017) LightGBM: A highly efficient gradient boosting decision tree. Proc. 31st Internat. Conf. on Neural Information Processing Systems, 3149–3157.
  • Kraus M, Feuerriegel S, Oztekin A (2020) Deep learning in business analytics and operations research: Models, applications and managerial implications. Eur. J. Oper. Res. 281(3):628–641.
  • Krizhevsky A, Sutskever I, Hinton G (2017) ImageNet classification with deep convolutional neural networks. Comm. ACM 60(6):84–90.
  • Lin SB, Zhou DX (2018a) Distributed kernel-based gradient descent algorithms. Constructive Approximation 47(2):249–276.
  • Lin SB, Zhou DX (2018b) Optimal learning rates for kernel partial least squares. J. Fourier Anal. Appl. 24(3):908–933.
  • Lin SB, Guo X, Zhou DX (2017a) Distributed learning with regularized least squares. J. Machine Learn. Res. 18:1–31.
  • Lin SB, Zeng J, Chang X (2017b) Learning rates for classification with Gaussian kernels. Neural Comput. 29(12):3353–3380.
  • Lin S, Rong Y, Sun X, Xu Z (2013) Learning capability of relaxed greedy algorithms. IEEE Trans. Neural Network Learn. Systems 24(10):1598–1608.
  • Mišić V (2020) Optimization of tree ensembles. Oper. Res. 68(5):1605–1624.
  • Mukherjee I, Rudin C, Schapire R (2013) The rate of convergence of AdaBoost. J. Machine Learn. Res. 14:2315–2347.
  • Nesterov Y (2005) Smooth minimization of non-smooth functions. Math. Programming 103(1):127–152.
  • Park Y (2021) Optimization for l1-norm error fitting via data aggregation. INFORMS J. Comput. 33(1):120–142.
  • Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2018) CatBoost: Unbiased boosting with categorical features. Proc. 32nd Internat. Conf. on Neural Inform. Processing Systems, 6639–6649.
  • Quinlan J (1987) Simplifying decision trees. Internat. J. Human Comput. Stud. 27(3):221–234.
  • Schapire R (2001) The boosting approach to machine learning: An overview. Proc. Workshop on Nonlinear Estimation and Classification.
  • Schapire R, Freund Y (2012) Boosting: Foundations and Algorithms (MIT Press, Cambridge, MA).
  • Shi L, Feng YL, Zhou DX (2011) Concentration estimates for learning with l1-regularizer and data dependent hypothesis spaces. Appl. Comput. Harmon. Anal. 31(2):286–302.
  • Shi L, Huang X, Feng Y, Suykens J (2019) Sparse kernel regression with coefficient-based ℓq-regularization. J. Machine Learn. Res. 20(161):1–44.
  • Silver D, Huang A, Maddison C, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, et al. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.
  • Steinwart I, Christmann A (2008) Support Vector Machines (Springer, New York).
  • Steinwart I, Scovel C (2007) Fast rates for support vector machines using Gaussian kernels. Ann. Statist. 35(2):575–607.
  • Steinwart I, Hush D, Scovel C (2009) Optimal rates for regularized least squares regression. Proc. 22nd Conf. on Learn. Theory.
  • Telgarsky M (2013) Boosting with the logistic loss is consistent. arXiv preprint arXiv:1305.2648.
  • Temlyakov V (2008) Greedy approximation. Acta Numerica 17:235–409.
  • Temlyakov V (2015) Greedy approximation in convex optimization. Constructive Approximation 41(2):269–296.
  • Terekhov D, Beck J, Brown K (2009) A constraint programming approach for solving a queueing design and control problem. INFORMS J. Comput. 21(4):549–561.
  • Tsybakov A (2004) Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32(1):135–166.
  • Vapnik V (2013) The Nature of Statistical Learning Theory (Springer Science & Business Media, New York).
  • Wang Y, Guo X, Lin SB (2020) Kernel-based l2-boosting with structure constraints. Preprint, submitted September 16, https://arxiv.org/abs/2009.07558.
  • Wang Y, Liao X, Lin S (2019) Rescaled boosting in classification. IEEE Trans. Neural Network Learn. Systems 30(9):2598–2610.
  • Wei Y, Yang F, Wainwright M (2019) Early stopping for kernel boosting algorithms: A general analysis with localized complexities. IEEE Trans. Inform. Theory 65(10):6685–6703.
  • Xiang D (2011) Logistic classification with varying Gaussians. Comput. Math. Appl. 61(2):397–407.
  • Yao Y, Rosasco L, Caponnetto A (2007) On early stopping in gradient descent learning. Constructive Approximation 26(2):289–315.
  • Ye Y (1997) Interior Point Algorithms: Theory and Analysis (Wiley-Interscience, New York).
  • Young T, Hazarika D, Poria S, Cambria E (2018) Recent trends in deep learning based natural language processing. IEEE Comput. Intelligence Magazine 13(3):55–75.
  • Yu Z, Wang Z, You J, Zhang J, Liu J, Wong H, Han G (2017) A new kind of nonparametric test for statistical comparison of multiple classifiers over multiple data sets. IEEE Trans. Cybernetics 47(12):4418–4431.
  • Zhang T (2003) Sequential greedy approximation for certain convex optimization problems. IEEE Trans. Inform. Theory 49(3):682–691.
  • Zhang T (2004) Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Statist. 32(1):56–85.
  • Zhang T, Yu B (2005) Boosting with early stopping: Convergence and consistency. Ann. Statist. 33(4):1538–1579.