Toward Efficient Ensemble Learning with Structure Constraints: Convergent Algorithms and Applications

Shao-Bo Lin (https://orcid.org/0000-0001-5122-9153)
Center for Intelligent Decision-Making and Machine Learning, School of Management, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China

Shaojie Tang (https://orcid.org/0000-0001-9261-5210)
Naveen Jindal School of Management, University of Texas at Dallas, Richardson, Texas 75083

Yao Wang (https://orcid.org/0000-0003-4207-5273)
Center for Intelligent Decision-Making and Machine Learning, School of Management, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China

Di Wang, Corresponding Author (https://orcid.org/0000-0003-0435-0609)
Center for Intelligent Decision-Making and Machine Learning, School of Management, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China

Published Online: 19 Aug 2022, https://doi.org/10.1287/ijoc.2022.1224

INFORMS Journal on Computing

Volume 34, Issue 6

November-December 2022

Pages 2867-3350, C2

Article Information


Received: January 16, 2021
Accepted: June 29, 2022

Cite as

Shao-Bo Lin, Shaojie Tang, Yao Wang, Di Wang (2022) Toward Efficient Ensemble Learning with Structure Constraints: Convergent Algorithms and Applications. INFORMS Journal on Computing 34(6):3096-3116.