Strong Optimal Classification Trees

Sina Aghaei
Corresponding Author
Sina Aghaei
[email protected]
https://orcid.org/0000-0002-3394-8864
Center for Artificial Intelligence in Society, University of Southern California, Los Angeles, California 90089
Search for more papers by this author
,
Andrés Gómez
Andrés Gómez
[email protected]
https://orcid.org/0000-0003-3668-0653
Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, California 90089
Search for more papers by this author
,
Phebe Vayanos
Phebe Vayanos
[email protected]
https://orcid.org/0000-0001-7800-7235
Center for Artificial Intelligence in Society, University of Southern California, Los Angeles, California 90089
Search for more papers by this author

Sina Aghaei

Corresponding Author

Sina Aghaei

[email protected]

https://orcid.org/0000-0002-3394-8864

Center for Artificial Intelligence in Society, University of Southern California, Los Angeles, California 90089

Search for more papers by this author

Andrés Gómez

[email protected]

https://orcid.org/0000-0003-3668-0653

Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, California 90089

Search for more papers by this author

Phebe Vayanos

[email protected]

https://orcid.org/0000-0001-7800-7235

Center for Artificial Intelligence in Society, University of Southern California, Los Angeles, California 90089

Search for more papers by this author

Published Online:31 Jul 2024https://doi.org/10.1287/opre.2021.0034

References

Aghaei S, Azizi MJ, Vayanos P (2019) Learning optimal and fair decision trees for non-discriminative decision-making. Proc. Conf. AAAI Artificial Intelligence 33:1418–1426.Crossref, Google Scholar
Aglin G, Nijssen S, Schaus P (2020) Learning optimal decision trees using caching branch-and-bound search. Proc. AAAI Conf. Artificial Intelligence 34(4):3146–3153.Google Scholar
Anderson R, Huchette J, Ma W, Tjandraatmadja C, Vielma JP (2020) Strong mixed-integer programming formulations for trained neural networks. Math. Programming 183(1–2):3–39.Crossref, Google Scholar
Atamturk A, Gomez A, Han S (2021) Sparse and smooth signal estimation: Convexification of l0-formulations. J. Machine Learn. Res. 22(52):1–43.Google Scholar
Azizi MJ, Vayanos P, Wilder B, Rice E, Tambe M (2018) Designing fair, efficient, and interpretable policies for prioritizing homeless youth for housing resources. Proc. Internat. Conf. Integration Constraint Programming Artificial Intelligence Oper. Res. (Springer, Berlin), 35–51.Google Scholar
Benders JF (1962) Partitioning procedures for solving mixed-variables programming problems. Numerical Math. 4(1):238–252.Crossref, Google Scholar
Bertsimas D, Dunn J (2017) Optimal classification trees. Machine Learn. 106(7):1039–1082.Crossref, Google Scholar
Bertsimas D, Dunn J (2019) Machine Learning Under a Modern Optimization Lens (Dynamic Ideas, Charlestown, MA).Google Scholar
Bertsimas D, Stellato B (2021) The voice of optimization. Machine Learn. 110(2):249–277.Crossref, Google Scholar
Bertsimas D, Van Parys B (2020) Sparse high-dimensional regression: Exact scalable algorithms and phase transitions. Ann. Statist. 48(1):300–323.Crossref, Google Scholar
Bertsimas D, Dunn J, Pawlowski C, Zhuo YD (2019a) Robust classification. INFORMS J. Optim. 1(1):2–34.Link, Google Scholar
Bertsimas D, Kung J, Trichakis N, Wang Y, Hirose R, Vagefi PA (2019b) Development and validation of an optimized prediction of mortality for candidates awaiting liver transplantation. Amer. J. Transplantation 19(4):1109–1118.Crossref, Google Scholar
Bienstock D, Muñoz G, Pokutta S (2018) Principled deep neural network training through linear programming. Preprint, submitted October 7, https://arxiv.org/abs/1810.03218.Google Scholar
Biggs M, Hariss R (2018) Optimizing objective functions determined from random forests. Preprint, submitted June 16, https://dx.doi.org/10.2139/ssrn.2986630.Google Scholar
Bishop CM (2006) Pattern Recognition and Machine Learning (Information Science and Statistics) (Springer-Verlag, Berlin).Google Scholar
Bixby RE (2012) A brief history of linear and mixed-integer programming computation. Document Math. Extra Volume ISMP (2012):107–121.Google Scholar
Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2020) Sparsity in optimal randomized classification trees. Eur. J. Oper. Res. 284(1):255–272.Crossref, Google Scholar
Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2021) Optimal randomized classification trees. Comput. Oper. Res. 132:105281.Crossref, Google Scholar
Blurock ES (1995) Automatic learning of chemical concepts: Research octane number and molecular substructures. Comput. Chemistry 19(2):91–99.Crossref, Google Scholar
Breiman L (1996) Bagging predictors. Machine Learn. 24(2):123–140.Crossref, Google Scholar
Breiman L (2001) Random forests. Machine Learn. 45(1):5–32.Crossref, Google Scholar
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and Regression Trees (Wadsworth and Brooks, Monterey, CA).Google Scholar
Carrizosa E, Molero-Río C, Romero Morales D (2021) Mathematical optimization in classification and regression trees. TOP 29(1):5–33.Crossref, Google Scholar
Chan H, Rice E, Vayanos P, Tambe M, Morton M (2018) From empirical analysis to public policy: Evaluating housing systems for homeless youth. Proc. Joint Eur. Conf. Machine Learn. Knowledge Discovery Databases (Springer, Berlin), 69–85.Google Scholar
Chen X, Zhu CC, Yin J (2019) Ensemble of decision tree reveals potential mirna-disease associations. PLOS Comput. Biology 15(7):e1007209.Crossref, Google Scholar
Ciocan DF, Mišić VV (2022) Interpretable optimal stopping. Management Sci. 68(3):1616–1638.Link, Google Scholar
CPLEX II (2009) V12. 1: User’s manual for CPLEX. Internat. Bus. Machines Corporation 46(53):157.Google Scholar
Demirović E, Stuckey PJ (2021) Optimal decision trees for nonlinear metrics. Proc. Conf. AAAI Artificial Intelligence 35:3733–3741.Crossref, Google Scholar
Demirović E, Lukina A, Hebrard E, Chan J, Bailey J, Leckie C, Ramamohanarao K, et al. (2020) Murtree: Optimal classification trees via dynamic programming and search. Preprint, submitted July 24, https://arxiv.org/abs/2007.12652.Google Scholar
Detassis F, Lombardi M, Milano M (2020) Teaching the old dog new tricks: Supervised learning with constraints. Preprint, submitted February 25, https://arxiv.org/abs/2002.10766.Google Scholar
Dong H, Chen K, Linderoth J (2015) Regularization vs. relaxation: A conic optimization perspective of statistical variable selection. Preprint, submitted October 20, https://arxiv.org/abs/1510.06083.Google Scholar
Dua D, Graff C (2017) UCI machine learning repository. Accessed September 1, 2019, http://archive.ics.uci.edu/ml.Google Scholar
Gade D, Küçükyavuz S, Sen S (2014) Decomposition algorithms with parametric gomory cuts for two-stage stochastic integer programs. Math. Programming 144(1–2):39–64.Crossref, Google Scholar
Gangammanavar H, Liu Y, Sen S (2021) Stochastic decomposition for two-stage stochastic linear programs with random cost coefficients. INFORMS J. Comput. 33(1):51–71.Link, Google Scholar
Goldberg AV, Tarjan RE (1988) A new approach to the maximum-flow problem. J. ACM 35(4):921–940.Crossref, Google Scholar
Gómez A (2021) Outlier detection in time series via mixed-integer conic quadratic optimization. SIAM J. Optim. 31(3):1897–1925.Crossref, Google Scholar
Günlük O, Kalagnanam J, Li M, Menickelly M, Scheinberg K (2021) Optimal decision trees for categorical data via integer programming. J. Global Optim. 81(1):233–260.Crossref, Google Scholar
Guo C, Bodur M, Aleman DM, Urbach DR (2021) Logic-based benders decomposition and binary decision diagram based approaches for stochastic distributed operating room scheduling. INFORMS J. Comput. 33(4):1551–1569.Abstract, Google Scholar
Gurobi I (2015) Gurobi optimizer reference manual. Accessed September 1, 2019, http://www.gurobi.com.Google Scholar
Hazimeh H, Mazumder R, Saab A (2022) Sparse regression at scale: Branch-and-bound rooted in first-order optimization. Math. Programming 196(1–2):347–388.Crossref, Google Scholar
Hochbaum DS (2008) The pseudoflow algorithm: A new algorithm for the maximum-flow problem. Oper. Res. 56(4):992–1009.Link, Google Scholar
Hu TC (1963) Multi-commodity network flows. Oper. Res. 11(3):344–360.Link, Google Scholar
Hu X, Rudin C, Seltzer M (2019) Optimal sparse decision trees. Adv. Neural Inform. Processing Systems 32:7267–7275.Google Scholar
Hyafil L, Rivest RL (1976) Constructing optimal binary search trees is NP complete. Inform. Processing Lett. 5(1):15–17.Crossref, Google Scholar
Intrator J, Allan E, Palmer M (1992) Decision tree for the management of substance-abusing psychiatric patients. J. Substance Abuse Treatment 9(3):215–220.Crossref, Google Scholar
Jo N, Aghaei S, Gómez A, Vayanos P (2021) Learning optimal prescriptive trees from observational data. Preprint, submitted August 31, https://arxiv.org/abs/2108.13628.Google Scholar
Justin N, Aghaei S, Gómez A, Vayanos P (2022) Optimal robust classification trees. Proc. 36th AAAI Conf. Artificial Intelligence Workshop Adversarial Machine Learn. Beyond (Curran Associates Inc., Red Hook, NY).Google Scholar
Khalilia M, Chakraborty S, Popescu M (2011) Predicting disease risks from highly imbalanced data using random forest. BMC Medical Inform. Decision Making 11(1):51.Crossref, Google Scholar
Kirschbaum D, Adler R, Hong Y, Lerner-Lam A (2009) Evaluation of a preliminary satellite-based landslide hazard algorithm using global landslide inventories. Natural Hazards Earth Systems Sci. 9(3):673–686.Crossref, Google Scholar
Kuhn M, Weston S, Culp M, Coulter N, Quinlan R (2023a) C50: C5.0 Decision trees and rule-based models. Accessed July 25, 2024, https://CRAN.R-project.org/package=C50.Google Scholar
Kuhn M, Weston S, Culp M, Coulter N, Quinlan R (2023b) rpart: Recursive partitioning and regression trees, R package version 4.1.23. Accessed July 25, 2024, https://CRAN.R-project.org/package=rpart.Google Scholar
Liaw A, Wiener M (2002) Classification and regression by randomforest. R News 2(3):18–22.Google Scholar
Lin J, Zhong C, Hu D, Rudin C, Seltzer M (2020) Generalized and scalable optimal sparse decision trees. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 6150–6160.Google Scholar
Liu J, Sen S (2020) Asymptotic results of stochastic decomposition for two-stage stochastic quadratic programming. SIAM J. Optim. 30(1):823–852.Crossref, Google Scholar
Liu X, Küçükyavuz S, Luedtke J (2016) Decomposition algorithms for two-stage chance-constrained programs. Math. Programming 157(1):219–243.Crossref, Google Scholar
Lombardi M, Baldo F, Borghesi A, Milano M (2020) An analysis of regularized approaches for constrained machine learning. Preprint, submitted May 20, https://arxiv.org/abs/2005.10674.Google Scholar
Lozano L, Smith JC (2022) A binary decision diagram based algorithm for solving a class of binary two-stage stochastic programs. Math. Programming 191(1):381–404.Crossref, Google Scholar
MacNeil M, Bodur M (2022) Integer programming, constraint programming, and hybrid decomposition approaches to discretizable distance geometry problems. INFORMS J. Comput. 34(1):297–314.Link, Google Scholar
Magnanti TL, Wong RT (1981) Accelerating benders decomposition: Algorithmic enhancement and model selection criteria. Oper. Res. 29(3):464–484.Link, Google Scholar
Mišić VV (2020) Optimization of tree ensembles. Oper. Res. 68(5):1605–1624.Link, Google Scholar
Mower JP (2005) Prep-Mt: Predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics 6(1):96.Crossref, Google Scholar
Narodytska N, Ignatiev A, Pereira F, Marques-Silva J (2018) Learning optimal decision trees with SAT. Proc. Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1362–1368.Google Scholar
Nijssen S, Fromont E (2010) Optimal constraint-based decision tree induction from itemset lattices. Data Mining Knowledge Discovery 21(1):9–51.Crossref, Google Scholar
Okada S, Ohzeki M, Taguchi S (2019) Efficient partition of integer optimization problems with one-hot encoding. Sci. Rep. 9(1):1–12.Crossref, Google Scholar
Olanow CW, Watts RL, Koller WC (2001) An algorithm (decision tree) for the management of Parkinson’s disease (2001): Treatment guidelines. Neurology 56(suppl 5):S1–S88.Crossref, Google Scholar
Quinlan JR (1986) Induction of decision trees. Machine Learn. 1(1):81–106.Crossref, Google Scholar
Quinlan JR (2014) C4. 5: Programs for Machine Learning (Elsevier, New York).Google Scholar
Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Natural Machine Intelligence 1(5):206–215.Crossref, Google Scholar
Shaikhina T, Lowe D, Daga S, Briggs D, Higgins R, Khovanova N (2019) Decision tree and random forest models for outcome prediction in antibody incompatible kidney transplantation. Biomedical Signal Processing Control 52:456–462.Crossref, Google Scholar
Vazirani VV (2013) Approximation Algorithms (Springer Science & Business Media, Boston).Google Scholar
Verhaeghe H, Nijssen S, Pesant G, Quimper CG, Schaus P (2020) Learning optimal decision trees using constraint programming. Constraints 25:226–250.Google Scholar
Verwer S, Zhang Y (2019) Learning optimal classification trees using a binary linear program formulation. Proc. Conf. AAAI Artificial Intelligence 33:1625–1632.Crossref, Google Scholar
Vos D, Verwer S (2022) Robust optimal classification trees against adversarial examples. Proc. AAAI Conf. Artificial Intelligence 36(8):8520–8528.Google Scholar
Wei W, Li J, Cao L, Ou Y, Chen J (2013) Effective detection of sophisticated online banking fraud on extremely imbalanced data. World Wide Web (Bussum) 16(4):449–475.Crossref, Google Scholar
Xie W, Deng X (2020) Scalable algorithms for the sparse ridge regression. SIAM J. Optim. 30(4):3359–3386.Crossref, Google Scholar

Volume 73, Issue 4

July-August 2025

Pages iii-viii, 1723-2295, C2-C3

Article Information

Supplemental Material

Metrics

Information

Received:January 15, 2021
Accepted:January 11, 2024
Published Online:July 31, 2024

Cite as

Sina Aghaei, Andrés Gómez, Phebe Vayanos (2024) Strong Optimal Classification Trees. Operations Research 73(4):2223-2241.

https://doi.org/10.1287/opre.2021.0034

Keywords

Acknowledgments

The authors thank the anonymous reviewers for valuable and constructive feedback that improved the quality of this paper.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Strong Optimal Classification Trees

References

Volume 73, Issue 4

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News