Strong Optimal Classification Trees

Published Online:https://doi.org/10.1287/opre.2021.0034

References

  • Aghaei S, Azizi MJ, Vayanos P (2019) Learning optimal and fair decision trees for non-discriminative decision-making. Proc. Conf. AAAI Artificial Intelligence 33:1418–1426.CrossrefGoogle Scholar
  • Aglin G, Nijssen S, Schaus P (2020) Learning optimal decision trees using caching branch-and-bound search. Proc. AAAI Conf. Artificial Intelligence 34(4):3146–3153.Google Scholar
  • Anderson R, Huchette J, Ma W, Tjandraatmadja C, Vielma JP (2020) Strong mixed-integer programming formulations for trained neural networks. Math. Programming 183(1–2):3–39.CrossrefGoogle Scholar
  • Atamturk A, Gomez A, Han S (2021) Sparse and smooth signal estimation: Convexification of l0-formulations. J. Machine Learn. Res. 22(52):1–43.Google Scholar
  • Azizi MJ, Vayanos P, Wilder B, Rice E, Tambe M (2018) Designing fair, efficient, and interpretable policies for prioritizing homeless youth for housing resources. Proc. Internat. Conf. Integration Constraint Programming Artificial Intelligence Oper. Res. (Springer, Berlin), 35–51.Google Scholar
  • Benders JF (1962) Partitioning procedures for solving mixed-variables programming problems. Numerical Math. 4(1):238–252.CrossrefGoogle Scholar
  • Bertsimas D, Dunn J (2017) Optimal classification trees. Machine Learn. 106(7):1039–1082.CrossrefGoogle Scholar
  • Bertsimas D, Dunn J (2019) Machine Learning Under a Modern Optimization Lens (Dynamic Ideas, Charlestown, MA).Google Scholar
  • Bertsimas D, Stellato B (2021) The voice of optimization. Machine Learn. 110(2):249–277.CrossrefGoogle Scholar
  • Bertsimas D, Van Parys B (2020) Sparse high-dimensional regression: Exact scalable algorithms and phase transitions. Ann. Statist. 48(1):300–323.CrossrefGoogle Scholar
  • Bertsimas D, Dunn J, Pawlowski C, Zhuo YD (2019a) Robust classification. INFORMS J. Optim. 1(1):2–34.LinkGoogle Scholar
  • Bertsimas D, Kung J, Trichakis N, Wang Y, Hirose R, Vagefi PA (2019b) Development and validation of an optimized prediction of mortality for candidates awaiting liver transplantation. Amer. J. Transplantation 19(4):1109–1118.CrossrefGoogle Scholar
  • Bienstock D, Muñoz G, Pokutta S (2018) Principled deep neural network training through linear programming. Preprint, submitted October 7, https://arxiv.org/abs/1810.03218.Google Scholar
  • Biggs M, Hariss R (2018) Optimizing objective functions determined from random forests. Preprint, submitted June 16, https://dx.doi.org/10.2139/ssrn.2986630.Google Scholar
  • Bishop CM (2006) Pattern Recognition and Machine Learning (Information Science and Statistics) (Springer-Verlag, Berlin).Google Scholar
  • Bixby RE (2012) A brief history of linear and mixed-integer programming computation. Document Math. Extra Volume ISMP (2012):107–121.Google Scholar
  • Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2020) Sparsity in optimal randomized classification trees. Eur. J. Oper. Res. 284(1):255–272.CrossrefGoogle Scholar
  • Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2021) Optimal randomized classification trees. Comput. Oper. Res. 132:105281.CrossrefGoogle Scholar
  • Blurock ES (1995) Automatic learning of chemical concepts: Research octane number and molecular substructures. Comput. Chemistry 19(2):91–99.CrossrefGoogle Scholar
  • Breiman L (1996) Bagging predictors. Machine Learn. 24(2):123–140.CrossrefGoogle Scholar
  • Breiman L (2001) Random forests. Machine Learn. 45(1):5–32.CrossrefGoogle Scholar
  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and Regression Trees (Wadsworth and Brooks, Monterey, CA).Google Scholar
  • Carrizosa E, Molero-Río C, Romero Morales D (2021) Mathematical optimization in classification and regression trees. TOP 29(1):5–33.CrossrefGoogle Scholar
  • Chan H, Rice E, Vayanos P, Tambe M, Morton M (2018) From empirical analysis to public policy: Evaluating housing systems for homeless youth. Proc. Joint Eur. Conf. Machine Learn. Knowledge Discovery Databases (Springer, Berlin), 69–85.Google Scholar
  • Chen X, Zhu CC, Yin J (2019) Ensemble of decision tree reveals potential mirna-disease associations. PLOS Comput. Biology 15(7):e1007209.CrossrefGoogle Scholar
  • Ciocan DF, Mišić VV (2022) Interpretable optimal stopping. Management Sci. 68(3):1616–1638.LinkGoogle Scholar
  • CPLEX II (2009) V12. 1: User’s manual for CPLEX. Internat. Bus. Machines Corporation 46(53):157.Google Scholar
  • Demirović E, Stuckey PJ (2021) Optimal decision trees for nonlinear metrics. Proc. Conf. AAAI Artificial Intelligence 35:3733–3741.CrossrefGoogle Scholar
  • Demirović E, Lukina A, Hebrard E, Chan J, Bailey J, Leckie C, Ramamohanarao K, et al. (2020) Murtree: Optimal classification trees via dynamic programming and search. Preprint, submitted July 24, https://arxiv.org/abs/2007.12652.Google Scholar
  • Detassis F, Lombardi M, Milano M (2020) Teaching the old dog new tricks: Supervised learning with constraints. Preprint, submitted February 25, https://arxiv.org/abs/2002.10766.Google Scholar
  • Dong H, Chen K, Linderoth J (2015) Regularization vs. relaxation: A conic optimization perspective of statistical variable selection. Preprint, submitted October 20, https://arxiv.org/abs/1510.06083.Google Scholar
  • Dua D, Graff C (2017) UCI machine learning repository. Accessed September 1, 2019, http://archive.ics.uci.edu/ml.Google Scholar
  • Gade D, Küçükyavuz S, Sen S (2014) Decomposition algorithms with parametric gomory cuts for two-stage stochastic integer programs. Math. Programming 144(1–2):39–64.CrossrefGoogle Scholar
  • Gangammanavar H, Liu Y, Sen S (2021) Stochastic decomposition for two-stage stochastic linear programs with random cost coefficients. INFORMS J. Comput. 33(1):51–71.LinkGoogle Scholar
  • Goldberg AV, Tarjan RE (1988) A new approach to the maximum-flow problem. J. ACM 35(4):921–940.CrossrefGoogle Scholar
  • Gómez A (2021) Outlier detection in time series via mixed-integer conic quadratic optimization. SIAM J. Optim. 31(3):1897–1925.CrossrefGoogle Scholar
  • Günlük O, Kalagnanam J, Li M, Menickelly M, Scheinberg K (2021) Optimal decision trees for categorical data via integer programming. J. Global Optim. 81(1):233–260.CrossrefGoogle Scholar
  • Guo C, Bodur M, Aleman DM, Urbach DR (2021) Logic-based benders decomposition and binary decision diagram based approaches for stochastic distributed operating room scheduling. INFORMS J. Comput. 33(4):1551–1569.AbstractGoogle Scholar
  • Gurobi I (2015) Gurobi optimizer reference manual. Accessed September 1, 2019, http://www.gurobi.com.Google Scholar
  • Hazimeh H, Mazumder R, Saab A (2022) Sparse regression at scale: Branch-and-bound rooted in first-order optimization. Math. Programming 196(1–2):347–388.CrossrefGoogle Scholar
  • Hochbaum DS (2008) The pseudoflow algorithm: A new algorithm for the maximum-flow problem. Oper. Res. 56(4):992–1009.LinkGoogle Scholar
  • Hu TC (1963) Multi-commodity network flows. Oper. Res. 11(3):344–360.LinkGoogle Scholar
  • Hu X, Rudin C, Seltzer M (2019) Optimal sparse decision trees. Adv. Neural Inform. Processing Systems 32:7267–7275.Google Scholar
  • Hyafil L, Rivest RL (1976) Constructing optimal binary search trees is NP complete. Inform. Processing Lett. 5(1):15–17.CrossrefGoogle Scholar
  • Intrator J, Allan E, Palmer M (1992) Decision tree for the management of substance-abusing psychiatric patients. J. Substance Abuse Treatment 9(3):215–220.CrossrefGoogle Scholar
  • Jo N, Aghaei S, Gómez A, Vayanos P (2021) Learning optimal prescriptive trees from observational data. Preprint, submitted August 31, https://arxiv.org/abs/2108.13628.Google Scholar
  • Justin N, Aghaei S, Gómez A, Vayanos P (2022) Optimal robust classification trees. Proc. 36th AAAI Conf. Artificial Intelligence Workshop Adversarial Machine Learn. Beyond (Curran Associates Inc., Red Hook, NY).Google Scholar
  • Khalilia M, Chakraborty S, Popescu M (2011) Predicting disease risks from highly imbalanced data using random forest. BMC Medical Inform. Decision Making 11(1):51.CrossrefGoogle Scholar
  • Kirschbaum D, Adler R, Hong Y, Lerner-Lam A (2009) Evaluation of a preliminary satellite-based landslide hazard algorithm using global landslide inventories. Natural Hazards Earth Systems Sci. 9(3):673–686.CrossrefGoogle Scholar
  • Kuhn M, Weston S, Culp M, Coulter N, Quinlan R (2023a) C50: C5.0 Decision trees and rule-based models. Accessed July 25, 2024, https://CRAN.R-project.org/package=C50.Google Scholar
  • Kuhn M, Weston S, Culp M, Coulter N, Quinlan R (2023b) rpart: Recursive partitioning and regression trees, R package version 4.1.23. Accessed July 25, 2024, https://CRAN.R-project.org/package=rpart.Google Scholar
  • Liaw A, Wiener M (2002) Classification and regression by randomforest. R News 2(3):18–22.Google Scholar
  • Lin J, Zhong C, Hu D, Rudin C, Seltzer M (2020) Generalized and scalable optimal sparse decision trees. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 6150–6160.Google Scholar
  • Liu J, Sen S (2020) Asymptotic results of stochastic decomposition for two-stage stochastic quadratic programming. SIAM J. Optim. 30(1):823–852.CrossrefGoogle Scholar
  • Liu X, Küçükyavuz S, Luedtke J (2016) Decomposition algorithms for two-stage chance-constrained programs. Math. Programming 157(1):219–243.CrossrefGoogle Scholar
  • Lombardi M, Baldo F, Borghesi A, Milano M (2020) An analysis of regularized approaches for constrained machine learning. Preprint, submitted May 20, https://arxiv.org/abs/2005.10674.Google Scholar
  • Lozano L, Smith JC (2022) A binary decision diagram based algorithm for solving a class of binary two-stage stochastic programs. Math. Programming 191(1):381–404.CrossrefGoogle Scholar
  • MacNeil M, Bodur M (2022) Integer programming, constraint programming, and hybrid decomposition approaches to discretizable distance geometry problems. INFORMS J. Comput. 34(1):297–314.LinkGoogle Scholar
  • Magnanti TL, Wong RT (1981) Accelerating benders decomposition: Algorithmic enhancement and model selection criteria. Oper. Res. 29(3):464–484.LinkGoogle Scholar
  • Mišić VV (2020) Optimization of tree ensembles. Oper. Res. 68(5):1605–1624.LinkGoogle Scholar
  • Mower JP (2005) Prep-Mt: Predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics 6(1):96.CrossrefGoogle Scholar
  • Narodytska N, Ignatiev A, Pereira F, Marques-Silva J (2018) Learning optimal decision trees with SAT. Proc. Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1362–1368.Google Scholar
  • Nijssen S, Fromont E (2010) Optimal constraint-based decision tree induction from itemset lattices. Data Mining Knowledge Discovery 21(1):9–51.CrossrefGoogle Scholar
  • Okada S, Ohzeki M, Taguchi S (2019) Efficient partition of integer optimization problems with one-hot encoding. Sci. Rep. 9(1):1–12.CrossrefGoogle Scholar
  • Olanow CW, Watts RL, Koller WC (2001) An algorithm (decision tree) for the management of Parkinson’s disease (2001): Treatment guidelines. Neurology 56(suppl 5):S1–S88.CrossrefGoogle Scholar
  • Quinlan JR (1986) Induction of decision trees. Machine Learn. 1(1):81–106.CrossrefGoogle Scholar
  • Quinlan JR (2014) C4. 5: Programs for Machine Learning (Elsevier, New York).Google Scholar
  • Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Natural Machine Intelligence 1(5):206–215.CrossrefGoogle Scholar
  • Shaikhina T, Lowe D, Daga S, Briggs D, Higgins R, Khovanova N (2019) Decision tree and random forest models for outcome prediction in antibody incompatible kidney transplantation. Biomedical Signal Processing Control 52:456–462.CrossrefGoogle Scholar
  • Vazirani VV (2013) Approximation Algorithms (Springer Science & Business Media, Boston).Google Scholar
  • Verhaeghe H, Nijssen S, Pesant G, Quimper CG, Schaus P (2020) Learning optimal decision trees using constraint programming. Constraints 25:226–250.Google Scholar
  • Verwer S, Zhang Y (2019) Learning optimal classification trees using a binary linear program formulation. Proc. Conf. AAAI Artificial Intelligence 33:1625–1632.CrossrefGoogle Scholar
  • Vos D, Verwer S (2022) Robust optimal classification trees against adversarial examples. Proc. AAAI Conf. Artificial Intelligence 36(8):8520–8528.Google Scholar
  • Wei W, Li J, Cao L, Ou Y, Chen J (2013) Effective detection of sophisticated online banking fraud on extremely imbalanced data. World Wide Web (Bussum) 16(4):449–475.CrossrefGoogle Scholar
  • Xie W, Deng X (2020) Scalable algorithms for the sparse ridge regression. SIAM J. Optim. 30(4):3359–3386.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.