Mixed Integer Linear Optimization Formulations for Learning Optimal Binary Classification Trees

Published Online:https://doi.org/10.1287/ijoc.2023.0068

References

  • Aghaei S, Azizi MJ, Vayanos P (2019) Learning optimal and fair decision trees for non-discriminative decision-making. Proc. 33rd AAAI Conf. Artificial Intelligence, 31st Innovative Appl. Artificial Intelligence Conf., 9th AAAI Sympos. Ed. Adv. Artificial Intelligence, vol. 33 (AAAI Press, Palo Alto, CA), 1418–1426.Google Scholar
  • Aghaei S, Gómez A, Vayanos P (2025) Strong optimal classification trees. Oper. Res. 73(4):2223–2241.LinkGoogle Scholar
  • Aglin G, Nijssen S, Schaus P (2020) Learning optimal decision trees using caching branch-and-bound search. Proc. AAAI Conf. Artificial Intelligence 34(04):3146–3153.CrossrefGoogle Scholar
  • Alston B (2024) Bridging the gap between operations research and machine learning with decision trees and neural nets. PhD thesis, Rice University, Houston, TX.Google Scholar
  • Alston B, Validi H, Hicks I (2026) Mixed integer linear optimization formulations for learning optimal binary classification trees. https://doi.org/10.1287/ijoc.2023.0068.cd, https://github.com/INFORMSJoC/2023.0068.Google Scholar
  • Angelino E, Larus-Stone N, Alabi D, Seltzer M, Rudin C (2017) Learning certifiably optimal rule lists. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 35–44.Google Scholar
  • Balestriero R (2017) Neural decision trees. Preprint, submitted March 6, https://arxiv.org/abs/1702.07360.Google Scholar
  • Balk B, Elder K (2000) Combining binary decision tree and geostatistical methods to estimate snow distribution in a mountain watershed. Water Resources Res. 36(1):13–26.CrossrefGoogle Scholar
  • Barata AP, Takes FW, van den Herik HJ, Veenman CJ (2024) Fair tree learning. Machine Learn. 113(5):3305–3324.Google Scholar
  • Bennett K (1995) Global tree optimization: A non-greedy decision tree algorithm. Comput. Sci. Statist. 26(1):156–156.Google Scholar
  • Bennett KP, Blue JA (1996) Optimal decision trees. Rensselaer Polytechnic Inst. Math Rep. 214(24):1–28.Google Scholar
  • Bennett K, Mangasarian O (1993) Bilinear separation of two sets inn-space. Comput. Optim. Appl. 2(3):207–227.CrossrefGoogle Scholar
  • Bertsimas D, Dunn J (2017) Optimal classification trees. Machine Learn. 106(7):1039–1082.CrossrefGoogle Scholar
  • Bertsimas D, Shioda R (2007) Classification and regression via integer optimization. Oper. Res. 55(2):252–271.LinkGoogle Scholar
  • Bertsimas D, Dunn J, Mundru N (2019) Optimal prescriptive trees. INFORMS J. Optim. 1(2):164–183.LinkGoogle Scholar
  • Bixby R (2012) A brief history of linear and mixed-integer programming computation. Documenta Math. 107–121.Google Scholar
  • Bixby ER, Fenelon M, Gu Z, Rothberg E, Wunderling R (1999) MIP: Theory and practice—Closing the gap. System Modelling and Optimization (Springer, Berlin), 19–49.Google Scholar
  • Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2021) Optimal randomized classification trees. Computers Oper. Res. 132(C):105281.CrossrefGoogle Scholar
  • Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. Proc. 5th Annual Workshop Comput. Learn. Theory (Association for Computing Machinery, New York), 144–152.CrossrefGoogle Scholar
  • Boutilier JJ, Michini C, Zhou Z (2022) Shattering inequalities for learning optimal decision trees. Integration of Constraint Programming, Artificial Intelligence, and Operations Research (Springer, Berlin), 74–90.CrossrefGoogle Scholar
  • Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and Regression Trees (CRC Press, Boca Raton, FL).Google Scholar
  • Brodley C, Utgoff P (1995) Multivariate decision trees. Machine Learn. 19:45–77.CrossrefGoogle Scholar
  • Cornuéjols G (2001) Combinatorial Optimization Packing and Covering, CBMS-NSF Regional Conference Series in Applied Mathematics (Society for Industrial and Applied Mathematics, Philadelphia).Google Scholar
  • Cornuéjols G (2007) Revival of the Gomory cuts in the 1990’s. Ann. Oper. Res. 149(1):63–66.CrossrefGoogle Scholar
  • Cornuéjols G (2008) Valid inequalities for mixed integer linear programs. Math. Programming 112(1):3–44.CrossrefGoogle Scholar
  • Dash S, Günlük O, Wei D (2018) Boolean decision rules via column generation. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 4660–4670.Google Scholar
  • Delen D, Kuzey C, Uyar A (2013) Measuring firm performance using financial ratios: A decision tree approach. Expert Systems Appl. 40(10):3970–3983.CrossrefGoogle Scholar
  • Demirović E, Lukina A, Hebrard E, Chan J, Bailey J, Leckie C, Ramamohanarao K, et al. (2022) MurTree: Optimal decision trees via dynamic programming and search. J. Machine Learn. Res. 23(26):1–47.Google Scholar
  • Espinoza D (2018) Avoiding numerical issues in optimization models. Accessed February 1, 2025, https://www.gurobi.com/events/avoiding-numerical-issues-in-optimization-models/.Google Scholar
  • Firat M, Crognier G, Gabor AF, Hurkens C, Zhang Y (2020) Column generation based heuristic for learning classification trees. Computers Oper. Res. 116(1):104866.CrossrefGoogle Scholar
  • Fischetti M, Leitner M, Ljubić I, Luipersbeck M, Monaci M, Resch M, Salvagnin D, et al. (2017) Thinning out Steiner trees: A node-based model for uniform edge costs. Math. Programming Comput. 9(2):203–229.CrossrefGoogle Scholar
  • Ford LR, Fulkerson DR (1962) Flows in Networks (Princeton University Press, Princeton, NJ).CrossrefGoogle Scholar
  • Fu R, Huang Y, Singh PV (2020) Artificial intelligence and algorithmic bias: Source, detection, mitigation, and implications. INFORMS Tutorials in Operations Research, 39–63.Google Scholar
  • Günlük O, Kalagnanam J, Li M, Menickelly M, Scheinberg K (2021) Optimal decision trees for categorical data via integer programming. J. Global Optim. 81(1):233–260.CrossrefGoogle Scholar
  • Hochbaum DS (2018) Machine learning and data mining with combinatorial optimization algorithms. INFORMS Tutorials in Operations Research, 109–129.Google Scholar
  • Hooker JN, Ottosson G (2003) Logic-based benders decomposition. Math. Programming 96(1):33–60.CrossrefGoogle Scholar
  • Hyafil L, Rivest RL (1976) Constructing optimal binary decision trees is NP-complete. Inform. Processing Lett. 5(1):15–17.CrossrefGoogle Scholar
  • Jans R, Desrosiers J (2013) Efficient symmetry breaking formulations for the job grouping problem. Comput. Oper. Res. 40(4):1132–1142.CrossrefGoogle Scholar
  • Jo N, Aghaei S, Benson J, Gómez A, Vayanos P (2022) Learning optimal fair classification trees. Proc. AAAI/ACM Conf. AI, Ethics, Society, AIES ’23 (ACM, New York), 181–192.Google Scholar
  • Kaplan H, Nussbaum Y (2011) Minimum s−t cut in undirected planar graphs when the source and the sink are close. Proc. Sympos. Theoretical Aspects Comput. Sci., vol. 9 (Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany), 117–128.Google Scholar
  • Kontschieder P, Fiterau M, Criminisi A, Bulò SR (2015) Deep neural decision forests. Proc. IEEE Internat. Conf. Comput. Vision (AAAI Press, Palo Alto, CA), 1467–1475.Google Scholar
  • Kumar A, Hanmandlu M, Gupta H (2013) Fuzzy binary decision tree for biometric based personal authentication. Neurocomputing 99:87–97.CrossrefGoogle Scholar
  • Land A, Doig A (1960) An automatic method of solving discrete programming problems. Econometrica 28(3):497–520.CrossrefGoogle Scholar
  • Lee G, Jaakkola TS (2020) Oblique decision trees from derivatives of ReLu networks. Preprint, submitted May 3, https://doi.org/10.48550/arXiv.1909.13488.Google Scholar
  • Lenstra HW (1987) Factoring integers with elliptic curves. Ann. Math. 126(3):649–673.CrossrefGoogle Scholar
  • Li Z, Wang L, Huang L, Zhang M, Cai X, Xu F, Wu F, et al. (2021) Efficient management strategy of COVID-19 patients based on cluster analysis and clinical decision tree classification. Sci. Rep. 11(1):9626.CrossrefGoogle Scholar
  • Lin J, Zhong C, Hu D, Rudin C, Seltzer M (2020) Generalized and scalable optimal sparse decision trees. Proc. 37th Internat. Conf. Machine Learn. (JMLR, New York), 6150–6160.Google Scholar
  • Magee JF (1964) Decision Trees for Decision Making (Harvard Business Review, Cambridge, MA).Google Scholar
  • Malik N, Singh PV (2019) Deep learning in computer vision: Methods, interpretation, causation, and fairness. INFORMS Tutorials in Operations Research, 73–100.Google Scholar
  • Margot F (2010) Symmetry in Integer Linear Programming (Springer, Berlin), 647–686.CrossrefGoogle Scholar
  • McTavish H, Zhong C, Achermann R, Karimalis I, Chen J, Rudin C, Seltzer M (2022) Fast sparse decision tree optimization via reference ensembles. Proc. AAAI Conf. Artificial Intelligence 36(9):9604–9613.CrossrefGoogle Scholar
  • Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA (2011) On oblique random forests. Machine Learning and Knowledge Discovery in Databases (Springer, Berlin), 453–469.CrossrefGoogle Scholar
  • Murthy SK (1998) Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining Knowledge Discovery 2(4):345–389.CrossrefGoogle Scholar
  • Murthy SK, Kasif S, Salzberg SL (1994) A system for induction of decision trees. J. Artificial Intelligence Res. 2(1):1–32.CrossrefGoogle Scholar
  • Narodytska N, Ignatiev A, Pereira F, Marques-Silva J (2018) Learning optimal decision trees with sat. Proc. 27th Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1362–1368.CrossrefGoogle Scholar
  • Orsenigo C, Vercellis C (2003) Multivariate classification trees based on minimum features discrete support vector machines. IMA J. Management Math. 14(3):221–234.Google Scholar
  • Patel KK, Desaulniers G, Lodi A (2024) An improved column-generation-based matheuristic for learning classification trees. Comput. Oper. Res. 165(C):106579.CrossrefGoogle Scholar
  • Rl M, Mishra AK (2021) Measuring financial performance of Indian manufacturing firms: Application of decision tree algorithms. Measuring Bus. Excellence 26(3):288–307.Google Scholar
  • Valdivia A, Sánchez‐Monedero J, Casillas J (2021) How fair can we go in machine learning? Assessing the boundaries of accuracy and fairness. Internat. J. Intelligent Systems 36(4):1619–1643.CrossrefGoogle Scholar
  • Verwer S, Zhang Y (2019) Learning optimal classification trees using a binary linear program formulation. Proc. AAAI Conf. Artificial Intelligence, vol. 33 (AAAI Press, Palo Alto, CA), 1625–1632.CrossrefGoogle Scholar
  • Wang J, Fujimaki R, Motohashi Y (2015) Trading interpretability for accuracy: Oblique treed sparse additive models. Proc. 21th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining. (ACM, New York), 1245–1254.Google Scholar
  • Wickramarachchi D, Robertson B, Reale M, Price C, Brown J (2016) HHCART: An oblique decision tree. Comput. Statist. Data Anal. (Oxford) 96(C):12–23.CrossrefGoogle Scholar
  • Yang Y, Morillo IG, Hospedales TM (2018) Deep neural decision trees. Preprint, submitted June 19, https://arxiv.org/abs/1806.06988.Google Scholar
  • Zantedeschi V, Kusner MJ, Niculae V (2020) Learning binary decision trees by argmin differentiation. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., vol. 139 (PMLR, New York), 12298–12309.Google Scholar
  • Zhang W, Ntoutsi E (2019) FAHT: An adaptive fairness-aware decision tree classifier. Kraus S, ed. Proc. 28th Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1480–1486.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.