Mixed Integer Linear Optimization Formulations for Learning Optimal Binary Classification Trees

Brandon C. Alston
Corresponding Author
Brandon C. Alston
[email protected]
https://orcid.org/0000-0003-3199-125X
Computational Applied Mathematics and Operations Research, Rice University, Houston, Texas 77005
Search for more papers by this author
,
Hamidreza Validi
Hamidreza Validi
[email protected]
https://orcid.org/0000-0002-7983-7262
Industrial, Manufacturing & Systems Engineering, Texas Tech University, Lubbock, Texas 79409
Search for more papers by this author
,
Illya V. Hicks
Illya V. Hicks
[email protected]
https://orcid.org/0000-0002-7815-796X
Computational Applied Mathematics and Operations Research, Rice University, Houston, Texas 77005
Search for more papers by this author

Brandon C. Alston

Corresponding Author

Brandon C. Alston

[email protected]

https://orcid.org/0000-0003-3199-125X

Computational Applied Mathematics and Operations Research, Rice University, Houston, Texas 77005

Search for more papers by this author

Hamidreza Validi

[email protected]

https://orcid.org/0000-0002-7983-7262

Industrial, Manufacturing & Systems Engineering, Texas Tech University, Lubbock, Texas 79409

Search for more papers by this author

Illya V. Hicks

[email protected]

https://orcid.org/0000-0002-7815-796X

Computational Applied Mathematics and Operations Research, Rice University, Houston, Texas 77005

Search for more papers by this author

Published Online:12 Mar 2026https://doi.org/10.1287/ijoc.2023.0068

References

Aghaei S, Azizi MJ, Vayanos P (2019) Learning optimal and fair decision trees for non-discriminative decision-making. Proc. 33rd AAAI Conf. Artificial Intelligence, 31st Innovative Appl. Artificial Intelligence Conf., 9th AAAI Sympos. Ed. Adv. Artificial Intelligence, vol. 33 (AAAI Press, Palo Alto, CA), 1418–1426.Google Scholar
Aghaei S, Gómez A, Vayanos P (2025) Strong optimal classification trees. Oper. Res. 73(4):2223–2241.Link, Google Scholar
Aglin G, Nijssen S, Schaus P (2020) Learning optimal decision trees using caching branch-and-bound search. Proc. AAAI Conf. Artificial Intelligence 34(04):3146–3153.Crossref, Google Scholar
Alston B (2024) Bridging the gap between operations research and machine learning with decision trees and neural nets. PhD thesis, Rice University, Houston, TX.Google Scholar
Alston B, Validi H, Hicks I (2026) Mixed integer linear optimization formulations for learning optimal binary classification trees. https://doi.org/10.1287/ijoc.2023.0068.cd, https://github.com/INFORMSJoC/2023.0068.Google Scholar
Angelino E, Larus-Stone N, Alabi D, Seltzer M, Rudin C (2017) Learning certifiably optimal rule lists. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 35–44.Google Scholar
Balestriero R (2017) Neural decision trees. Preprint, submitted March 6, https://arxiv.org/abs/1702.07360.Google Scholar
Balk B, Elder K (2000) Combining binary decision tree and geostatistical methods to estimate snow distribution in a mountain watershed. Water Resources Res. 36(1):13–26.Crossref, Google Scholar
Barata AP, Takes FW, van den Herik HJ, Veenman CJ (2024) Fair tree learning. Machine Learn. 113(5):3305–3324.Google Scholar
Bennett K (1995) Global tree optimization: A non-greedy decision tree algorithm. Comput. Sci. Statist. 26(1):156–156.Google Scholar
Bennett KP, Blue JA (1996) Optimal decision trees. Rensselaer Polytechnic Inst. Math Rep. 214(24):1–28.Google Scholar
Bennett K, Mangasarian O (1993) Bilinear separation of two sets inn-space. Comput. Optim. Appl. 2(3):207–227.Crossref, Google Scholar
Bertsimas D, Dunn J (2017) Optimal classification trees. Machine Learn. 106(7):1039–1082.Crossref, Google Scholar
Bertsimas D, Shioda R (2007) Classification and regression via integer optimization. Oper. Res. 55(2):252–271.Link, Google Scholar
Bertsimas D, Dunn J, Mundru N (2019) Optimal prescriptive trees. INFORMS J. Optim. 1(2):164–183.Link, Google Scholar
Bixby R (2012) A brief history of linear and mixed-integer programming computation. Documenta Math. 107–121.Google Scholar
Bixby ER, Fenelon M, Gu Z, Rothberg E, Wunderling R (1999) MIP: Theory and practice—Closing the gap. System Modelling and Optimization (Springer, Berlin), 19–49.Google Scholar
Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2021) Optimal randomized classification trees. Computers Oper. Res. 132(C):105281.Crossref, Google Scholar
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. Proc. 5th Annual Workshop Comput. Learn. Theory (Association for Computing Machinery, New York), 144–152.Crossref, Google Scholar
Boutilier JJ, Michini C, Zhou Z (2022) Shattering inequalities for learning optimal decision trees. Integration of Constraint Programming, Artificial Intelligence, and Operations Research (Springer, Berlin), 74–90.Crossref, Google Scholar
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and Regression Trees (CRC Press, Boca Raton, FL).Google Scholar
Brodley C, Utgoff P (1995) Multivariate decision trees. Machine Learn. 19:45–77.Crossref, Google Scholar
Cornuéjols G (2001) Combinatorial Optimization Packing and Covering, CBMS-NSF Regional Conference Series in Applied Mathematics (Society for Industrial and Applied Mathematics, Philadelphia).Google Scholar
Cornuéjols G (2007) Revival of the Gomory cuts in the 1990’s. Ann. Oper. Res. 149(1):63–66.Crossref, Google Scholar
Cornuéjols G (2008) Valid inequalities for mixed integer linear programs. Math. Programming 112(1):3–44.Crossref, Google Scholar
Dash S, Günlük O, Wei D (2018) Boolean decision rules via column generation. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 4660–4670.Google Scholar
Delen D, Kuzey C, Uyar A (2013) Measuring firm performance using financial ratios: A decision tree approach. Expert Systems Appl. 40(10):3970–3983.Crossref, Google Scholar
Demirović E, Lukina A, Hebrard E, Chan J, Bailey J, Leckie C, Ramamohanarao K, et al. (2022) MurTree: Optimal decision trees via dynamic programming and search. J. Machine Learn. Res. 23(26):1–47.Google Scholar
Espinoza D (2018) Avoiding numerical issues in optimization models. Accessed February 1, 2025, https://www.gurobi.com/events/avoiding-numerical-issues-in-optimization-models/.Google Scholar
Firat M, Crognier G, Gabor AF, Hurkens C, Zhang Y (2020) Column generation based heuristic for learning classification trees. Computers Oper. Res. 116(1):104866.Crossref, Google Scholar
Fischetti M, Leitner M, Ljubić I, Luipersbeck M, Monaci M, Resch M, Salvagnin D, et al. (2017) Thinning out Steiner trees: A node-based model for uniform edge costs. Math. Programming Comput. 9(2):203–229.Crossref, Google Scholar
Ford LR, Fulkerson DR (1962) Flows in Networks (Princeton University Press, Princeton, NJ).Crossref, Google Scholar
Fu R, Huang Y, Singh PV (2020) Artificial intelligence and algorithmic bias: Source, detection, mitigation, and implications. INFORMS Tutorials in Operations Research, 39–63.Google Scholar
Günlük O, Kalagnanam J, Li M, Menickelly M, Scheinberg K (2021) Optimal decision trees for categorical data via integer programming. J. Global Optim. 81(1):233–260.Crossref, Google Scholar
Hochbaum DS (2018) Machine learning and data mining with combinatorial optimization algorithms. INFORMS Tutorials in Operations Research, 109–129.Google Scholar
Hooker JN, Ottosson G (2003) Logic-based benders decomposition. Math. Programming 96(1):33–60.Crossref, Google Scholar
Hyafil L, Rivest RL (1976) Constructing optimal binary decision trees is NP-complete. Inform. Processing Lett. 5(1):15–17.Crossref, Google Scholar
Jans R, Desrosiers J (2013) Efficient symmetry breaking formulations for the job grouping problem. Comput. Oper. Res. 40(4):1132–1142.Crossref, Google Scholar
Jo N, Aghaei S, Benson J, Gómez A, Vayanos P (2022) Learning optimal fair classification trees. Proc. AAAI/ACM Conf. AI, Ethics, Society, AIES ’23 (ACM, New York), 181–192.Google Scholar
Kaplan H, Nussbaum Y (2011) Minimum s−t cut in undirected planar graphs when the source and the sink are close. Proc. Sympos. Theoretical Aspects Comput. Sci., vol. 9 (Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany), 117–128.Google Scholar
Kontschieder P, Fiterau M, Criminisi A, Bulò SR (2015) Deep neural decision forests. Proc. IEEE Internat. Conf. Comput. Vision (AAAI Press, Palo Alto, CA), 1467–1475.Google Scholar
Kumar A, Hanmandlu M, Gupta H (2013) Fuzzy binary decision tree for biometric based personal authentication. Neurocomputing 99:87–97.Crossref, Google Scholar
Land A, Doig A (1960) An automatic method of solving discrete programming problems. Econometrica 28(3):497–520.Crossref, Google Scholar
Lee G, Jaakkola TS (2020) Oblique decision trees from derivatives of ReLu networks. Preprint, submitted May 3, https://doi.org/10.48550/arXiv.1909.13488.Google Scholar
Lenstra HW (1987) Factoring integers with elliptic curves. Ann. Math. 126(3):649–673.Crossref, Google Scholar
Li Z, Wang L, Huang L, Zhang M, Cai X, Xu F, Wu F, et al. (2021) Efficient management strategy of COVID-19 patients based on cluster analysis and clinical decision tree classification. Sci. Rep. 11(1):9626.Crossref, Google Scholar
Lin J, Zhong C, Hu D, Rudin C, Seltzer M (2020) Generalized and scalable optimal sparse decision trees. Proc. 37th Internat. Conf. Machine Learn. (JMLR, New York), 6150–6160.Google Scholar
Magee JF (1964) Decision Trees for Decision Making (Harvard Business Review, Cambridge, MA).Google Scholar
Malik N, Singh PV (2019) Deep learning in computer vision: Methods, interpretation, causation, and fairness. INFORMS Tutorials in Operations Research, 73–100.Google Scholar
Margot F (2010) Symmetry in Integer Linear Programming (Springer, Berlin), 647–686.Crossref, Google Scholar
McTavish H, Zhong C, Achermann R, Karimalis I, Chen J, Rudin C, Seltzer M (2022) Fast sparse decision tree optimization via reference ensembles. Proc. AAAI Conf. Artificial Intelligence 36(9):9604–9613.Crossref, Google Scholar
Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA (2011) On oblique random forests. Machine Learning and Knowledge Discovery in Databases (Springer, Berlin), 453–469.Crossref, Google Scholar
Murthy SK (1998) Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining Knowledge Discovery 2(4):345–389.Crossref, Google Scholar
Murthy SK, Kasif S, Salzberg SL (1994) A system for induction of decision trees. J. Artificial Intelligence Res. 2(1):1–32.Crossref, Google Scholar
Narodytska N, Ignatiev A, Pereira F, Marques-Silva J (2018) Learning optimal decision trees with sat. Proc. 27th Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1362–1368.Crossref, Google Scholar
Orsenigo C, Vercellis C (2003) Multivariate classification trees based on minimum features discrete support vector machines. IMA J. Management Math. 14(3):221–234.Google Scholar
Patel KK, Desaulniers G, Lodi A (2024) An improved column-generation-based matheuristic for learning classification trees. Comput. Oper. Res. 165(C):106579.Crossref, Google Scholar
Rl M, Mishra AK (2021) Measuring financial performance of Indian manufacturing firms: Application of decision tree algorithms. Measuring Bus. Excellence 26(3):288–307.Google Scholar
Valdivia A, Sánchez‐Monedero J, Casillas J (2021) How fair can we go in machine learning? Assessing the boundaries of accuracy and fairness. Internat. J. Intelligent Systems 36(4):1619–1643.Crossref, Google Scholar
Verwer S, Zhang Y (2019) Learning optimal classification trees using a binary linear program formulation. Proc. AAAI Conf. Artificial Intelligence, vol. 33 (AAAI Press, Palo Alto, CA), 1625–1632.Crossref, Google Scholar
Wang J, Fujimaki R, Motohashi Y (2015) Trading interpretability for accuracy: Oblique treed sparse additive models. Proc. 21th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining. (ACM, New York), 1245–1254.Google Scholar
Wickramarachchi D, Robertson B, Reale M, Price C, Brown J (2016) HHCART: An oblique decision tree. Comput. Statist. Data Anal. (Oxford) 96(C):12–23.Crossref, Google Scholar
Yang Y, Morillo IG, Hospedales TM (2018) Deep neural decision trees. Preprint, submitted June 19, https://arxiv.org/abs/1806.06988.Google Scholar
Zantedeschi V, Kusner MJ, Niculae V (2020) Learning binary decision trees by argmin differentiation. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., vol. 139 (PMLR, New York), 12298–12309.Google Scholar
Zhang W, Ntoutsi E (2019) FAHT: An adaptive fairness-aware decision tree classifier. Kraus S, ed. Proc. 28th Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1480–1486.Google Scholar

cover image INFORMS Journal on Computing

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:March 07, 2023
Accepted:December 21, 2025
Published Online:March 12, 2026

Cite as

Brandon C. Alston, Hamidreza Validi, Illya V. Hicks (2026) Mixed Integer Linear Optimization Formulations for Learning Optimal Binary Classification Trees. INFORMS Journal on Computing 0(0).

https://doi.org/10.1287/ijoc.2023.0068

Keywords

Acknowledgments

The authors thank the anonymous reviewers for constructive feedback that improved this paper.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Mixed Integer Linear Optimization Formulations for Learning Optimal Binary Classification Trees

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News