Mixed Integer Linear Optimization Formulations for Learning Optimal Binary Classification Trees
Abstract
Decision trees are powerful supervised machine learning tools for classification and regression that attract many academic researchers and industry professionals. In particular, decision trees provide interpretability, which is often preferred over other higher-accuracy methods that are relatively uninterpretable. An optimal binary classification tree has two types of vertices and is obtainable by solving a bi-objective optimization problem that seeks to (i) maximize the number of correctly classified datapoints and (ii) minimize the number of branching vertices. In this paper, we propose two mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees: a maximum flow-based formulation and a minimum cut-based formulation, both of which share a common base model. The formulations enforce connected feasible paths for training datapoints through flow-based or cut-based connectivity constraints while maintaining computational tractability without the need for decomposition techniques commonly used to combat scaling concerns. We show theoretical improvements on the strongest flow-based MILO formulation currently in the literature and conduct experiments on publicly available data sets to demonstrate our models’ ability to scale, strength against traditional branch-and-bound approaches, and robustness in out-of-sample test performance. Our code and data are available on GitHub.
History: Accepted by Andrea Lodi, Area Editor for Design & Analysis of Algorithms–Discrete.
Funding: This research was supported by the Office of Naval Research [Grant N000142412648].
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2023.0068) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2023.0068). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.

