Learning Symbolic Expressions: Mixed-Integer Formulations, Cuts, and Heuristics

Published Online: https://doi.org/10.1287/ijoc.2022.0050

In this paper, we consider the problem of learning a regression function without assuming its functional form. This problem is referred to as symbolic regression. An expression tree is typically used to represent a solution function, which is determined by assigning operators and operands to the nodes. Cozad and Sahinidis propose a nonconvex mixed-integer nonlinear program (MINLP), in which binary variables are used to assign operators and nonlinear expressions are used to propagate data values through nonlinear operators, such as square, square root, and exponential. We extend this formulation by adding new cuts that improve the solution of this challenging MINLP. We also propose a heuristic that iteratively builds an expression tree by solving a restricted MINLP. We perform computational experiments and compare our approach with a mixed-integer program–based method and a neural network–based method from the literature.
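To make the expression-tree representation concrete, the following is a minimal illustrative sketch, not the MINLP formulation from the paper. It shows how assigning operators and operands to nodes determines a function, and how data values propagate from the leaves up to the root. The names `Node`, `evaluate`, `squared_error`, the operator tables, and the convention that a unary operator uses its left child are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical operator tables, chosen to mirror the operators named in the abstract.
UNARY = {"square": lambda a: a * a, "sqrt": math.sqrt, "exp": math.exp}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


@dataclass
class Node:
    """One tree node: an operator, a variable 'x<i>', or a constant 'const'."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0  # used only when label == "const"


def evaluate(node: Node, x: Sequence[float]) -> float:
    """Propagate one data point x from the leaves up to this node."""
    if node.label == "const":
        return node.value
    if node.label.startswith("x") and node.label[1:].isdigit():
        return x[int(node.label[1:])]
    if node.label in UNARY:
        # Assumption: a unary operator takes its single operand from the left child.
        return UNARY[node.label](evaluate(node.left, x))
    return BINARY[node.label](evaluate(node.left, x), evaluate(node.right, x))


def squared_error(tree: Node, X, y) -> float:
    """Sum of squared residuals of the tree's predictions over a data set."""
    return sum((evaluate(tree, xi) - yi) ** 2 for xi, yi in zip(X, y))


# The tree for f(x) = 2 * x0 + sqrt(x1).
tree = Node(
    "+",
    Node("*", Node("const", value=2.0), Node("x0")),
    Node("sqrt", Node("x1")),
)
```

In this sketch the tree is fixed by hand. In the optimization setting described above, the choice of label at each node is instead a decision (modeled with binary variables), and a fitting error such as `squared_error` plays the role of the quantity being minimized.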

History: Accepted by Pascal Van Hentenryck, Area Editor for Computational Modeling: Methods & Analysis.

Funding: This work was supported by the Applied Mathematics activity within the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research [Grant DE-AC02-06CH11357].

Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2022.0050) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2022.0050). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.
