This paper addresses the problem of tightening the mixed-integer linear programming (MILP) formulation for continuous piecewise linear (CPWL) approximations of data sets in arbitrary dimensions. The MILP formulation leverages the difference-of-convex (DC) representation of CPWL functions. We introduce the concept of well-behaved CPWL interpolations and demonstrate that any CPWL interpolation of a data set has a well-behaved version. This result is critical to tighten the MILP problem. We present six different strategies to tighten the problem, which include fixing the values of some variables, introducing additional constraints, identifying small big-M parameter values, and applying tighter variable bounds. These methods leverage key aspects of the DC representation and the inherent structure of well-behaved CPWL interpolations. Experimental results demonstrate that specific combinations of these tightening strategies lead to significant improvement in solution times, especially for tightening strategies that consider well-behaved CPWL solutions.

Funding: Financial support from the Department of Energy Office of Energy Efficiency and Renewable Energy [Contract DE-AC02-06CH11357] is gratefully acknowledged.

Supplemental Material: The electronic companion is available at https://doi.org/10.1287/ijoo.2025.0074.

1. Introduction

A continuous piecewise linear (CPWL) function is a continuous function defined on a compact set which can be partitioned into a finite set of affine domains within which the function is affine. CPWL functions are a fundamental tool in mathematics, engineering, and data science. This is due to their ability to approximate any continuous nonlinear function defined on a compact set to an arbitrary level of precision using a finite number of linear pieces (Huang 2020). CPWL functions are also an essential tool when fitting a data set for which the analytical expression of the nonlinear relationship is unknown (Rebennack and Krasko 2020). In optimization models, using CPWL functions to represent complex nonlinear relationships enables the use of mature mixed-integer linear programming (MILP) solvers, which are often significantly faster than nonlinear solvers (Vielma et al. 2010, Vielma 2015).

Not only can CPWL functions be represented in MILP models, but the optimal CPWL fitting problem itself can also be expressed as a MILP problem (Toriello and Vielma 2012). The objective function may vary depending on the application. For instance, given a fixed number of affine functions, one may wish to determine the CPWL approximation that minimizes an approximation error metric—for example, the maximum or average error (Rebennack and Krasko 2020). Alternatively, one may wish to identify the CPWL approximation with the minimal number of affine pieces (Kong and Maravelias 2020).

MILP formulations for the CPWL fitting problem have been extensively investigated for univariate functions and one-dimensional data sets. Frenzen et al. (2010) studied optimal partitioning of the range of the univariate function to be approximated and provided a quantitative result regarding how the number of segments depends on the function and the approximation error. Rebennack and Krasko (2020) developed a finitely convergent method to generate CPWL approximations for one-dimensional data sets or continuous univariate functions identifying the minimal number of breakpoints required to meet an acceptable approximation error. Similarly, Kong and Maravelias (2020) introduced a novel method of enforcing continuity at breakpoints through a set of linear constraints. The two formulations are compared in detail in Warwicker and Rebennack (2022), and it is shown that the formulation from Rebennack and Krasko (2020) is generally faster in practice. The work from Warwicker and Rebennack (2023) extends the work from Warwicker and Rebennack (2022) and uses a Benders decomposition approach to improve the solution time of the MILP problem. Ploussard (2024) introduces a tightening method that significantly shortens the MILP solution time of the CPWL fitting problem when hierarchically minimizing the number of linear pieces and approximation error. More recently, Warwicker and Rebennack (2025) presented an efficient approach for approximating continuous univariate functions with a CPWL function by embedding a linear-time algorithm into an adaptive discretization framework.

For bivariate functions and two-dimensional data sets, most existing methods constrain the affine domains of the CPWL approximation to be simplices. For example, D’Ambrosio et al. (2010) partition the CPWL function domain into rectangles that are later split into triangles along predefined diagonals. Toriello and Vielma (2012) use a similar method but select a set of splitting diagonals that minimize the approximation error. Rebennack and Kallrath (2015) use a different triangulation strategy that iteratively partitions the triangles covering the approximation domain into smaller ones until a target approximation error is reached. Duguet and Ngueveu (2022) propose a similar iterative refinement, but without imposing the affine domains to be simplices. Bärmann et al. (2023) introduce an approximation algorithm for the optimal piecewise linear interpolation of the continuous bivariate function xy over rectangular domains.

The literature addressing the optimal CPWL fitting problem for data sets of higher dimensions is significantly more limited (Misener and Floudas 2010, Rebennack and Kallrath 2015). Besides, most of the existing methods essentially extend the simplex-based partitioning strategy used for two dimensions, which result in computational challenges due to the exponential growth of the number of possible simplex-based partitions as the data dimension increases (Hughes and Anderson 1996). Kazda and Li (2021) introduced the first MILP formulation able to identify the optimal CPWL approximation of data sets of any dimension for a given maximum approximation error. They achieve this by leveraging the difference-of-convex (DC) representation of CPWL functions (Kripfganz and Schulze 1987), which allows for the affine domains partitioning the CPWL function to be implicitly defined by the difference of the pointwise maxima of two sets of affine functions. The strength of the DC representation lies in its ability to describe any CPWL function with affine domains of arbitrary shape. However, the computational cost of the MILP problem rapidly increases with the dimension of the data set, the number of data points, and the number of affine functions.

Deep learning techniques offer an alternative way to identify CPWL approximations (Huang 2020). In fact, the function represented by a neural network (NN) using ReLU activation function is a CPWL function, and the NN loss function represents the CPWL approximation error (DeVore et al. 2021). To the best of our knowledge, the only existing methods to identify CPWL approximations of data sets in multiple dimensions without explicitly specifying the partitioning of affine domains are the ReLU NN approach and the DC MILP approach introduced by Kazda and Li (2021). Training an NN to identify a good CPWL approximation is significantly faster than solving an MILP problem, contributing to NN growing popularity in recent years. However, the MILP approach provides three key advantages over the ReLU NN approach:

The NN approach uses a gradient-based method that identifies local optima, whereas the MILP approach is guaranteed to yield the CPWL function with the smallest approximation error.
Contrary to the MILP approach, the NN approach does not allow for the explicit enforcement of a target maximum approximation error.
Because of the complex relationship between the structure (width and depth) of an NN and the number of affine pieces of the CPWL functions they can model, it is difficult for NNs to identify good CPWL approximations with a manageable number of affine pieces (Chen et al. 2023). Conversely, the MILP approach allows the user to identify CPWL functions with a predefined, or minimal, number of linear pieces. This is a critical feature when these CPWL functions are subsequently embedded in MILP formulations to model nonlinear relationships, where the number of binary variables required to model CPWL functions typically increases with the number of affine pieces that compose them, making the MILP problem harder to solve. In other words, although an NN can quickly identify a good-quality CPWL approximation of a nonlinear relationship, the resulting approximation may render the MILP formulation in which it is embedded computationally intractable.

To address the high computational cost of solving the MILP problem, authors Kazda and Li (2024) introduce an iterative linear programming (LP) approach that identifies a valid CPWL approximation given a target maximum error. However, the identified CPWL solution is not necessarily optimal. A priori, identifying the global optimum of the CPWL fitting problem can only be achieved by solving an MILP problem. Therefore, it is critical to improve the MILP formulation of the CPWL fitting problem to reduce its computational cost. The method described in Ploussard (2024) achieves a tightening of the MILP problem that significantly improves the solution time, but the method only applies to one-dimensional data sets. This raises the need to develop MILP tightening methods that can be extended to data sets of any dimensions.

This paper aims to address this research gap. To achieve this, we introduce a critical class of CPWL interpolations called “well-behaved” and demonstrate that any CPWL interpolation has a well-behaved version. This important result is then leveraged to identify six tightening strategies that can be combined to significantly reduce the solution time of the MILP CPWL fitting problem.

The five main contributions of this paper are the following. First, we formalize the concept of well-behaved CPWL interpolations, a class of CPWL interpolations in which each affine piece interpolates a number of points greater than the dimension of the interpolated data set. Second, we demonstrate that any CPWL interpolation of a data set has a well-behaved version, which serves as a key theoretical foundation for our tightening method. Third, we leverage this result to introduce six tightening strategies for the MILP formulation of the CPWL fitting problem. Fourth, we analyze the theoretical impact of each tightening strategy together with the time complexity of the preprocessing step involved. Fifth, we assess the practical impact of multiple combinations of the tightening strategies. The proposed tightening strategies enable the identification of an optimal CPWL approximation in a relatively low computation time (i.e., up to 20 times faster than the existing literature).

The remainder of the article is organized as follows: Section 2 introduces the theoretical background of the CPWL-fitting problem. Section 3 describes the proposed MILP formulation. Section 4 formalizes the concept of well-behaved CPWL fitting and demonstrates the fact that any CPWL interpolation or approximation of a data set has a well-behaved version. Section 5 introduces the six tightening strategies of the MILP formulation. Section 6 introduces the case studies and assesses the performances of several combinations of tightening strategies. Section 7 presents the conclusions.

2. Background

Let $S = {(x_{i}, z_{i})}_{i = 1, \dots, N} = {(x_{i, 1}, \dots, x_{i, d}, z_{i})}_{i = 1, \dots, N}$ be a set of N points of $R^{d + 1}$ . We assume that the points ${(x_{i})}_{i = 1, \dots, N}$ are in general position in $R^{d}$ —that is, no subset of $d + 1$ points is contained within a single hyperplane (Edelsbrunner et al. 1986). Let $P_{R^{d}} : R^{d + 1} \to R^{d}$ be the projection defined as $P_{R^{d}} (x_{1}, \dots, x_{d}, z) = (x_{1}, \dots, x_{d})$ . Let $S_{R^{d}} = {(x_{i})}_{i = 1, \dots, N}$ be the projection of S by $P_{R^{d}}$ . Let $D = Conv (S_{R^{d}})$ be the convex hull of $S_{R^{d}}$ .

Definition 1.

We say that $f : D \to R$ is a CPWL function when f is continuous and there is a set of P affine functions ${f_{k} : D_{k} \to R, k = 1, \dots, P}$ such that the $D_{k}$ are convex polytopes, $\cup_{k = 1}^{P} D_{k} = D$ , and $f_{| D_{k}} = f_{k}, \forall k \in {1, \dots, P}$ . The $f_{k}$ are called the affine pieces of f, and the $D_{k}$ are called the affine domains of f.

Note that, contrary to the more common definition of CPWL functions from Geißler et al. (2012), we do not assume that the $D_{k}$ form a partition of D. The intersection of two affine domains is not required to have an empty interior. In addition, an affine domain can be empty, and two affine domains can be equal as long as the corresponding affine pieces are equal. This implies that the set of affine functions representing the CPWL function is not unique. However, apart from this, Definition 1 describes exactly the same mathematical object as the CPWL functions defined in the literature. The flexibility of our definition enables us to compare CPWL functions to one another, which will be useful in the following section.

Proposition 1.

Let $f : D \to R$ be a CPWL function. f can be expressed as the difference of two convex CPWL functions $f^{+}$ and $f^{-}$ . Specifically, there is a set of $P^{+}$ affine functions ${f_{j}^{+} : D \to R, j \in {1, \dots, P^{+}}}$ and $P^{-}$ affine functions ${f_{k}^{-} : D \to R, k \in {1, \dots, P^{-}}}$ such that:

f (x) = f^{+} (x) - f^{-} (x) = \max_{j \in {1, \dots, P^{+}}} f_{j}^{+} (x) - \max_{k \in {1, \dots, P^{-}}} f_{k}^{-} (x), \forall x \in D .

$f_{j}^{c}$ are called the affine pieces of $f^{c}$ , and $D_{j}^{c} = {x \in D : f^{c} (x) = f_{j}^{c} (x)}$ are called the affine domains of $f^{c}, j \in {1, \dots, P^{c}}, c \in {+, -}$ . In addition, ${D_{j}^{c} : j \in {1, \dots, P^{c}}}$ is a partition of D.

The proof of Proposition 1 is provided in Kripfganz and Schulze (1987). The above formulation is called the DC representation of f. Note that the DC representation of a CPWL function is not unique.

Proposition 2.

Let $f : D \to R$ be a CPWL function. Let $f = f^{+} - f^{-}$ be a DC representation of f. The affine domains of $f^{+}$ and $f^{-}$ are convex polytopes. In addition, any affine domain of f is the intersection of an affine domain of $f^{+}$ and $f^{-}$ . In other words, for any affine domain $D_{p}$ of f, there is an affine domain $D_{j}^{+}$ and $D_{k}^{-}$ of $f^{+}$ and $f^{-}$ such that $D_{p} = D_{j}^{+} \cap D_{k}^{-}$ , and $f (x) = f_{j}^{+} (x) - f_{k}^{-} (x), \forall x \in D_{p}$ .

The proof of Proposition 2 is given in Kripfganz and Schulze (1987) and stems from the DC representation of f.

Definition 2.

We say that $f : D \to R$ is an interpolation of, or interpolates, S when $f (x_{i}) = z_{i}, \forall i \in {1, \dots, N}$ (Davis 1975).

Definition 3.

We say that $f : D \to R$ is an ε-approximation of, or ε-approximates, S when there exists ${(e_{i})}_{i = 1, \dots, N} \in R^{N}$ such that $| e_{i} | \leq ε$ and f interpolates the set ${(x_{i}, z_{i} + e_{i})}_{i = 1, \dots, N}$ . In other words, f is an ε-approximation of S if $\max_{i = 1, \dots, N} | f (x_{i}) - z_{i} | \leq ε$ .

Definition 4.

Let $f : D \to ℝ$ and $g : D \to R$ be two ε-approximations of S. We say that f and g are equivalent with respect to S when $f (x_{i}) = g (x_{i}), \forall i \in {1, \dots, N}$ .

3. MILP Formulation

We aim to identify the optimal CPWL fitting of a set of points S. More specifically, we aim to identify the CPWL ε-approximation of S that minimizes an objective Q subject to a given error tolerance $ε$ . For example, Q can be the maximum fitting error, the average fitting error, or the number of affine pieces.

The following MILP formulation is largely based on the formulation presented in Kazda and Li (2021), where $f (x_{i})$ , $f^{+} (x_{i})$ , $f^{-} (x_{i})$ , $e_{i}$ , $a_{j}^{c}$ , and $b_{j}^{c}$ are continuous variables; $δ_{i, j}^{c}$ are binary variables; and $M_{i}^{c}$ are big-M parameters.

minimize Q,

(1)

s . t . f (x_{i}) = f^{+} (x_{i}) - f^{-} (x_{i}), i \in {1, \dots, N},

(2)

0 \leq f^{c} (x_{i}) - a_{j}^{c}^{T} x_{i} - b_{j}^{c}, i \in {1, \dots, N}, j \in {1, \dots, P^{c}}, c \in {+, -},

(3)

f^{c} (x_{i}) - a_{j}^{c}^{T} x_{i} - b_{j}^{c} \leq M_{i}^{c} (1 - δ_{i, j}^{c}), i \in {1, \dots, N}, j \in {1, \dots, P^{c}}, c \in {+, -},

(4)

\sum_{j = 1}^{P^{c}} δ_{i, j}^{c} \geq 1, i \in {1, \dots, N}, c \in {+, -},

(5)

- e_{i} \leq f (x_{i}) - z_{i} \leq e_{i}, i \in {1, \dots, N},

(6)

e_{i} \leq ε, i \in {1, \dots, N},

(7)

δ_{i, j}^{c} \in {0, 1}, i \in {1, \dots, N}, j \in {1, \dots, P^{c}}, c \in {+, -},

(8)

f (x_{i}), f^{+} (x_{i}), f^{-} (x_{i}) \in R, e_{i} \in R_{\geq 0}, i \in {1, \dots, N},

(9)

a_{j}^{c} \in R^{d}, b_{j}^{c} \in R, j \in {1, \dots, P^{c}}, c \in {+, -} .

(10)

Equation (1) represents the objective function to be minimized. As long as Q is a linear expression, Equations (1)–(10) formulate an MILP problem. For the rest of the paper, we will refer to this MILP problem as $MILP 1 (Q)$ . Equation (2) is the DC representation of the CPWL function, where $f (x_{i})$ , $f^{+} (x_{i})$ , and $f^{-} (x_{i})$ represent the value at $x_{i}$ of the functions f, $f^{+}$ , and $f^{-}$ . Equations (3) and (4) formulate the representation of each convex function $f^{c}$ as the maximum of a set of affine functions $f_{j}^{c}$ , where $a_{j}^{c}$ and $b_{j}^{c}$ are the linear coefficients and bias terms of each affine piece $f_{j}^{c}$ . The binary variable $δ_{i, j}^{c}$ is equal to one when the point $x_{i}$ belongs to the domain $D_{j}^{c}$ , and the big-M parameter $M_{i}^{c}$ ensures that $f^{c} (x_{i}) = f_{j}^{c} (x_{i})$ when $x_{i} \in D_{j}^{c}$ . Note that the big-M parameter $M_{i}^{c}$ in Equation (4) must be large enough so that it does not constrain the term $f^{c} (x_{i}) - a_{j}^{c}^{T} x_{i} - b_{j}^{c}$ when $δ_{i, j}^{c} = 0$ . In Equation (5), we will identify a tight value for $M_{i}^{c}$ . $P^{+}$ and $P^{-}$ are the numbers of affine pieces of $f^{+}$ and $f^{-}$ , respectively. Equation (5) ensures that each point $x_{i}$ belongs to at least one affine domain of $f^{+}$ and one affine domain of $f^{-}$ . Note that, contrary to the literature (e.g., Kazda and Li 2021), we use an inequality constraint instead of an equality constraint. By doing so, we allow the points $x_{i}$ to belong to multiple affine domains. This can occur if, for example, some affine pieces are identical or if some points are at the border between two or more affine domains. This inequality is also critical to ensure that some well-behaved CPWL solutions are feasible in $MILP 1 (Q)$ , which is a property that will be discussed in the next section. Equation (6) defines the approximation error $e_{i}$ at each point $x_{i}$ as the distance between $z_{i}$ and the value of the CPWL function at $x_{i}$ . Equation (7) ensures that this distance is never greater than the specified error tolerance $ε$ . Finally, Equations (8)–(10) define the domain of the continuous and binary variables.

Note that some optimization solvers like Gurobi (Gurobi Optimization LLC 2024a) make it possible to formulate Equation (4) without using a big-M parameter. Instead, Equation (4) can be replaced by the indicator constraint below (Gurobi Optimization LLC 2024b):

δ_{i, j}^{c} = 1 \Rightarrow f^{c} (x_{i}) - a_{j}^{c}^{T} x_{i} - b_{j}^{c} \leq 0, i \in {1, \dots, N}, j \in {1, \dots, P^{c}}, c \in {+, -} .

(11)

We note $MILPIC 1 (Q)$ the alternative MILP formulation using the indicator constraint, which is composed of Equations (1)–(3) and (5)–(11).

The objective can be set to minimize different error metrics. Formulating the objective with Equation (12) minimizes the average fitting error ( $L_{1}$ norm), the discrete analogue of minimizing the area between two functions. Alternatively, using Equation (13) minimizes the maximum fitting error ( $L_{\infty}$ norm), which provides a strict pointwise guarantee on the approximation quality.

Q = \frac{1}{N} \sum_{i = 1}^{N} e_{i},

(12)

Q \geq e_{i}, i \in 1, \dots, N .

(13)

Other objective functions may include the number of affine pieces of $f^{+}$ , $f^{-}$ , or f, as well as a combination of error metric and number of affine pieces. These alternative objective functions can be found in Electronic Companion Section EC.1.

4. Well-Behaved CPWL Interpolations

This section introduces a new class of CPWL interpolations called “well-behaved” CPWL interpolations. Loosely speaking, well-behaved CPWL interpolations are composed of affine pieces whose gradients are not excessively steep. This class of CPWL interpolations aligns with the intuitive representation of what the CPWL interpolation of a data set should look like, and they have desirable properties when modeled in $MILP 1 (Q)$ . This section formalizes the definition of a well-behaved CPWL interpolation and defines what constitutes a well-behaved version of a CPWL interpolation. We then demonstrate that any CPWL interpolation has a well-behaved version. This result is critical to the tightening procedure introduced in Section 5.

Proposition 3.

Let $S = {(x_{i}, z_{i})}_{i = 1, \dots, d + 1}$ be a set of $(d + 1)$ points of $R^{d + 1}$ , where the $x_{i}$ are in general position. There exists a unique linear interpolation of S. In other words, S can be interpolated by a unique affine function.

Proposition 3 is a fundamental result from linear algebra (Boyd and Vandenberghe 2004). The system of linear equations $a^{T} x_{i} + b = z_{i}, i \in {1, \dots, d + 1}$ , with variables $a \in R^{d}$ and $b \in R$ , is exactly determined and has a unique solution due to the general position of the $x_{i}$ . If the number of points is less than $d + 1$ , the system is underdetermined, and there is an infinite number of possible linear interpolations. If the number of points is greater than $d + 1$ , because the $x_{i}$ are in general position, the system becomes overdetermined, and there exists either one or no linear interpolation.

Definition 5.

Let $f : D \to R$ be a CPWL interpolation of S, with $| S | = N \geq d + 1$ . We say that an affine piece of f is underdetermined, exactly determined, or overdetermined with respect to S if it interpolates less than, exactly, or more than $d + 1$ points of S, respectively.

Definition 6.

We say that $f : D \to R$ is a well-behaved CPWL interpolation of S when f is a CPWL interpolation of S and each affine piece of f is exactly determined or overdetermined—that is, each affine piece interpolates at least $d + 1$ points of S. Additionally, let $f : D \to R$ be an $ε$ -approximation of S and $e_{i} = f (x_{i}) - z_{i}$ . We say that f is a well-behaved CPWL $ε$ -approximation of S when f is a well-behaved CPWL interpolation of ${(x_{i}, z_{i} + e_{i})}_{i = 1, \dots, N}$ .

Definition 7.

Let f and g be two CPWL interpolations of S composed of the set of affine pieces { $f_{k} : D_{k} \to R, k = 1, \dots, P$ } and { $g_{k} : D_{k}^{'} \to R, k = 1, \dots, P$ }, respectively. We say that g is a well-behaved version of the CPWL interpolation f with respect to S if g is a well-behaved CPWL interpolation of S and $D_{k} \cap S_{R^{d}} \subseteq D_{k}^{'} \cap S_{R^{d}}, \forall k \in {1, \dots, P}$ . Additionally, let f and g be two CPWL $ε$ -approximations of S and $e_{i} = f (x_{i}) - z_{i}$ . We say that g is a well-behaved version of the CPWL $ε$ -approximation f with respect to S if g is a well-behaved version of the CPWL interpolation f with respect to ${(x_{i}, z_{i} + e_{i})}_{i = 1, \dots, N}$ .

Remark 1.

If g is a well-behaved version of the CPWL $ε$ -approximation f with respect to S, then the $ε$ -approximations f and g are equivalent according to Definition 4.

In other words, a well-behaved version of a CPWL interpolation f of S is a transformation of f where each affine piece has been adjusted, or “tilted,” to interpolate at least $d + 1$ points of S, including the points it was originally interpolating (before being tilted). An illustration for $d = 1$ is provided in Figure 1. In the following sections, we show that well-behaved CPWL interpolations have interesting properties that can be used to tighten the MILP formulation. To the best of the authors’ knowledge, this is the first time this class of CPWL interpolations is introduced. We prove below that for any CPWL interpolation, a well-behaved version always exists.

Figure 1. Illustration of a CPWL Interpolation and Its Well-Behaved Version for $d = 1$
*Note.* Both CPWL functions interpolate the points and are composed of five affine pieces, but the affine pieces of the well-behaved interpolation have been tilted to interpolate at least two points per piece.

Remark 2.

If f is well-behaved, then f is a well-behaved version of itself.

Lemma 1.

Let S be a set of N points in $R^{d + 1}$ , with $N \geq d + 1$ . Let $f : D \to R$ be a CPWL interpolation of S. Let $f_{1} : D_{1} \to R$ be one of the affine pieces of f, and let $f_{2} : D_{2} \to R$ be a neighboring affine piece of $f_{1}$ —that is, an affine piece of f sharing a common domain boundary with $f_{1}$ . Let $\tilde{f_{1}}$ be the affine extension of $f_{1}$ to the domain D. Then,

\exists c \in {- 1, 1} : \forall x \in D_{2}, c (\tilde{f_{1}} (x) - f_{2} (x)) \leq 0 .

Proof of Lemma 1.

Let $x \in D_{2}$ . According to Proposition 2, a shared domain boundary between $f_{1}$ and $f_{2}$ is either a shared domain boundary between two affine pieces of $f^{+}$ or between two pieces of $f^{-}$ .

Case 1: The shared domain boundary is between two affine pieces of $f^{+}$ . Then, using Proposition 2, we can write $\tilde{f_{1}} (x) = f_{1}^{+} (x) - f_{0}^{-} (x)$ and $f_{2} (x) = f_{2}^{+} (x) - f_{0}^{-} (x)$ . Because $f_{2}^{+} (x) = \max_{j \in {1, \dots, P^{+}}} f_{j}^{+} (x) \geq f_{1}^{+} (x)$ , we therefore have $\tilde{f_{1}} (x) \leq f_{2} (x)$ . Thus $(\tilde{f_{1}} (x) - f_{2} (x)) \leq 0$ .

Case 2: The shared domain boundary is between two affine pieces of $f^{-}$ . We can write $\tilde{f_{1}} (x) = f_{0}^{+} (x) - f_{1}^{-} (x)$ and $f_{2} (x) = f_{0}^{+} (x) - f_{2}^{-} (x)$ . Because $f_{2}^{-} (x) \geq f_{1}^{-} (x)$ , we therefore have $\tilde{f_{1}} (x) \geq f_{2} (x)$ . Thus, $(- 1) (\tilde{f_{1}} (x) - f_{2} (x)) \leq 0$ . $□$

Proposition 4.

Let $A \in R^{m \times n}$ and $b \in R^{m}$ . Assume that $\min (n, m)$ rows of $A$ are linearly independent and the polyhedron $U = {x \in R^{n} : Ax \leq b}$ is nonempty. Then, there exists a point $x^{*} \in R^{n}$ such that $A_{i} x^{*} = b_{i}$ for $\min (n, m)$ rows $A_{i}$ of $A$ .

Proposition 4 is a well-established fact from LP theory, as described in Bertsimas and Tsitsiklis (1997). The $\min (n, m)$ constraints “ $A_{i} x^{*} = b_{i}$ ” are also known as the active constraints of the solution $x^{*}$ . The point $x^{*}$ lies on the boundary of U, and, if $m \geq n$ , $x^{*}$ is a vertex of U.

Lemma 2.

Let S be a set of N points in $R^{d + 1}$ , with $N \geq d + 1$ . Let $f : D \to R$ be a CPWL interpolation of S that is not well-behaved—that is, at least one of the affine pieces of f is underdetermined. Let $f_{0} : D_{0} \to R$ represent an underdetermined affine piece of f that interpolates n points of S ( $n \leq d$ ). Let $f_{j} : D_{j} \to R, j \in {1, \dots, J}$ be the J neighboring pieces of $f_{0}$ —that is, the affine pieces that share a domain boundary with $f_{0}$ . Let $\tilde{D_{0}} = \cup_{j = 0}^{J} D_{j}$ . Assume there is a total of $m \geq 0$ points in S whose projections lie in $\tilde{D_{0}} ∖ D_{0}$ (interpolated by the J neighboring pieces but not by $f_{0}$ ). Then, there exists an alternative CPWL interpolation of S in which $f_{0}$ is adjusted to interpolate $\min (d + 1, n + m)$ points of S without impacting the points interpolated by the neighboring pieces. In other words, we can construct a new CPWL interpolation of S defined over $\tilde{D_{0}}$ and represented by a set of affine functions ${f_{j}^{'} : D_{j}^{'} \to R, j \in {0, 1, \dots, J}}$ , such that: (1) The functions $f_{j}^{'}$ and $f_{j}$ are equal on their shared domain—that is, $f_{j}^{'} (x) = f_{j} (x), \forall x \in D_{j}^{'} \cap D_{j}, j \geq 1$ ; (2) the points that are interpolated by $f_{j}$ and $f_{j}^{'}$ are the same—that is, $S_{R^{d}} \cap D_{j}^{'} = S_{R^{d}} \cap D_{j}, j \geq 1$ ; and (3) the function $f_{0}^{'}$ interpolates $\min (d + 1, n + m)$ points of S, which include the n points interpolated by $f_{0}$ —that is, $| S_{R^{d}} \cap D_{0}^{'} | = \min (d + 1, n + m)$ and $S_{R^{d}} \cap D_{0} \subseteq S_{R^{d}} \cap D_{0}^{'}$ .

Proof of Lemma 2.

Let $a_{0}$ and $b_{0}$ be the linear coefficients and bias terms of $f_{0}$ —that is, $f_{0} (x) = a_{0}^{T} x + b_{0}, \forall x \in D_{0}$ . Let ${(x_{i}, z_{i})}_{i = 1, \dots, n}$ denote the n points of S whose projections on $R^{d}$ lie in $D_{0}$ and ${(x_{i}^{'}, z_{i}^{'})}_{i = 1, \dots, m}$ denote the m points of S whose projections on $R^{d}$ lie in $\tilde{D_{0}} ∖ D_{0}$ . Let $p (i)$ denote the smallest integer such that $x_{i}^{'} \in D_{p (i)}$ . According to Lemma 1, $\exists {(c_{j})}_{j = 1, \dots, J} \in {- 1, 1}^{J} : c_{p (i)} (\tilde{f_{0}} (x_{i}^{'}) - f_{p (i)} (x_{i}^{'})) \leq 0, \forall i \in {1, \dots, m}$ . Because $\tilde{f_{0}} (x_{i}^{'}) = a_{0}^{T} x_{i}^{'} + b_{0}$ and $f_{p (i)} (x_{i}^{'}) = z_{i}^{'}$ , this can be expressed as: $c_{p (i)} (a_{0}^{T} x_{i}^{'} + b_{0} - z_{i}^{'}) \leq 0, \forall i \in {1, \dots, m}$ . In other words, $(a_{0}, b_{0})$ is a point of the polyhedron U defined by:

\begin{array}{l} (a, b) \in R^{d + 1} : \\ a^{T} x_{i} + b - z_{i} = 0, & i \in {1, \dots, n} \\ c_{p (i)} (a^{T} x_{i}^{'} + b - z_{i}^{'}) \leq 0, & i \in {1, \dots, m} \end{array} .

U is not empty because $(a_{0}, b_{0}) \in U$ . Therefore, according to Proposition 4, there exists a solution $(a^{*}, b^{*}) \in U$ with $\min (d + 1, n + m)$ active constraints. Note that the constraints “ $a^{* T} x_{i} + b^{*} - z_{i} = 0$ ” are already active, which means that there must be $\min (d + 1, n + m) - n$ additional active constraints of type “ $c_{p (i)} (a^{* T} x_{i}^{'} + b^{*} - z_{i}^{'}) = 0$ .” We define $f_{0}^{'}$ as the affine function defined by $f_{0}^{'} (x) = a^{* T} x + b^{*}$ . We define the domain of $f_{0}^{'}$ as the domain delimited by the J boundaries ${x \in D : f_{0}^{'} (x) = \tilde{f_{j}} (x)}$ . We define $f_{j}^{'}$ as the J affine pieces $f_{j}$ for which the domains were updated according to the J boundaries of $f_{0}^{'}$ . The set of affine functions ${f_{j}^{'} : D_{j}^{'} \to R, j \in {0, 1, \dots, J}}$ constitutes a valid CPWL interpolation of S over $\tilde{D_{0}}$ , with $f_{0}^{'}$ interpolating $\min (d + 1, n + m)$ points of S, including the original n points interpolated by $f_{0}$ . The points of S interpolated by $f_{j}^{'}$ are the same as the points interpolated by $f_{j}$ , $\forall j \in {1, \dots, J}$ . $□$

Theorem 1.

Let S be a set of N points in $R^{d + 1}$ , where $N \geq d + 1$ . Any CPWL interpolation of S has a well-behaved version.

Proof of Theorem 1.

Let $f : D \to R$ be a CPWL interpolation of S. If every affine piece of f interpolates at least $d + 1$ points, f is already well-behaved. Otherwise, we construct a well-behaved version iteratively. The procedure begins by identifying an underdetermined piece, $f_{0}$ , with a neighboring domain containing at least one point not interpolated by $f_{0}$ ; the existence of such a piece is proven in Section EC.2 of the electronic companion. Using Lemma 2, we then adjust $f_{0}$ to interpolate this additional point without affecting its neighbors. Repeating this process guarantees termination, as each step incorporates a new point from the finite set S. The final CPWL interpolation is, by construction, a well-behaved version of f. Note that this constructive procedure serves as a theoretical proof of existence of the well-behaved version; a computational implementation is not required for the scope of this work. The process is illustrated in Figure 2. $□$

Figure 2. Illustration of the Process of Converting a CPWL Interpolation into Its Well-Behaved Version for $d = 1$
*Notes.* At the beginning of the process, piece 5 could not be selected because its neighboring piece (piece 4) does not interpolate any point. At the end of the process, pieces 4 and 5 have identical domain, which is allowed by our definition of CPWL functions.

Corollary 1.

Any CPWL $ε$ -approximation of S has a well-behaved version.

The proof of Corollary 1 directly follows from Theorem 1 and Definition 7.

Remark 3.

Note that the procedure of tilting the affine pieces of f to construct a well-behaved version may result in two or more underdetermined pieces merging together. In other words, initially distinct pieces may end up interpolating the same subset of points and having the same expression and affine domain after converting f to a well-behaved interpolation. In this case, the well-behaved version is still a valid CPWL function owing to the flexibility of Definition 1.

Remark 4.

A well-behaved version of a CPWL interpolation is not necessarily unique. That is, a CPWL interpolation may have several well-behaved versions. This stems from the fact that the point with $\min (n, m)$ active constraints in Proposition 4 is not necessarily unique.

An important application of this theorem is that we can significantly reduce the set of CPWL functions to be considered in $MILP 1 (Q)$ without impacting the feasibility of the problem. Specifically, if a CPWL $ε$ -approximation f of S exists, a well-behaved version g of f also exists. The two CPWL solutions f and g have the same number of affine pieces and are equivalent with respect to S—that is, $f (x_{i}) = g (x_{i}), \forall i \in {1, \dots, N}$ . Therefore, we can eliminate CPWL solutions that are not well-behaved from the feasible region of $MILP 1 (Q)$ because they are redundant solutions and may have excessively steep gradients due to their underdetermined affine pieces. In the next section, we will see how this theorem allows us to derive valid tight bounds and constraints for the MILP problem.

Definition 8.

Herein, we will refer to the set of CPWL $ε$ -approximations of S as $CPWL (S, ε)$ . The subset of $CPWL (S, ε)$ that can be represented using $P^{+}, P^{-},$ affine pieces for $f^{+}, f^{-}$ , respectively, is denoted $CPWL (S, ε, P^{+}, P^{-})$ . The set of well-behaved CPWL $ε$ -approximations of S will be denoted as $CPW L^{*} (S, ε)$ , and $CPW L^{*} (S, ε, P^{+}, P^{-})$ when represented by $P^{+}, P^{-}$ affine pieces. Note that $CPW L^{*} (S, ε) \subset CPWL (S, ε)$ . Additionally, note that the feasible region of the MILP problem $MILP 1 (Q)$ is given by $CPWL (S, ε, P^{+}, P^{-})$ .

5. Tightening the MILP Formulation

This section introduces six strategies to tighten and enhance the formulation of $MILP 1 (Q)$ . First, we demonstrate that we can fix one of the affine pieces of $f^{-}$ without affecting the set of CPWL solutions. Next, we impose an ordering on the affine pieces of $f^{+}$ and $f^{-}$ that does not affect the set of CPWL solutions. We can also impose that each affine piece of $f^{+}$ and $f^{-}$ contains at least $d + 1$ points, which serves as a valid tightening of the set of feasible well-behaved CPWL solutions. Alternatively, we may require that each affine piece of f contains at least $d + 1$ points, which requires the use of additional variables. Finally, we identify tight values for the big-M parameters and establish tight bounds for the linear variables.

Additional strategies that leverage the convexity of the functions $f^{+}$ and $f^{-}$ and of their affine domains are included in the electronic companion, Section EC.3. However, contrary to the six strategies presented here, their impact on the feasible region is unclear, and they do not seem to have a significant impact on the solution time.

Herein, it is assumed that $N \geq d + 1$ .

5.1. Fixing One Affine Piece

Lemma 3.

Let ${f_{k}, k \in {0, 1, \dots, K}}$ be a set of functions. We have $\max_{k \in {1, \dots, K}} (f_{k} - f_{0}) = \max_{k \in {1, \dots, K}} (f_{k}) - f_{0}$ .

Lemma 3 follows directly from the translation invariance of the max function—that is, $\max (a - c, b - c) = \max (a, b) - c, \forall a, b, c \in R$ .

Theorem 2.

The following equation constitutes a valid tightening constraint of $MILP 1 (Q)$ :

a_{1}^{-} = 0, b_{1}^{-} = 0 .

(14)

Proof of Theorem 2.

To prove this, we must demonstrate that any CPWL function f can be expressed as the difference of two convex CPWL functions $f = \max_{j \in {1, \dots, P^{+}}} g_{j}^{+} - \max_{j \in {1, \dots, P^{-}}} g_{j}^{-}$ , with $g_{1}^{-} = 0$ . According to Proposition 1, there is a set of affine functions ${f_{1}^{+}, \dots, f_{P^{+}}^{+}, f_{1}^{-}, \dots, f_{P^{-}}^{-}}$ such that f can be expressed as $f = \max_{j \in {1, \dots, P^{+}}} f_{j}^{+} - \max_{k \in {1, \dots, P^{-}}} f_{k}^{-}$ . Let $g_{j}^{c} = f_{j}^{c} - f_{1}^{-}, \forall j \in {1, \dots, P^{c}}, c \in {+, -}$ . The function $g_{j}^{c}$ is affine because it is the difference of two affine functions. Moreover, $g_{1}^{-} = f_{1}^{-} - f_{1}^{-} = 0$ . According to Lemma 3, we have $\max_{j \in {1, \dots, P^{c}}} (g_{j}^{c}) = \max_{j \in {1, \dots, P^{c}}} (f_{j}^{c}) - f_{1}^{-}$ . Therefore, we can rewrite f as $f = \max_{j \in {1, \dots, P^{+}}} (f_{j}^{+}) - \max_{k \in {1, \dots, P^{-}}} (f_{k}^{-}) + f_{1}^{-} - f_{1}^{-} = \max_{j \in {1, \dots, P^{+}}} (g_{j}^{+}) - \max_{k \in {1, \dots, P^{-}}} (g_{k}^{-})$ . $□$

Note that Equation (14) affects not only the set of possible values for $a_{1}^{-}, b_{1}^{-}$ , but also for all $a_{j}^{c}, b_{j}^{c}$ . Also note that the fixed values of zero for $a_{1}^{-}, b_{1}^{-}$ are chosen arbitrarily; any value could be imposed and serve as a valid tightening constraint for the problem. Imposing Equation (14) effectively reduces the dimensionality of the search space by $(d + 1)$ . Formally, Equation (14) does not introduce any additional linear constraints; instead, it introduces tight bounds on $(d + 1)$ linear variables.

5.2. Sorting the Affine Pieces

Let $a_{j}^{c} = (\begin{matrix} a_{j, 1}^{c} \\ ⋮ \\ a_{j, d}^{c} \end{matrix}) \in R^{d}$ denote the linear coefficients of the affine piece $f_{j}^{c}$ .

Theorem 3.

The following equation constitutes a valid tightening constraint of $MILP 1 (Q)$ :

a_{j, 1}^{c} \leq a_{j + 1, 1}^{c}, j \in {1, \dots, P^{c} - 1}, c \in {+, -} .

(15)

Proof of Theorem 3.

The problem $MILP 1 (Q)$ exhibits symmetries with respect to the group of variables ${(a_{j}^{+}, b_{j}^{+})}_{j = 1, \dots, P^{+}}$ and ${(a_{k}^{-}, b_{k}^{-})}_{j = 1, \dots, P^{-}}$ , respectively. This means that permuting the affine pieces of $f^{+}$ , or $f^{-}$ , does not affect the solution of the problem. Therefore, we can arbitrarily sort the affine pieces of $f^{+}$ and $f^{-}$ based on the ascending order of the values ${(a_{j, 1}^{+})}_{j = 1, . ., P^{+}}$ and ${(a_{k, 1}^{-})}_{k = 1, . ., P^{-}}$ , respectively. $□$

It is important to note that, after sorting the affine pieces of $f^{+}$ and $f^{-}$ , the procedure outlined in the proof of Theorem 2 can be applied to impose Equation (14) with no impact on the CPWL solution. Thus, Equations (14) and (15) are compatible and can be applied simultaneously with no impact on the CPWL solution. The number of possible arrangements for $P^{c}$ affine pieces is $P^{c}!$ , which implies that implementing Equation (15) reduces the feasible region by a factor $P^{+}! P^{-}!$ of its original extent. Additionally, applying Equation (15) introduces $P^{+} + P^{-} - 2$ constraints to the MILP problem.

5.3. Imposing $d + 1$ Points per Affine Domain of $f^{c}$

Theorem 4.

Adding the following equation to $MILP 1 (Q)$ reduces the feasible region to a superset of $CPW L^{*} (S, ε, P^{+}, P^{-})$ . In other words, some non-well-behaved solutions may be eliminated from the new feasible region, but all well-behaved solutions are preserved.

\sum_{i = 1}^{N} δ_{i, j}^{c} \geq d + 1, j \in {1, \dots, P^{c}}, c \in {+, -} .

(16)

Proof of Theorem 4.

Let $f = f^{+} - f^{-}$ be a well-behaved CPWL solution. Let $D_{j}^{+}$ be one of the affine domains of $f^{+}$ . According to Proposition 1, $Int (D_{j}^{+}) \neq Ø$ and $D = \cup_{k = 1}^{P^{-}} D_{k}^{-}$ . Therefore, there exists a k such that $Int (D_{j}^{+} \cap D_{k}^{-}) \neq Ø$ . According to Proposition 2, $D_{p} = D_{j}^{+} \cap D_{k}^{-}$ is an affine domain of f. According to Definition 6, $| D_{p} \cap S_{R^{d}} | \geq d + 1$ . Furthermore, $D_{p} \subset D_{j}^{+}$ implies $D_{p} \cap S_{R^{d}} \subset D_{j}^{+} \cap S_{R^{d}}$ . Therefore, we have $| D_{j}^{+} \cap S_{R^{d}} | \geq | D_{p} \cap S_{R^{d}} | \geq d + 1$ . Similarly, for an affine domain $D_{k}^{-}$ of $f^{-}$ , we obtain $| D_{k}^{-} \cap S_{R^{d}} | \geq d + 1$ . In other words, each affine domain of $f^{+}$ and $f^{-}$ contains at least $d + 1$ points. $□$

Note that the inequality from Equation (5), which allows each point to belong to more than one affine domain of $f^{+}$ or $f^{-}$ , is critical for utilizing Equation (16). If we were instead imposing a single affine domain per point, as in the formulation from Kazda and Li (2021), Equation (16) would become too restrictive. In fact, the combined requirement of having only one affine domain per point and at least $(d + 1)$ points per affine domain would imply that $N \geq \max (P^{+}, P^{-}) (d + 1)$ , making the MILP problem infeasible if that condition was not satisfied. Using Equation (16) eliminates some non-well-behaved CPWL solutions from the feasible region. However, all well-behaved solutions are preserved. Owing to Theorem 1, it is proven that this does not impact the quality of the feasible CPWL solutions, as any CPWL $ε$ -approximation of S has a well-behaved version. Implementing Equation (16) adds $P^{+} + P^{-}$ constraints to the MILP problem.

5.4. Imposing $d + 1$ Points per Affine Domain of f

Alternatively, we can impose a set of constraints that further reduces the feasible region to the set of well-behaved CPWL solutions. This approach requires the introduction of two additional sets of variables, $β_{i, j, k}$ and $γ_{j, k}$ . In the following equations, the indices $i \in {1, \dots, N}$ , $j \in {1, \dots, P^{+}}$ , $k \in {1, \dots, P^{-}}$ , unless otherwise specified.

Theorem 5.

Adding the following equations to $MILP 1 (Q)$ reduces the feasible region to $CPW L^{*} (S, ε, P^{+}, P^{-})$ :

β_{i, j, k} \leq δ_{i, j}^{+},

(17)

β_{i, j, k} \leq δ_{i, k}^{-},

(18)

β_{i, j, k} \geq δ_{i, j}^{+} + δ_{i, k}^{-} - 1,

(19)

β_{i, j, k} \leq γ_{j, k},

(20)

\sum_{i = 1}^{N} β_{i, j, k} \geq (d + 1) γ_{j, k},

(21)

0 \leq β_{i, j, k} \leq 1,

(22)

0 \leq γ_{j, k} \leq 1 .

(23)

Proof of Theorem 5.

Equations (17)–(19) and (22) define $β_{i, j, k}$ as an indicator variable which is equal to one if $x_{i} \in D_{j}^{+} \cap D_{k}^{-}$ and zero otherwise—that is, $β_{i, j, k} = δ_{i, j}^{+} \land δ_{i, k}^{-}$ . Equations (20) and (23) define $γ_{j, k}$ as an indicator variable which is equal to one if $D_{j}^{+} \cap D_{k}^{-}$ contains at least one point—that is, $⋁_{i = 1}^{N} β_{i, j, k} = 1 \Rightarrow γ_{j, k} = 1$ . Finally, Equation (21) formulates the condition that if one point is in $D_{j}^{+} \cap D_{k}^{-}$ , then $D_{j}^{+} \cap D_{k}^{-}$ must contain at least $d + 1$ points—that is, $γ_{j, k} = 1 \Rightarrow \sum_{i = 1}^{N} β_{i, j, k} \geq (d + 1)$ . Formally, $β_{i, j, k}$ and $γ_{j, k}$ should be defined as binary variables. However, the tight constraints from Equations (17)–(21) involving the binary variables $δ_{i, j}^{c}$ ensure that $β_{i, j, k}$ and $γ_{j, k}$ can only take values in ${0, 1}$ . $□$

Equations (17)–(23) involve the use of $P^{+} P^{-} (N + 1)$ additional variables, $P^{+} P^{-} (N + 1)$ variable bounds, and $P^{+} P^{-} (4 N + 1)$ additional constraints. These constraints restrict the feasible space to be $CPW L^{*} (S, ε, P^{+}, P^{-})$ . Note that the indicator variable $γ_{j, k}$ is necessary in Equation (21) because the intersection of an affine domain of $f^{+}$ and $f^{-}$ may be empty.

Remark 5.

Equation (16) is implied by Equations (17)–(23), meaning that the latter dominate the former. In other words, imposing $d + 1$ points per affine domain of f implies, and is more restrictive than, imposing $d + 1$ points per affine domain of $f^{c}$ .

5.5. Tightening the Big-M Parameters

In order to identify tight big-M values, we first need to define several sets. Let A denote the set of all affine functions $D \to R$ . Let ${[S]}^{d + 1}$ denote the set of all subsets of S composed of $d + 1$ points. Let $s = {(x_{i_{k}}, z_{i_{k}})}_{k = 1, \dots, d + 1} \in {[S]}^{d + 1}$ . We define the set $B (s, ε) = {{(x_{i_{k}}, z_{i_{k}} + e_{k})}_{k = 1, \dots, d + 1} : {(e_{k})}_{k = 1, \dots, d + 1} \in {- ε, + ε}^{d + 1}}$ . We define the set $A_{ε, s} = {g \in A : g$ ε-approximates $s}$ and $A_{ε, s}^{*} = {g \in A : g$ interpolates $s^{'} \in B (s, ε)}$ . We also note $A_{ε} (S) = \cup_{s \in {[S]}^{d + 1}} A_{ε, s}$ and $A_{ε}^{*} (S) = \cup_{s \in {[S]}^{d + 1}} A_{ε, s}^{*}$ . For convenience, let $\neg c = {\begin{array}{l} “ - ” & if c = “ + ” \\ “ + ” & if c = “ - ” \end{array}$ . Finally, given a CPWL solution $f = f^{+} - f^{-}$ , we define $f_{j, k} = f_{j}^{+} - f_{k}^{-}$ .

Proposition 5.

The set $A_{ε}^{*} (S)$ contains at most $(\begin{matrix} N \\ d + 1 \end{matrix}) 2^{d + 1}$ affine functions.

Proof of Proposition 5.

First, we have $| {[S]}^{d + 1} | = (\begin{matrix} N \\ d + 1 \end{matrix})$ . Additionally, $| A_{ε, s}^{*} | = | B (s, ε) | = | {- ε, + ε}^{d + 1} | = 2^{d + 1}$ . Therefore, $| A_{ε}^{*} (S) | = | \cup_{s \in {[S]}^{d + 1}} A_{ε, s}^{*} | \leq \sum_{s \in {[S]}^{d + 1}} | A_{ε, s}^{*} | = (\begin{matrix} N \\ d + 1 \end{matrix}) 2^{d + 1}$ . $□$

Lemma 4.

Let $f \in CPW L^{*} (S, ε, P^{+}, P^{-})$ , $x \in D$ . Let $(j, k) \in {1, \dots, P^{+}} \times {1, \dots, P^{-}} : Int (D_{j}^{+} \cap D_{k}^{-}) \neq Ø$ . Then, we have:

\min_{g \in A_{ε}^{*} (S)} g (x) \leq f_{j, k} (x) \leq \max_{g \in A_{ε}^{*} (S)} g (x) .

Proof of Lemma 4.

Because f is well-behaved, $f_{j, k}$ is an affine piece of f that $ε$ -approximates a subset $s \in {[S]}^{d + 1}$ . Therefore, $f_{j, k} \in A_{ε, s} \subset A_{ε} (S)$ . As a result, $\min_{g \in A_{ε} (S)} g (x) \leq f_{j, k} (x) \leq \max_{g \in A_{ε} (S)} g (x)$ . What remains to be proven is that the extrema of $g (x)$ on $A_{ε} (S)$ can be found on $A_{ε}^{*} (S)$ . Note that $\max_{g \in A_{ε, s}} g (x)$ is equal to the optimal objective value of the LP optimization problem $L P_{max} (s, x, ε)$ :

\max g (x), s . t . g \in A, - ε \leq g (x_{i_{k}}) - z_{i_{k}} \leq ε, (x_{i_{k}}, z_{i_{k}}) \in s .

Similarly, $\min_{g \in A_{ε, s}} g (x)$ is equal to the optimal objective value of $L P_{min} (s, x, ε)$ . In addition, $A_{ε, s}$ corresponds to the feasible region of $L P_{min} (s, x, ε)$ and $L P_{max} (s, x, ε)$ , whereas $A_{ε, s}^{*}$ corresponds to the vertices—that is, extreme points—of $A_{ε, s}$ , where the optimal solution is located. Consequently, $\max_{g \in A_{ε} (S)} g (x) = \max_{s \in {[S]}^{d + 1}} (\max_{g \in A_{ε, s}} (g (x))) = \max_{s \in {[S]}^{d + 1}} (\max_{g \in A_{ε, s}^{*}} (g (x))) = \max_{g \in A_{ε}^{*} (S)} g (x)$ . Similarly, $\min_{g \in A_{ε} (S)} g (x) = \min_{g \in A_{ε}^{*} (S)} g (x)$ . $□$

Lemma 5.

Let $f \in CPW L^{*} (S, ε, P^{+}, P^{-})$ , $x \in D$ , $c \in {+, -}$ , $(j, k) \in {1, \dots, P^{c}}^{2}$ . Then,

| f_{j}^{c} (x) - f_{k}^{c} (x) | \leq \min (P^{c} - 1, P^{\neg c}) (\max_{g \in A_{ε}^{*} (S)} g (x) - \min_{g \in A_{ε}^{*} (S)} g (x)) .

Proof of Lemma 5.

We prove the case $c = “ + ”$ , the case $c = “ - ”$ being symmetric. Let $x_{j} \in D_{j}^{+}$ , $x_{k} \in D_{k}^{+}$ . By convexity of D, there exists a line segment L connecting $x_{j}$ to $x_{k}$ that is contained in D. L crosses T affine domains of f, each of which is the intersection of an affine domain of $f^{+}$ and $f^{-}$ . Let ${(j_{m}, k_{m})}_{m = 1, \dots, T}$ be such that L crosses the nonempty domains $D_{j_{m}}^{+} \cap D_{k_{m}}^{-}$ in ascending order of m when traversing from $x_{j}$ to $x_{k}$ . An example is depicted in Figure 3 for $d = 2$ . We have $j_{1} = j$ and $j_{T} = k$ . Without loss of generality, we assume that at each step m, the line segment L transitions to either a new affine domain of $f^{+}$ or a new affine domain of $f^{-}$ , but not both. In other words, $(j_{m} \neq j_{m + 1}) \oplus (k_{m} \neq k_{m + 1}), \forall m = 1, \dots, T - 1$ . Because the intersection of L with an affine domain is convex, ${m : j_{m} = j}$ and ${m : k_{m} = k}$ are sets of consecutive numbers—that is, without gaps. Consequently, $j_{m}$ and $k_{m}$ can change value at most $P^{+} - 1$ and $P^{-} - 1$ times, respectively. That is, $| {m : j_{m} \neq j_{m + 1}} | \leq P^{+} - 1$ . Thus, we have:

\begin{array}{l} f_{j}^{+} (x) - f_{k}^{+} (x) = f_{j_{1}}^{+} (x) - f_{j_{T}}^{+} (x) \underset{(a)}{=} \sum_{m = 1}^{T - 1} (f_{j_{m}}^{+} (x) - f_{j_{m + 1}}^{+} (x)) \underset{(b)}{=} \sum_{m : j_{m} \neq j_{m + 1}} (f_{j_{m}}^{+} (x) - f_{j_{m + 1}}^{+} (x)) \\ \underset{(c)}{=} \sum_{m : j_{m} \neq j_{m + 1}} (f_{j_{m}}^{+} (x) - f_{k_{m}}^{-} (x) + f_{k_{m + 1}}^{-} (x) + f_{j_{m + 1}}^{+} (x)) = \sum_{m : j_{m} \neq j_{m + 1}} (f_{j_{m}, k_{m}} (x) - f_{j_{m + 1}, k_{m + 1}} (x)) \\ \underset{(d)}{\leq} \sum_{m : j_{m} \neq j_{m + 1}} (\max_{g \in A_{ε}^{*} (S)} g (x) - \min_{g \in A_{ε}^{*} (S)} g (x)) \underset{(e)}{\leq} (P^{+} - 1) (\max_{g \in A_{ε}^{*} (S)} g (x) - \min_{g \in A_{ε}^{*} (S)} g (x)) . \end{array}

(a)

uses a telescoping sum,

(b)

is due to the fact that

f_{j_{m}}^{+} (x) = f_{j_{m + 1}}^{+} (x)

when

j_{m} = j_{m + 1}

(c)

derives from

(j_{m} \neq j_{m + 1}) \Rightarrow (k_{m} = k_{m + 1})

(d)

is a result of Lemma 4, and

(e)

stems from

| {m : j_{m} \neq j_{m + 1}} | \leq P^{+} - 1

. We note K the set of unique values of

{k_{m}, m = 1, \dots, M}

\underline{m} (p) = \min_{k_{m} = p} (m)

, and

\bar{m} (p) = \max_{k_{m} = p} (m)

. We have:

\begin{array}{l} f_{j}^{+} (x) - f_{k}^{+} (x) = \sum_{p \in K} (\sum_{\overset{m : j_{m} \neq j_{m + 1},}{k_{m} = k_{m + 1} = p}} (f_{j_{m}, k_{m}} (x) - f_{j_{m + 1}, k_{m + 1}} (x))) \\ \underset{(f)}{=} \sum_{p \in K} (f_{j_{\underline{m} (p)}, k_{\underline{m} (p)}} (x) - f_{j_{\bar{m} (p)}, k_{\bar{m} (p)}} (x)) \underset{(g)}{\leq} P^{-} (\max_{g \in A_{ε}^{*} (S)} g (x) - \min_{g \in A_{ε}^{*} (S)} g (x)) . \end{array}

(f)

uses a telescoping sum on

{\underline{m} (p), \dots, \bar{m} (p)}

, and

(g)

is due to Lemma 4 and the fact that

| K | \leq P^{-}

. As a result,

f_{j}^{+} (x) - f_{k}^{+} (x) \leq \min (P^{+} - 1, P^{-}) (\max_{g \in A_{ε}^{*} (S)} g (x) - \min_{g \in A_{ε}^{*} (S)} g (x)) .

Similarly, the lower bound of $f_{j}^{+} (x) - f_{k}^{+} (x)$ is proven by flipping the inequality in in $(d)$ and $(g)$ . $□$

Figure 3. Example of a Line Connecting Two Points from Distinct Affine Domains of $f^{+}$ for $d = 2$
*Notes.* Here, both $f^{+}$ and $f^{-}$ have four affine domains, and the line connects a point in $D_{1}^{+}$ to a point in $D_{4}^{+}$ . The line crosses five pairs of affine domains in $f^{+}$ and $f^{-}$ : $((1, 2), (2, 2), (2, 3), (3, 3), (4, 3))$ . We have $f_{1}^{+} (x) - f_{4}^{+} (x) = (f_{1, 2} (x) - f_{2, 2} (x)) + (f_{2, 3} (x) - f_{4, 3} (x))$ .

Theorem 6.

Assuming we only consider well-behaved CPWL solutions, the following value is a tight value for the parameter $M_{n}^{c}$ in Equation (4):

M_{n}^{c} = \min (P^{c} - 1, P^{\neg c}) (\max_{g \in A_{ε}^{*} (S)} g (x_{n}) - \min_{g \in A_{ε}^{*} (S)} g (x_{n})) .

(24)

Proof of Theorem 6.

Let $n \in {1, \dots, N}$ , $c \in {+, -}$ , $j \in {1, \dots, P^{c}}$ . Let k be such that $x_{n} \in D_{k}^{c}$ —that is, $f^{c} (x_{n}) = f_{k}^{c} (x_{n})$ . Owing to Lemma 5, we have:

f^{c} (x_{n}) - f_{j}^{c} (x_{n}) = f_{k}^{c} (x_{n}) - f_{j}^{c} (x_{n}) \leq \min (P^{c} - 1, P^{\neg c}) (\max_{g \in A_{ε}^{*} (S)} g (x_{n}) - \min_{g \in A_{ε}^{*} (S)} g (x_{n})) . □

Herein, it is assumed that the value of the parameter $M_{n}^{c}$ from Equation (4) is the one described in Theorem 6. Note that the big-M values defined in Equation (24) are only valid when considering the solution space of well-behaved CPWL solutions. This implies that using these values may eliminate some CPWL solutions that are not well-behaved from the feasible region, although not necessarily all of them.

5.6. Bounding the Variables

We define the following parameters:

\underline{a_{r}} = \min_{g \in A_{ε}^{*} (S)} \frac{\partial g}{\partial x_{r}}, \bar{a_{r}} = \max_{g \in A_{ε}^{*} (S)} \frac{\partial g}{\partial x_{r}},

(25)

\underline{b} = \min_{g \in A_{ε}^{*} (S)} g (0), \bar{b} = \max_{g \in A_{ε}^{*} (S)} g (0),

(26)

\bar{a_{r}^{'}} = \min (P^{-} - 1, P^{+}) (\bar{a_{r}} - \underline{a_{r}}),

(27)

\bar{b^{'}} = \min (P^{-} - 1, P^{+}) (\bar{b} - \underline{b}) .

(28)

Lemma 6.

Let $f \in CPW L^{*} (S, ε, P^{+}, P^{-})$ . Let $(j, k) \in {1, \dots, P^{+}} \times {1, \dots, P^{-}} : Int (D_{j}^{+} \cap D_{k}^{-}) \neq Ø$ . Then, we have:

\begin{array}{l} \underline{a_{r}} \leq a_{j, r}^{+} - a_{k, r}^{-} \leq \bar{a_{r}}, & r \in {1, \dots, d} \\ \underline{b} \leq b_{j}^{+} - b_{k}^{-} \leq \bar{b} \end{array} .

Proof of Lemma 6.

The partial derivatives of an affine function are equal to its linear coefficients, and its value in $0$ corresponds to its bias term. In particular, $\frac{\partial f_{j, k}}{\partial x_{r}} = a_{j, r}^{+} - a_{k, r}^{-}$ and $f_{j, k} (0) = b_{j}^{+} - b_{k}^{-}$ . Because $f_{j, k} \in A_{ε} (S)$ , we have $\min_{g \in A_{ε} (S)} \frac{\partial g}{\partial x_{r}} \leq \frac{\partial f_{j, k}}{\partial x_{r}} \leq \max_{g \in A_{ε} (S)} \frac{\partial g}{\partial x_{r}}$ and $\min_{g \in A_{ε} (S)} g (0) \leq f_{j, k} (0) \leq \max_{g \in A_{ε} (S)} g (0)$ . Similar to Lemma 5, we can show that the extrema of $\frac{\partial g}{\partial x_{r}}$ and $g (0)$ on $A_{ε} (S)$ can be found on $A_{ε}^{*} (S)$ . Let $s \in {[S]}^{d + 1}$ . $\max_{g \in A_{ε, s}} \frac{\partial g}{\partial x_{r}}$ , $- \min_{g \in A_{ε, s}} \frac{\partial g}{\partial x_{r}}$ , $\max_{g \in A_{ε, s}} g (0)$ , $- \min_{g \in A_{ε, s}} g (0)$ are the optimal objective values of the following LP optimization problem, with $Q = a_{r}$ , $Q = - a_{r}$ , $Q = b$ , $Q = - b$ , respectively.

\max Q, s . t . - ε \leq \sum_{r = 1}^{d} a_{r} x_{i_{k}, r} + b - z_{i_{k}} \leq ε, (x_{i_{k}}, z_{i_{k}}) \in s .

Therefore, the extrema of $\frac{\partial g}{\partial x_{r}}$ and $g (0)$ on $A_{ε, s}$ can be found on the vertices $A_{ε, s}^{*}$ . It follows that the extrema on $A_{ε} (S)$ can be found on $A_{ε}^{*} (S)$ . $□$

Lemma 7.

Let $f \in CPW L^{*} (S, ε, P^{+}, P^{-})$ , $(j, k) \in {1, \dots, P^{-}}^{2}$ . Then,

\begin{array}{l} | a_{j, r}^{-} - a_{k, r}^{-} | \leq \bar{a_{r}^{'}}, & \forall r \in {1, \dots, d} \\ | b_{j}^{-} - b_{k}^{-} | \leq \bar{b^{'}} \end{array} .

Proof of Lemma 7.

This is demonstrated by using the same sum decomposition as in Lemma 5 but for $a_{j, r}^{-}$ , $b_{j}^{-}$ instead of $f_{j}^{+} (x)$ . The result follows from applying Lemma 6 to identify a similar inequality as $(d)$ and $(g)$ . $□$

Theorem 7.

For well-behaved CPWL solutions, the following equations represent valid upper and lower bounds of the linear variables:

z_{i} - ε \leq f (x_{i}) \leq z_{i} + ε, i \in {1, \dots, N},

(29)

0 \leq f^{-} (x_{i}) \leq M_{i}^{-}, i \in {1, \dots, N},

(30)

z_{i} - ε \leq f^{+} (x_{i}) \leq z_{i} + ε + M_{i}^{-}, i \in {1, \dots, N},

(31)

- \bar{a_{r}^{'}} \leq a_{j, r}^{-} \leq \bar{a_{r}^{'}}, j \in {1, \dots, P^{-}}, r \in {2, \dots, d},

(32)

0 \leq a_{j, 1}^{-} \leq \bar{a_{1}^{'}}, j \in {1, \dots, P^{-}},

(33)

\underline{a_{r}} - \bar{a_{r}^{'}} \leq a_{j, r}^{+} \leq \bar{a_{r}} + \bar{a_{r}^{'}}, j \in {1, \dots, P^{+}}, r \in {2, \dots, d},

(34)

\underline{a_{1}} \leq a_{j, 1}^{+} \leq \bar{a_{1}} + \bar{a_{1}^{'}}, j \in {1, \dots, P^{+}},

(35)

- \bar{b^{'}} \leq b_{j}^{-} \leq \bar{b^{'}}, j \in {1, \dots, P^{-}},

(36)

\underline{b} - \bar{b^{'}} \leq b_{j}^{+} \leq \bar{b} + \bar{b^{'}}, j \in {1, \dots, P^{+}} .

(37)

Proof of Theorem 7.

The bounds from Equation (29) derive from Equations (6) and (7). Equation (30) is based on Equation (14), using $f^{-} (x_{i}) = \max (f_{1}^{-} (x_{i}), f_{2}^{-} (x_{i}), \dots, f_{P^{-}}^{-} (x_{i})) = \max (0, f_{2}^{-} (x_{i}), \dots, f_{P^{-}}^{-} (x_{i})) \geq 0$ . In addition, $\exists k \in {1, \dots, P^{-}} : x_{i} \in D_{k}^{-}$ . Thus, according to Lemma 5, $f^{-} (x_{i}) = f_{k}^{-} (x_{i}) = f_{k}^{-} (x_{i}) - f_{1}^{-} (x_{i}) \leq M_{i}^{-}$ . Equation (31) derives from Equations (29) and (30) combined with $f^{+} (x_{i}) = f (x_{i}) + f^{-} (x_{i})$ . Using Equation (14) and Lemma 7, $| a_{j, r}^{-} | = | a_{j, r}^{-} - a_{1, r}^{-} | \leq \bar{a_{r}^{'}}$ , leading to Equation (32). Equation (33) arises from Equation (15): $a_{j, 1}^{-} \geq \dots \geq a_{1, 1}^{-} \geq 0$ . Next, combining Equation (32), Lemma 6, and k such that $Int (D_{j}^{+} \cap D_{k}^{-}) \neq Ø$ , we deduce: $a_{j, r}^{+} = a_{j, r}^{+} - a_{k, r}^{-} + a_{k, r}^{-}$ and $\underline{a_{r}} - \bar{a_{r}^{'}} \leq a_{j, r}^{+} - a_{k, r}^{-} + a_{k, r}^{-} \leq \bar{a_{r}} + \bar{a_{r}^{'}}$ , leading to Equation (34). Equation (35) stems from Equations (33) and (34). Further, applying Equation (14) and Lemma 7, we deduce $| b_{j}^{-} | = | b_{j}^{-} - b_{1}^{-} | \leq \bar{b}$ , resulting in Equation (36). Finally, Equation (37) stems from Equation (36), Lemma 6, and k such that $Int (D_{j}^{+} \cap D_{k}^{-}) \neq Ø$ . $□$

Note that the bounding values from Equations (30)–(37) are only valid when considering the solution space of well-behaved CPWL solutions. This implies that using these bounding values may eliminate some CPWL solutions that are not well-behaved from the feasible region, although not necessarily all of them. In addition, Equations (32)–(37) apply owing to Equations (14) and (15). This implies that using these values may eliminate some DC representations of the CPWL solutions but not the CPWL solutions themselves.

Proposition 6.

The time complexity for calculating the bounds and big-M parameters is $O (2^{d + 1} N^{d + 2} d)$ .

The proof of Proposition 6 is based upon the number of elements in $A_{ε}^{*} (S)$ and the time complexity of inverting a matrix in $M_{d + 1} (R)$ . A more detailed proof is provided in Section EC.4 of the electronic companion.

Table 1 gives a summary of the tightening strategies, together with their impact on the feasible region and the preprocessing involved.

Table 1. Summary of Tightening Strategies

Table 1. Summary of Tightening Strategies

Eq.	Description	# of new constraints or variable bounds^a	# of new variables	Impact on search region	Preprocessing	Time complexity
(14)	Fix one affine piece	0 ^a $d + 1$	0	Eliminate $d + 1$ dimensions	None
(15)	Sort the affine pieces	$P^{+} + P^{-} - 2$	0	Reduce the feasible region by a factor of $P^{+}! P^{-}!$	None
(16)^b	Impose $d + 1$ points per affine piece of $f^{c}$	$P^{+} + P^{-}$	0	Eliminate some non-well-behaved CPWL solutions	None
(17)–(23)^b	Impose $d + 1$ points per affine piece of f	$P^{+} P^{-} (4 N + 1)$ ^a $P^{+} P^{-} (N + 1)$	$P^{+} P^{-} (N + 1)$	Eliminate all non-well-behaved CPWL solutions	None
(24)	Tighten the big-M parameters	0	0	Eliminate some non-well-behaved CPWL solutions	Compute all $g \in A_{ε}^{*} (S)$ , evaluate g in N points, compute extrema for each point	$O (2^{d + 1} N^{d + 2} d)$
(29)–(37)	Bound the variables	0 ^a $3 N +$ $(P^{+} + P^{-}) (d + 1)$	0	Bound the search space in all variable dimensions	For all $g \in A_{ε}^{*} (S)$ , evaluate the extrema of the coefficients of g	$O (2^{d + 1} N^{d + 1} d)$

^aIn the number of additional constraints, indicates the number of additional variable bounds.

^bThe two sets of equations are alternative to one another, the first set being dominated by the second set but involving less constraints and variables than the second set.

Remark 6.

Note that applying all equations presented in Section 5 is a valid tightening of the MILP problem to the feasible region of the well-behaved CPWL solutions. As a result, applying any subset of equations from Section 5 is also a valid tightening. For example, although the variable bounds from Section 5.6 derive from fixing $f_{1}^{-} = 0$ and sorting the affine pieces of $f^{+}$ and $f^{-}$ , applying the variable bounds without sorting or fixing any affine piece is still a valid tightening of the MILP problem. As a matter of fact, any combination of the tightening strategies described here is a valid tightening. This fact is leveraged in Section 6 to identify effective combinations of tightening strategies.

6. Computational Experiments

This section illustrates the computational performances of the tightening strategies. In Section 6.1, the data sets used to assess the efficiency of the tightening strategies are described. In Section 6.2, different combinations of the tightening strategies are evaluated to identify the ones that have the most impact on the solution time. In addition, the computational performance of the tightening procedure is assessed.

The algorithms are implemented in Python (3.12.7), and the MILP problems are solved with the Gurobi solver (12.0.0). The code used for the computational experiments can be found at Ploussard (2025). Models are run on an Intel 2.3-GHz machine with 16 cores and 32 GB of RAM. For all MILP problems, the relative optimality gap (MIPGap) is set to $10^{- 6}$ . In addition, the primal feasibility tolerance (FeasibilityTol) and integer feasibility tolerance (IntFeasTol) are both set to their minimum value of $10^{- 9}$ to mitigate numerical errors.

The objective function used for all MILP problems is the maximum error—that is, Equation (13).

6.1. Description of the Data Sets

To assess the efficiency of the proposed method, six data sets are used. The data sets are summarized in Table 2. Four of the data sets are derived from mathematical functions, and two data sets are based on real-world data. Data sets based on mathematical functions are generated by randomly selecting a certain number of $x$ points in a given domain of $R^{d}$ . The first four data sets from Table 2 are two-dimensional and are illustrated by the dots in Figure 4. The last two data sets are three-dimensional (Figure 5). The third data set represents the water-to-power conversion factor (z) at the Crystal hydropower plant based on the forebay elevation ( $x_{1}$ ) and the average daily water release ( $x_{2}$ ) from the years 2014–2024 (Bureau of Reclamation 2024). The fourth data set represents the discharge temperature of a gas compressor (z) based on the volumetric flow rate ( $x_{1}$ ) and rotation speed ( $x_{2}$ ) (Marfatia and Li 2022). Data sets were rescaled in all dimensions to prevent numerical issues during the MILP solving process.

Table 2. Data Sets Used for Numerical Experiments

Table 2. Data Sets Used for Numerical Experiments

Data set	Number of data points	Dimension d
$z = x_{2} \cdot sin (x_{1})$ , $(x_{1}, x_{2}) \in [0, π] \times [0, 1]$	121	2
$z = x_{1}^{2} - x_{2}^{2}$ , $(x_{1}, x_{2}) \in {[- 1, 1]}^{2}$	64	2
Crystal power plant, Bureau of Reclamation (2024)	116	2
Gas compressor, Marfatia and Li (2022)	102	2
$z = x_{1}^{2} + x_{2}^{2} + x_{3}^{2}$ , $(x_{1}, x_{2}, x_{3}) \in {[0, 1]}^{3}$	64	3
$z = x_{1} \cdot x_{2} \cdot x_{3}$ , $(x_{1}, x_{2}, x_{3}) \in {[0, 1]}^{3}$	64	3

Figure 4. (Color online) Optimal CPWL Fitting for the Two-Dimensional Data Sets
*Note.* Each polygonal face represents an affine piece of the CPWL function f.

Figure 5. (Color online) Exploded View of the Affine Domains for the Three-Dimensional Data Sets
*Note.* Each polyhedron represents the domain of an affine piece of f.

6.2. Evaluating Different Combinations of Tightening Strategies

There are several ways to apply the tightening strategies to the MILP problem:

The first affine piece of $f^{-}$ may or may not be set to zero (Equation (14)).
The affine pieces of $f^{+}$ and $f^{-}$ may or may not be sorted (Equation (15)).
We can impose $d + 1$ points either for each affine piece of $f^{+}$ and $f^{-}$ (Equation (16)), for each affine piece of f (Equations (17)–(21)), or for no affine piece.
For the big-M parameters, we can either use an indicator constraint (Equation (11)), a reasonably large (“default”) big-M value, or the tight big-M values (Equation (24)).
The variables may or may not be bounded (Equations (29)–(37)).

In total, there are up to 72 combinations of tightening strategies. Because of this large number, only 11 combinations were selected. The selected combinations, referred to as “C1” to “C11,” are described in Table 3. The “default” big-M value is set equal to the rounded-up maximum value of $M_{i}^{c}$ —for example, 700 if the maximum value is 632.8. For each data set, the 11 combinations of tightening strategies are applied to the MILP problem before solving it, and the solution time is measured for each combination. The same number $P^{+}$ and $P^{-}$ of candidate affine pieces and the same upper bound $ε$ for the maximum error are used across all combinations. The time limit to solve the MILP problem is set to 7,200 seconds. Numerical results are summarized in Table 4.

Table 3. Combinations of Tightening Strategies Assessed

Table 3. Combinations of Tightening Strategies Assessed

Combination	Fix one affine piece	Sort the affine pieces	Impose $d + 1$ points per affine piece	Big-M parameters	Bounded variables
C1				Indicator
C2				Default
C3				Tight
C4	✓			Tight
C5	✓	✓		Tight
C6	✓	✓		Tight	✓
C7	✓	✓	of $f^{+}$ and $f^{-}$	Tight	✓
C8	✓	✓	of f	Tight	✓
C9			of $f^{+}$ and $f^{-}$	Tight	✓
C10			of f	Tight	✓
C11	✓		of $f^{+}$ and $f^{-}$	Tight	✓

Note. A blank cell indicates that the tightening strategy does not apply to the combination.

Table 4. Summary of Computation Times for Each Data Set and Combination

Table 4. Summary of Computation Times for Each Data Set and Combination

Data set	$P^{+}, P^{-}$	$ε$	Computation time (s)
Data set	$P^{+}, P^{-}$	$ε$	Preprocessing	C1	C2	C3	C4	C5	C6	C7	C8	C9	C10	C11
$z = x_{2} \cdot sin (x_{1})$	2, 6	0.2	9	7,200^a	7,200^a	7,200^a	7,200^a	7,200^a	7,200^a	7,200^a	7,200^a	1,877	7,200^a	2,554
$z = x_{1}^{2} - x_{2}^{2}$	3, 3	0.1	1	7,200^a	576.7	112.8	116.9	507.8	382.2	406.4	2,752	98.3	516.6	200.2
Crystal power plant	1, 5	0.2	8	197.6	8.7	9.9	8.6	72.0	42.0	32.3	31.5	11.4	9.7	20.8
Gas compressor	2, 4	0.2	5	7,200^a	7,200^a	1,584	2,090	847.4	1,356	944.9	7,200^a	457.2	3,720	303.2
$z = x_{1}^{2} + x_{2}^{2} + x_{3}^{2}$	8, 1	0.1	37	7,200^a	3,402	1,441	111.0	7,200^a	7,200^a	7,200^a	7,200^a	531.1	478.3	159.5
$z = x_{1} \cdot x_{2} \cdot x_{3}$	2, 4	0.1	38	7,200^a	2,740	460.3	558.3	2,496	1,968	974.2	7,200^a	698.1	4,390	438.1

Notes. Summary of computation times for each data set, including the computation time of the preprocessing step and the MILP solving time of each combination of tightening strategies. Bold numbers indicate the two shortest solving times for each data set across all combinations.

^aIndicates that the time limit was reached before meeting the target optimality gap.

For each data set, the optimal objective value of the MILP problem is identical across all combinations. However, solution times noticeably vary across combinations. Results show that there is a clear benefit of using tight big-M parameter values (C3) as opposed to using a default big-M parameter value (C2) or an indicator constraint (C1). Fixing the first affine piece of $f^{-}$ (C4) without imposing $d + 1$ points per affine piece does not seem to have a significant additional benefit, except for the case “ $z = x_{1}^{2} + x_{2}^{2} + x_{3}^{2}$ .” Although it reduces the feasible region, sorting the affine pieces of $f^{-}$ and $f^{+}$ has a negative impact on the solution time (C5–C8). This might be due to the fact that these additional constraints increase the solution time of the relaxed LP problem in each node of the MILP search tree without improving the tree search itself—that is, without constraining the binary variables. Because of its negative impact, the strategy of sorting the affine pieces is discarded from the rest of the combinations. Imposing $d + 1$ points per affine piece of $f^{+}$ and $f^{-}$ and bounding the variables (C9 and C11) has a clear positive impact on the solution time. This positive effect can be explained by the reduction in the feasible region of the binary variables $δ_{i, j}^{c}$ using few additional constraints ( $P^{+} + P^{-}$ ). Although this feasible region is further reduced when imposing $d + 1$ points per affine piece of f (C10), this constraint has a negative impact on the solution time. This might be due to the larger number of additional variables and constraints required to model the affine pieces of f. The only difference between combinations C9 and C11 is whether the first affine piece of $f^{-}$ is being fixed. Numerical results show that adding this strategy may either improve or worsen the solution time, depending on the data set—that is, there is not a clear systematic benefit in applying this strategy. When comparing the best solution time of C9 and C11 to the solution time of C2, the reduction factor in solving time ranges from 4 to 23, except for the Crystal power plant case. For the worst solution time, the reduction factor ranges from 3 to 16. Overall, the following combination consistently performs among the best ones: imposing $d + 1$ points for each affine piece of $f^{+}$ and $f^{-}$ (Equation (16)), tightening the big-M parameters (Equation (24)), fixing the first affine piece of $f^{-}$ (Equation (14)), and bounding all variables (Equations (29)–(37)).

The computation time of the preprocessing step, used to calculate the big-M parameters and variable bounds, is also included in Table 4. The computation time of this step is negligible for two-dimensional data sets. However, the computation time becomes significant for three-dimensional data sets. This is due to the fact that the time complexity of the preprocessing steps increases exponentially with the dimension of the data set (Table 1). Yet, the combined execution time of preprocessing and solving the tightened MILP problem is still shorter than the time required to solve the nontightened MILP problem, demonstrating the efficiency of the tightening approach in reducing overall computational effort.

7. Conclusions

In this paper, we formalize the concept of well-behaved CPWL interpolations, a class of CPWL interpolations in which each affine piece interpolates a number of points greater than the dimension of the interpolated data set. Next, we demonstrate that any CPWL interpolation has a well-behaved version. Then, we introduce several tightening strategies aimed at improving the solution time of the MILP formulation of the CPWL-fitting problem for data sets in general dimensions. Some of the tightening strategies leverage the fact that any CPWL interpolation has a well-behaved version. In addition, we analyze each tightening strategy in terms of additional constraints, additional variables, impact on the feasible region, and time complexity of the preprocessing step. Then, we identify the combinations of tightening strategies that have the most impact on the solution time of the MILP problem. Experimental results show that the tightening procedure significantly reduces the solution time of the MILP problem in most cases. However, the solution time reduction factor can vary significantly, depending on the size and dimension of the data set, with values ranging from 3 to 23 across five of the six case studies. In addition, the computation time of the preprocessing step rapidly increases with the dimension of the data set, making the tightening procedure computationally challenging for high-dimensional data sets ( $d \geq 4$ ). This issue can be partially addressed using parallelization, as the preprocessing step is inherently suited for parallel processing.

The theoretical work presented here applies to data points in general position (no subset of $d + 1$ points is contained within a single hyperplane). There is a need to address the special case where the points are not in general position—for example, if the points are located on a lattice. Research work should also focus on improving the time complexity of the preprocessing algorithms for high dimensions. In addition, future research should aim to conduct a comprehensive comparison between the MILP and NN approaches with respect to computation time, number of affine pieces, and approximation error.

Acknowledgments

This work was authored for the Department of Energy (DOE) Office of Energy Efficiency and Renewable Energy by Argonne National Laboratory, operated by UChicago Argonne LLC under contract number DE-AC02-06CH11357. This study was supported by the HydroWIRES Initiative of DOE’s Water Power Technologies Office.

References

Bärmann A, Burlacu R, Hager L, Kutzer K (2023) An approximation algorithm for optimal piecewise linear interpolations of bounded variable products. J. Optim. Theory Appl. 199(2):569–599.Google Scholar
Bertsimas D, Tsitsiklis J (1997) Introduction to Linear Optimization (Athena Scientific, Nashua, NH).Google Scholar
Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).Google Scholar
Bureau of Reclamation (2024) HDB data service. Accessed April 3, 2025, https://www.usbr.gov/lc/region/g4000/riverops/_HdbWebQuery.html.Google Scholar
Chen KL, Garudadri H, Rao BD (2023) Improved bounds on neural complexity for representing piecewise linear functions. Preprint, submitted January 16, https://arxiv.org/abs/2210.07236.Google Scholar
D’Ambrosio C, Lodi A, Martello S (2010) Piecewise linear approximation of functions of two variables in MILP models. Oper. Res. Let. 38(1):39–46.Google Scholar
Davis PJ (1975) Interpolation and Approximation (Courier Corporation, North Chelmsford, MA).Google Scholar
DeVore R, Hanin B, Petrova G (2021) Neural network approximation. Acta Numer. 30:327–444.Google Scholar
Duguet A, Ngueveu SU (2022) Piecewise linearization of bivariate nonlinear functions: Minimizing the number of pieces under a bounded approximation error. Ljubić I, Barahona F, Dey SS, Mahjoub AR, eds. Combinatorial Optimization (Springer International Publishing, Cham, Switzerland), 117–129.Google Scholar
Edelsbrunner H, O’Rourke J, Seidel R (1986) Constructing arrangements of lines and hyperplanes with applications. SIAM J. Comput. 15(2):341–363.Google Scholar
Frenzen CL, Sasao T, Butler JT (2010) On the number of segments needed in a piecewise linear approximation. J. Comput. Appl. Math. 234(2):437–446.Google Scholar
Geißler B, Martin A, Morsi A, Schewe L (2012) Using piecewise linear functions for solving MINLPs. Lee J, Leyffer S, eds. Mixed Integer Nonlinear Programming (Springer, New York), 287–314.Google Scholar
Gurobi Optimization LLC (2024a) Gurobi Optimizer reference manual. Accessed April 3, 2025, https://www.gurobi.com.Google Scholar
Gurobi Optimization LLC (2024b) Model.addGenConstrIndicator(). Accessed April 3, 2025, https://www.gurobi.com/documentation/current/refman/py_model_agc_indicator.html.Google Scholar
Huang C (2020) ReLU networks are universal approximators via piecewise linear or constant functions. Neural Comput. 32(11):2249–2278.Google Scholar
Hughes RB, Anderson MR (1996) Simplexity of the cube. Discrete Math. 158(1):99–150.Google Scholar
Kazda K, Li X (2021) Nonconvex multivariate piecewise-linear fitting using the difference-of-convex representation. Comput. Chemical Engrg. 150:107310.Google Scholar
Kazda K, Li X (2024) A linear programming approach to difference-of-convex piecewise linear approximation. Eur. J. Oper. Res. 312(2):493–511.Google Scholar
Kong L, Maravelias CT (2020) On the derivation of continuous piecewise linear approximating functions. INFORMS J. Comput. 32(3):531–546.Link, Google Scholar
Kripfganz A, Schulze R (1987) Piecewise affine functions as a difference of two convex functions. Optimization 18(1):23–29.Google Scholar
Marfatia Z, Li X (2022) Data-driven natural gas compressor models for gas transport network optimization. Digital Chem. Engrg. 3:100030.Google Scholar
Misener R, Floudas CA (2010) Piecewise-linear approximations of multidimensional functions. J. Optim. Theory Appl. 145(1):120–147.Google Scholar
Ploussard Q (2024) Piecewise linear approximation with minimum number of linear segments and minimum error: A fast approach to tighten and warm start the hierarchical mixed integer formulation. Eur. J. Oper. Res. 315(1):50–62.Google Scholar
Ploussard Q (2025) Continuous piecewise linear fitting: Code repository. Accessed April 3, 2025, https://github.com/quentinplsrd/cpwl-nd-optimization.Google Scholar
Rebennack S, Kallrath J (2015) Continuous piecewise linear delta-approximations for bivariate and multivariate functions. J. Optim. Theory Appl. 167(1):102–117.Google Scholar
Rebennack S, Krasko V (2020) Piecewise linear function fitting via mixed-integer linear programming. INFORMS J. Comput. 32(2):507–530.Link, Google Scholar
Toriello A, Vielma JP (2012) Fitting piecewise linear continuous functions. Eur. J. Oper. Res. 219(1):86–95.Google Scholar
Vielma JP (2015) Mixed integer linear programming formulation techniques. SIAM Rev. 57(1):3–57.Google Scholar
Vielma JP, Ahmed S, Nemhauser G (2010) Mixed-integer models for nonseparable piecewise-linear optimization: Unifying framework and extensions. Oper. Res. 58(2):303–315.Link, Google Scholar
Warwicker JA, Rebennack S (2022) A comparison of two mixed-integer linear programs for piecewise linear function fitting. INFORMS J. Comput. 34(2):1042–1047.Link, Google Scholar
Warwicker JA, Rebennack S (2023) Generating optimal robust continuous piecewise linear regression with outliers through combinatorial Benders decomposition. IISE Trans. 55(8):755–767.Google Scholar
Warwicker JA, Rebennack S (2025) Efficient continuous piecewise linear regression for linearising univariate non-linear functions. IISE Trans. 57(3):231–245.Google Scholar

cover image INFORMS Journal on Optimization

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:April 03, 2025
Accepted:December 05, 2025
Published Online:December 29, 2025

Cite as

Quentin Ploussard, Xiang Li, Matija Pavičević (2026) Tightening the Difference-of-Convex Formulation for the Piecewise Linear Approximation in General Dimensions. INFORMS Journal on Optimization 0(0).

https://doi.org/10.1287/ijoo.2025.0074

Keywords

Acknowledgments

PDF download

Available Issues

Available Issues

Available Issues

Tightening the Difference-of-Convex Formulation for the Piecewise Linear Approximation in General Dimensions

Abstract

1. Introduction

2. Background

3. MILP Formulation

4. Well-Behaved CPWL Interpolations

5. Tightening the MILP Formulation

5.1. Fixing One Affine Piece

5.2. Sorting the Affine Pieces

5.3. Imposing $d + 1$ Points per Affine Domain of $f^{c}$

5.4. Imposing $d + 1$ Points per Affine Domain of f

5.5. Tightening the Big-M Parameters

5.6. Bounding the Variables

6. Computational Experiments

6.1. Description of the Data Sets

6.2. Evaluating Different Combinations of Tightening Strategies

7. Conclusions

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News

Available Issues

Available Issues

Tightening the Difference-of-Convex Formulation for the Piecewise Linear Approximation in General Dimensions

Abstract

1. Introduction

2. Background

3. MILP Formulation

4. Well-Behaved CPWL Interpolations

5. Tightening the MILP Formulation

5.1. Fixing One Affine Piece

5.2. Sorting the Affine Pieces

5.3. Imposing d+1 Points per Affine Domain of fc

5.4. Imposing d+1 Points per Affine Domain of f

5.5. Tightening the Big-M Parameters

5.6. Bounding the Variables

6. Computational Experiments

6.1. Description of the Data Sets

6.2. Evaluating Different Combinations of Tightening Strategies

7. Conclusions

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

5.3. Imposing $d + 1$ Points per Affine Domain of $f^{c}$

5.4. Imposing $d + 1$ Points per Affine Domain of f