Improving Mathematical Exposition of an Industrial-Scale Linear Program
Abstract
Industrial-scale models require considerable setup time; hence, once built, they are used in myriad ways to consider closely related cases. In practice, the code for these models frequently evolves without appropriate notational choices, largely as a result of the lengthy development time of, and the number of individuals contributing to, their formulation. This leads to inefficiencies and obfuscates model structures that might be leveraged to expedite solutions. In this paper, we advocate for an emerging literature on model formulation “best practices” and present the reformulation of a widely used industrial-scale linear program. The efficient mathematical expression of this linear program, used to plan capacity expansion in the energy sector, allows for greater transparency of model structures and enhanced ability to identify computational performance improvements, as well as a lucid interpretation of its solutions. This type of formulation is employed in several mathematical programming courses at our university as an example of the advantages of best practices; the model more broadly is used widely to inform policy in the U.S. energy sector.
Funding: This work was supported by the National Renewable Energy Laboratory (Laboratory Directed Research and Development program).
1. Introduction and Literature Review
Students in mathematical programming courses are generally introduced to formulation and solution techniques in the context of relatively small models with only a few variables and constraints. Examples of this nature can be found throughout commonly used texts, such as Dantzig (1991), Bazaraa et al. (2006), and Rardin (2017). Whereas there are valid pedagogical reasons for presenting relatively trivial models in the classroom to demonstrate structures and solution techniques, the typical textbook example fails to prepare the student for the level of complexity seen when formulating and solving a realistic mathematical programming problem.
In addition to being conveniently small, the examples observed in textbooks are generally presented with a well-posed structure and concise notation, consistent with the exposition of the material immediately preceding it. In industrial-scale applications, problems are often presented without any deference to model structure and convenient notation. In the best case, the model at hand is developed via iterative processes through which decision variables and constraints are identified, structures evolve, and modifications are made in order to exploit these structures, as appropriate. However, at this scale, models are often developed by teams of individuals, not all of whom have specific expertise in operations research. The resulting mathematical programs may evolve over a period of years, and distinct models with similar features or objectives may be combined. Additionally, descriptive words naming variables and parameters, conventional in computer science, may be used to help with documentation, creating additional inefficiencies.
In this paper, we explore the importance of presenting the mathematics underlying industrial-scale optimization models and their associated complexity to emphasize the importance of adopting sound notational and organizational conventions for the expression of a model. The National Renewable Energy Laboratory (NREL) Regional Energy Deployment System (ReEDS) model (Eurek et al. 2016, Cole et al. 2017) is an award-winning software tool recognized by R&D100 in the software category (National Renewable Energy Laboratory 2019). The tool has undergone a reformulation to make it open to, and interpretable by, the public, as opposed to solely by the in-house engineers and scientists who have used it for the past two decades. The reformulation process provides an opportunity to present the value of efficiently expressing large optimization models as mathematics and allowing for greater transparency of model structures, as well as a more lucid interpretation of its solutions. We propose the use of consistent notation in complex formulations, contrasting the elements of the existing model with equivalent reformulations emphasizing best practices related to (i) identifying efficient sets and indices for a concise expression; (ii) naming indices, sets, parameters, and variables; and (iii) expressing the structure through the organization of the objective and constraints.
In recent years, there has been a growing literature advocating clear and consistent notation for the presentation of complex mathematical programming models (best practices). An early example of the advocacy of clear writing in mathematical exposition in general can be found in Halmos (1970), in which the author emphasizes organization and careful thought in selection of notation. This level of organization is clearly present in the examples found in common textbooks. To complement this, Powell (2001) provides a high-level discourse on the relevant constituents of and considerations that must be accounted for in mathematical modeling.
We have presented this reformulation in two different courses at the Colorado School of Mines. The first is a freshman-level Honors Calculus III course in the context of a group term project consisting of (i) exploring the relationships between single- and multiple-variable optimization techniques introduced in Calculus I and III, respectively; (ii) discovering standard numerical approaches to solving nonlinear, unconstrained optimization problems; and (iii) introducing extensions of these problems to higher dimensional spaces and pointing to real-world examples. For most of these students, this is their first exposure to a model (optimization or otherwise) that requires more than two or three variables for its expression. Whereas the students are not asked to formulate or solve such models at this point, this case does introduce higher dimensional complexity as well as the need to be thoughtful in the formulation of such “large problems.” Within the context of this term project, students are asked to speculate how they may use optimization models and how these models might be larger than those with which they are familiar in the calculus context. Correspondingly, student groups are more readily able to imagine examples of models related to their eventual field of study and how they might be thoughtfully formulated.
The second course in which the reformulation is used is Advanced Linear Programming, in which we introduce the model via a presentation to the students within the lecture. These students typically have had a masters-level class in the subject, in addition to other, related math programming and computer science courses. The intent of the material is to show them that (i) large-scale linear programs exist and are used to make significant policy decisions and (ii) it is extremely important to carefully document such models using precise notation. Students typically do not pay attention to notation, and their use of sets and indexed sets is underdeveloped in their introductory mathematical programming courses. This is to be expected as the models they formulate are fairly simple to state; it is, therefore, possible to code the corresponding mathematical programs inefficiently and obtain a solution in a reasonable amount of run time (usually seconds, at most). By describing how ReEDS must be formulated to both understand and debug it, and to obtain solutions more quickly after implementing efficiency-enhancing measures, students gain an appreciation not only for the impact that well-formulated models can have, but also for the discipline necessary to implement large-scale models in an industrial setting. They can then apply these principles to their own research along with other methodological and performance-enhancing tools taught in the class.
Specific to the expression of larger mathematical programs, Brown (2004), Brown and Dell (2007), and Brown and Rosenthal (2008) provide examples in which the authors promote thoughtful organization and expression of the mathematical program, as well as caveats regarding limitations of communicating models via spreadsheets or even modeling languages. Based on practices suggested in these earlier works, Teter et al. (2016) generate a comprehensive set of guidelines for the articulation of large and complex models, and we adopt the recommendations of that paper.
Unlike in Newman and Weiss (2013), which highlights tutorials on optimization-related paradigms and solution techniques, we focus our efforts on the presentation of large and complex optimization models, using a specific application in the energy sector. In Sections 2.1 through 2.5, we present a high-level summary of best practices for the abstract expression of complex models, including brief examples. In Section 3, the ReEDS model and, in particular, its place as an analytical tool in the energy policy literature are described. Having provided a summary of best practices and context for our model, we focus on specific examples of improved clarity in the expression of the model and its various components in Sections 4.1 through 4.5. Section 5 gives a concise summary of the improvements in model expression and proposes work toward further enhancing the model.
The contribution of this paper is its presentation of a systematic reformulation of the previous version of ReEDS, utilizing the best practices summarized in Teter et al. (2016) and providing an example for the thoughtful formulation of industry-scale mathematical programs, agnostic to modeling environment and suitable for presentation to students. Whereas other instances exist, this paper furnishes an example that might be emphasized to students in the context of model formulations beyond the standard textbook-scale examples. This work is a reflection of collaborative efforts between researchers to translate the model, expressed as code in a specific modeling environment, into mathematics, which may be readily translated into any modeling environment once the model is made public. In addition to a discussion of formulation best practices, this example also provides an opportunity to highlight the potential for improvements in model tractability, solution interpretation, and model adaptability that result from enhanced formulations, allowing the user to identify model structures through consistent mathematical expression.
2. Brief Summary of Best Practices for Mathematical Formulations
This section provides a brief summary of the best practices, advocated by Teter et al. (2016), for the expression of large and complex optimization models. An emphasis is placed on defining a clear standard for notation that employs descriptive-letter rather than descriptive-word naming conventions for indices, sets, parameters, and variables. Detailed in Section 4, we present a reformulated version of the ReEDS model in a format that can easily be translated into code in any optimization modeling environment.
We also emphasize a standard structure for presenting the optimization models in which we first define the indices, sets, parameters, and variables, followed by the statement of the objective and the constraints. To the fullest extent possible, parameters, variables, and constraints should be organized in a manner that enhances model clarity. Presented this way, optimization models are stated consistently and concisely and reveal the mathematical structure. ReEDS, described in detail in Section 3, is NREL’s flagship tool for examining the long-term implications of technological change, market evolution, and energy and environmental policy decisions on the electricity sector of the U.S. economy. It is a large-scale, detailed optimization (linear programming) model that has evolved over a long period of time, employing many best practices for coding within a specific modeling environment but without corresponding attention to the best practices for mathematical formulations advocated for in the operations research literature.
We present model components from the reformulation of a legacy version of ReEDS as examples of formulation best practices in the remainder of the current section. In Section 4, we present detailed comparisons between elements of the legacy version of the ReEDS model and their equivalent expressions once reformulated as it exists at the time of this writing.
2.1. Definition of the Indices and Sets for the Model
Adhering to best practices, sets and their associated indices are denoted by uppercase calligraphic letters and the corresponding lowercase letter, respectively. For example,
: Set of all generating technologies in the model.
: Set of years included in the model.
As necessary, we include a prime on an index to differentiate between two indices related to a single set. For example,
r, : Set of regions between which interactions might occur in the model.
Subsets are denoted with a superscript, hat, and/or index as appropriate. For example,
: Set of “future” years in the model, that is, 2024, …, 2050.
For a set, itself differentiated by indices, that is, an indexed set, appropriate subscripts are employed. For example,
: Set of years during which a plant of technology i built in year t is operating.
2.2. Definition of the Model Parameters
To the extent possible, parameters are denoted by lowercase Roman or Greek letters phonetically associated with the given parameter with indices in the subscripts, paying attention to physical units. For example,
civrht: Variable cost for technology i of vintage class v in region r during time slice h of year t [$/MWh].
: Minimum loading requirement for technology i in region r during time slice h [%].
2.3. Definition of the Model Variables
In a similar manner, variables are denoted by uppercase Roman letters phonetically associated with the given quantity, again paying attention to units. For example,
Givrht: Average generation by technology i of vintage class v in region r in time slice h in year t [MW].
Whereas lowercase characters are used for parameters and uppercase for variables, this convention can be reversed (or modified) as long as variables can easily be differentiated from parameters; this differentiation is particularly important in nonlinear models to elucidate the number and types of nonlinearities, which significantly influence the ease with which instances can be solved. For example, convex models (i.e., the minimization of a convex objective function over a convex set of constraints) can be solved to global optimality even with a local solver, whereas models that contain nonlinear expressions with domain violations might be intractable. And, generally speaking, the more nonlinear expressions a model contains, the more difficult it is to solve.
2.4. Definition of the Objective Function
The statement of the objective function includes the sense of the problem (i.e., minimization or maximization), involves parameters and decision variables, and is presented using summation notation over all appropriate sets and their associated indices, preferably in the order in which they appear in the notation. In addition, the product of a variable and a parameter is expressed with the variable left-multiplied by the parameter to further distinguish between these two types of quantities. For example,
For consistency and clarity, the left-multiplication of variables by parameters is the convention utilized in the constraints as well.
2.5. Definition of the Model Constraints
Regarding the organization of the constraints, we impose a structural hierarchy. Specifically, we first group constraints by the function they serve in the model (e.g., physical laws that must be obeyed by the solution; capacity limits; minimum generation requirements for operation; and constraints that link generation, storage, and transmission). Then, to the extent possible, we organize individual constraints by variable types within these groupings (e.g., limits on generation, storage, and transmission). This level of organization better enables the identification of structures in the constraint matrix that may be exploited.
Constraints are numbered consecutively by group with variable domains (nonnegative, binary, or integer) provided at the end. The units on the left- and right-hand sides of every constraint match. Furthermore, all indices used in each constraint are accounted for either via summation or through a constraint qualifier symbol to indicate that the constraint holds over all instances of an index not accounted for in the summations specified in the constraint. For example,
3. Example: The NREL ReEDS Model
ReEDS is NREL’s flagship capacity expansion model for examining the long-term implications of technological change, market evolution, and energy and environmental policy decisions on the electricity sector of the U.S. economy. Originally developed by NREL as the Wind Deployment System (WinDS) model in 2001, it examined long-term market penetration of wind in the electric power sector. From 2003 to 2008, WinDS was used in analyses including state-level policies on wind deployment and U.S. wind manufacturing and wind power storage. In 2009, it was recast as ReEDS for the analysis of long-term deployment interactions across multiple technologies in the electric power sector. In both its previous and current forms, ReEDS is a linear program with nonnegative continuous variables representing
Energy generation, storage, and transmission and the associated changes therein, each of which must be accounted for by technology.
Renewable energy credits for state-to-state energy trading to comply with regulations.
Penalties for elasticized capacity constraints.
ReEDS minimizes the capital and operational cost related to energy generation, storage, and transmission and associated changes therein across the United States subject to constraints related to the physics of technologies represented, capacities, demands, policy regulations, and transmission interactions at regional and national boundaries. By default, the time horizon expands to 2050 with daily “time slices” to allow for seasonal and diurnal attributes within a given year. It accounts for all allowable technology combinations of generation, storage, and transmission and feasible uses for these various technology combinations as well as region-to-region and state-to-state exchanges of market products (i.e., transmissions of energy).
ReEDS is a primary analytical tool in numerous studies on emerging electricity generation and storage technologies. A survey of these studies from recent years includes work focused on the integration of the U.S. and Canadian electric sectors (Ibanez and Zinaman 2016, Beiter et al. 2017); the general impacts of renewable sources on the electric energy sector (Mai et al. 2014, Wiser et al. 2017); greenhouse gas mitigation options within the electric sector (Sullivan et al. 2014); and the role of specific technologies, such as natural gas (Cole et al. 2016), nuclear (Richards and Cole 2017), and wind (Wiser et al. 2016), used for electricity generation.
One common mistake with models that have several different developers over the course of their evolution is the use of descriptive word naming for indices, sets, parameters, and variables as a proxy for separate documentation. This convention
Unnecessarily occupies space in memory.
Requires more time to code in new modeling environments.
Renders the code more difficult to debug in new modeling environments.
Obfuscates mathematical structure in the model.
Provides an unwelcome excuse not to fully document the math because
— Coders may equate verbose code with documentation of the mathematics.
— It is unnecessarily arduous to convert descriptive word variable and parameter names in the code into actual documentation of the mathematics.
At the time of this writing, the ReEDS model has been rewritten from scratch following several of the suggestions from this work; a legacy model, however, had been comprehensively presented in GAMS code without an explicit mathematical statement of the full current version of the model (although aspects of earlier versions are documented mathematically in technical reports). Its growth had taken place under the auspices of a rotation of researchers with evolving coding conventions. The model had increased in scale over time with the introduction of new technologies and policies; unfortunately, these improvements had not been accompanied by a formal emphasis on maintaining parallel documentation of the mathematical structure underlying the code.
The authors reformulated the previous version of ReEDS in order to enhance its usability and expand its features and have since released it to the public in 2020 (https://www.nrel.gov/analysis/reeds/). The reformulation entailed the clarification of mathematical expressions and identification of formulation improvements as well as algorithmic strategies to more efficiently solve instances of the model for various analyses. The authors dissected the original GAMS code, reconstructed the mathematics therein, and recommended modifications consistent with the best practices presented earlier. NREL subsequently reformulated the model based on these recommendations and, ultimately, demonstrated a near-exact replication and, in doing so, validated the previous model. The model reformulation resulted in substantially reduced model generation and solve times; although there can be significant deviation across assumptions and complexity, the average reduction in a model run time is approximately 35% (i.e., from 3 to more than 5 hours on an original a 12-hour solve).
In Sections 4 and 5, we present specific examples of the legacy expression of the model, necessitating its reformulation as a new version, defined mathematically as () and provided in its entirety in Appendix C in order to streamline and improve its statement to achieve (i) a clearer communication of the model in a format that illustrates underlying mathematical structure and can more readily incorporate future additions and changes; (ii) greater transparency between the expression of the problem and its solution for the scientists and policymakers who use () for future research and policy considerations; and (iii) if possible, improved model solution performance to facilitate parametric analyses by all users. This reformulation affords us the opportunity to present the contrast between the previous version of ReEDS and () and advocate for the efficient and consistent presentation of a complex optimization model using the set of best practices articulated earlier.
4. Methodology and Results—Implementing Best Practices in ReEDS
In this section, we provide specific examples related to how the best practices articulated in Section 3 are implemented in reformulating the core elements of the previous version of ReEDS into the clearer representation expressed in (). From a practitioner’s perspective, this is a necessary step in order to exercise the model and meet the needs of NREL, including
Achieving faster solve times.
Making the code more transparent for the purpose of sharing the model with the broader community.
Enabling practitioners to implement ReEDS in the modeling environment of their choice.
Efficiently adding constraints to model regulatory policy proposals and new technologies.
We present elements of the legacy model. In doing so, we contrast the past practice of maintaining the mathematical expression of the model as code—utilizing standard coding practices but not explicitly documenting the mathematics independent of the code—with the best practices for defining the model mathematically. This is done to distinguish between allowing code to substitute for mathematics and the best practice of supplementing code with a clear expression of the mathematics underlying it. We describe the notational conventions adopted for naming the indices, sets, parameters, and variables with examples of improved clarity from the previous version of ReEDS to () in each instance. We then demonstrate the improvement in the statement of the objective function and, similarly, provide examples of the enhanced clarity in the definition of the constraints between the two models. Finally, we give a summary of the overall improvements in the efficiency of the statement of the mathematical model when these best practices are applied to the legacy version of ReEDS to restate it as (). The reformulated core model, in its entirety, has been made available to the public by NREL and attempts will be made to incorporate best practices for its formulation in the accompanying technical reports. The changes detailed in () lead to more transparent code and efficient implementation of additional constraints as necessary to model regulatory policy proposals and new technologies, and we anticipate faster solve times.
4.1. Indices and Sets
In the GAMS code representing the previous version of the ReEDS model, indices and sets are inconsistently and inefficiently defined. For example, the set of geographic regions over which the model is constructed is indexed differently depending on which variable it is indexing as well as which update (over the 18-year history of the model) led to its inclusion. Figure 1 provides a list of indices that represent geographic regions within the United States and Canada.

Whereas the first four are single-letter indices, the fact that they are different letters representing members of similar hierarchically structured sets can lead to confusion. The fifth and sixth indices are particularly cumbersome as they follow the descriptive word convention and lead to exceedingly verbose mathematical expressions in those portions of the GAMS code that employ them as subscripts on descriptive word parameters and variables.
Another example of inefficiently defined indices and sets is that of technologies. In the legacy version of ReEDS, parameters in the GAMS code lent themselves to two primary areas of improvement: (i) shortening the naming convention and (ii) removing arithmetic operations within the model itself. Every distinct technology category (i.e., conventional, wind, photovoltaic (PV), concentrating solar power (CSP), geothermal, biomass, hydropower, marine hydrokinetic (MHK), storage) invoked multiple sets, within each of which there are unique (and often descriptive word) indices. Table 1 presents the complexity of the ad hoc definition of sets and descriptive word indices for the collection of these technologies.
|
Table 1. Subset of All Sets and Indices for Generating Technologies in the Legacy Version of ReEDS
| Legacy ReEDS technology sets | Index (indices) |
|---|---|
| Conventional generation technologies | |
| Wind generation technologies | windtype, c |
| PV generation technologies | pvclass |
| CSP generation technologies | cCSP, nostorage, storage |
| Geothermal generation technologies | geoclass, geotype, geoplant |
| Biomass generation technologies | bioclass |
| Hydropower generation technologies | |
| MHK generation technologies | mhkwaveclass |
| Storage technologies |
All of the sets detailed in Table 1, although distinctly named, can be thought of as subsets of a more general set of allowable combinations of technologies. In (), we combine these subsets into a single set of feasible technologies, defined as
In the legacy version of ReEDS, the sets and indices are not clearly evident to the user. Rather, they must be inferred by reviewing poorly documented variable declarations that are detailed across approximately six pages of GAMS code. By contrast, for the entire statement of the comparable model (), we use the sets (and associated indices), denoted with the descriptive-letter naming convention when possible with subsets and indexed sets defined as necessary, as summarized in Table 2.
|
Table 2. () Model Sets (Included in the Reformulation)
| Set | Description |
|---|---|
| Generating technologies | |
| Aggregated sets of generating technologies | |
| Regions | |
| Years | |
| Time slices within days in modeled years | |
| Ancillary services | |
| Transmission line types | |
| Seasons within modeled years | |
| Modeled years | |
| Renewable supply curve (RSC) technologies | |
| Daytime time slices (morning, afternoon, evening) | |
| Nighttime time slices | |
| Wind generating technologies | |
| PV generating technologies | |
| Vintage classes that are valid for technology i in year t | |
| Bins that are used to define the RSC for technology i in region r |
Within the formal definition of (), detailed in Appendix C, we employ subsets and indexed sets in addition to those defined in Table 2 to efficiently express the linear program in its entirety.
4.2. Parameters
In the GAMS code for the legacy version of ReEDS, parameters are defined with two primary sources of inefficiency: (i) the use of the descriptive word naming convention and (ii) the construction of parameters as arithmetic combinations of multiple “subparameters,” which have, in turn, been denoted using the descriptive word convention. The first term of the objective function provides an example. The parameter in question is the cost coefficient on new capacity of conventional generation technologies (indexed by type “q” and cooling technology “ct”) in each region (indexed by “n”). Figure 2 presents the ReEDS GAMS code expression for this cost coefficient, which translates to

Not only does this naming convention unnecessarily complicate the exposition of the model, but it also eliminates the ability to preprocess the data by performing this arithmetic external to the GAMS code. The legacy formulation induced unnecessary computations within the solver, precluding preprocessing that would allow the modeler to error-check the data and eliminate instances in which zero coefficients are approximated by machine epsilon (i.e., or ) as a result rounding in the floating-point arithmetic when computing coefficients within the code; this might lead to potential numerical instability in the solution of the linear program (Klotz and Newman 2013).
By contrast, in the comparable model (), after preprocessing the data, the parameter in the example from the legacy version of ReEDS is approximately expressed in the reformulated model as the capital cost coefficient for investments using technology i in region r of year t, denoted by
The reduction in complexity from the previous version of ReEDS to () is even greater than the preceding example implies because there exists a distinct variable, defined over its own different, but related, sets and indices for several types of technologies. Combining these technologies into a single set of feasible technology combinations, as detailed in Section 4.1, allows us to eliminate a significant number of terms when writing out ().
Table 3 details the () parameters utilized in the reformulation, including cost coefficients and their appropriate discount factors as well as a subset of the parameters found in the constraints.
|
Table 3. () Model Parameters (Included in the Reformulation)
| Parameter | Description |
|---|---|
| Cost parameters | |
| Discount factor applied to operational costs incurred in year t [present$/future$] | |
| Discount factor applied to investment costs incurred in year t [present$/future$] | |
| Variable operation and maintenance (VOM) cost for technology i of vintage class v in region r in time-slice h of year t [$/MW] | |
| Fuel cost for technology i in region r in time slice [$/MW] | |
| Ancillary service cost for technology i providing ancillary service product a in time slice h [$/MW] | |
| Fixed operation and maintenance (FOM) cost for technology i of vintage class v in region r in year t [$/MW] | |
| Capital cost for investments in generating capacity from technology i in region r in year t (includes financing) [$/MW] | |
| Capital cost for spur-line transmission investments for generating capacity from technology i in region r in year t from RSC bin b (includes financing) [$/MW] | |
| Capital cost for bulk transmission line investments for line type l from region r to region in year t, accounting for distance between regions [$/MW] | |
| Operational requirement parameters | |
| Transmission loss rate between region , accounting for distance [%] | |
| Average load in region r in time slice h in year t [MW] | |
| Fraction of load that contributes to the requirements of ancillary service product a [MW/MW] | |
| Fraction of wind generation that contributes to the requirements of ancillary service product a [MW/MW] | |
| Fraction of photovoltaic (PV) capacity during daytime hours that contributes to the requirements of ancillary service product a [MW/MW] |
|
Table 4. () Model Variables
| Variables | Description |
|---|---|
| Givrht | Average generation by technology i of vintage class v in region r in time slice h of year t [MW] |
| Saivrht | Ancillary service type a provided by technology i of vintage class v in region r during time slice h in year t [MW] |
| Kivrt | Generating capacity for technology i of vintage class v in region r in year t [MW] |
| Nivrt | Investment in capacity for technology i of vintage class v in region r in year t [MW] |
| Investment in refurbished capacity for technology i of vintage class v in region r in year t [MW] | |
| Investment in capacity for technology i of vintage class v in region r in year t from RSC bin b [MW] | |
| Investment in transmission capacity of line type l between regions in year t [MW] | |
| Investments in technology aggregation category j in region r in year t that exceeds the prescribed build requirement [MW] | |
| Transmission capacity online type l between regions in year t [MW] | |
| Flow of firm capacity from region r to region in season s of year t [MW] | |
| Average flow of energy along transmission line type l from region r to region in time slice h in year t [MW] | |
| Average charging of storage technology i of vintage class v in region r in time slice h in year t [MW] | |
| Average discharging of storage technology i of vintage class v in region r in time slice h in year t [MW] | |
| Average flow of ancillary service product a from region r to region in time slice h in year t [MW] |
We also define parameters to characterize quantities such as resource utilization, capacities, and minimum operating requirements in the constraint set, as detailed in Appendix C. Best practices are similarly applied to additional parameters in ().
4.3. Variables
There is a significant simplification in the expression of the model variables between the legacy version of ReEDS and (); the former uses the descriptive word naming convention paired with the inefficiencies in the definitions of indices and sets described in Section 4.1. As an example, consider the declaration of the variable representing the amount of energy generated via geothermal technologies. Figure 3 presents the GAMS code expression for this variable in the legacy version of ReEDS, which translates initially to a variable expressed as

By contrast, in the equivalent model (), this variable, as well as several other distinctly declared ReEDS variables, are represented as a single variable for energy generated using technology i of vintage class v in region r during time slice h of year t denoted by
As with the model parameters, the reduction in complexity from the legacy version of ReEDS to () is even greater than this example implies because there exists a distinct variable, defined over its own different but related sets and indices for several types of technologies. Aggregating these technologies into a single set of feasible technology combinations as described in Section 4.1 eliminates the declaration of several of the distinctly defined variables. The legacy version of the ReEDS model contains 121 variables, each with the cumbersome descriptive word notation conventions; in (), only the 14 variables detailed in Table 4 with concise single letter notation are necessary to express the model.
Moreover, this reduction in variables also allows for the elimination of several terms when writing out both the objective and constraint sets in () as Sections 4.4 and 4.5 demonstrate.
4.4. Objective Function
There is a significant simplification in the expression of the objective function between the legacy version of ReEDS and (). Before implementing the improvements detailed in Sections 4.1–4.3, the GAMS code for the objective spanned more than 12 pages and included 139 distinctly expressed terms with multiple in-line comments to aid in interpretation. The GAMS code expression for the first four (out of 139) terms of the objective is presented in Figure 4, the exact mathematical translation of which, prior to the introduction of the clear notation introduced in (), can be found in Appendix A.

By contrast, using the indices, sets, parameters, and variables defined for () in Sections 4.1–4.3, we are able to express the entire objective with far fewer terms and in a fraction of the space as
4.5. Constraints
Just as with the objective function, the adoption of best practices for expressing the mathematical representation of optimization models leads to a significant simplification in the expression of the constraint sets between the legacy version of ReEDS and (). Researchers at NREL often added distinct constraints when either technologies are introduced or regulations or policy constraints emerge. A careful review of the GAMS code reveals inconsistency in the process for adding restrictions to the model with entirely new blocks of constraints being written for new technologies in some instances and the insertion of new technologies within previously existing blocks of constraints in others. Similar inconsistencies exist for new regulatory or policy restrictions. As with the objective function, there are often in-line comments to “document” the addition of technologies and regulations or policies. The more efficient notation used for indices, sets, parameters, and variables in () leads to the declaration of far fewer constraints and allows for forethought in the mathematical expression, utilizing the notational conventions discussed in Sections 4.1–4.3.
The GAMS code expression for one such constraint is presented in Figure 5, the exact mathematical translation of which, prior to the introduction of the clear notation introduced in (), can be found in Appendix B. This constraint represents a set of operational reserve requirements. Specifically, it states that a certain amount of energy capacity be held in reserve either as “spinning reserve,” which can be accessed immediately if needed, or as “quick start,” which can be accessed quickly but with a delay to ramp up to a predetermined minimum load. Good practice also dictates that the order in which the constraints are defined correspond to the order in which they are declared in the code.

By contrast, using the indices, sets, parameters, and variables defined for () in Sections 4.1–4.3, we are able to express a comparable restriction on the model with fewer terms and in a fraction of the space as the following two constraints:
In (), we condense several of the distinct variables in the statement of the constraint present in the legacy version of ReEDS by (i) combining multiple variables into a single variable using the more efficient notation and (ii) linking the () variables for existing generation, transmission, and storage to respective variables for growth and change in these quantities through separately defined capacity constraints on generation, transmission, and storage.
There are approximately 200 distinctly declared constraints in the legacy version of ReEDS, and whereas some have simpler expressions than this example, there are instances that are more complicated. Utilizing the notation specified for () along with best practices for the organization of constraint sets in an optimization model, we have a much more concise statement of this model component, requiring only 20 distinctly declared constraints, many with far less intricacy in their expression in () than their equivalents in the previous version of ReEDS. The concise mathematical representation helps to inform the code, which we have written to be consistent with our recommendations. Specifically, we retain the “stem” that represents the name of the parameter or variable as in our mathematical formulation. The modeling language allows for the implementation of subscripts as unique indices inside parentheses, whereas the superscripts appear after the stem of the word, that is, following an underscore. The use of sets and the corresponding employment of indices allow us to minimize the number of similar variables and constraints we must explicitly write out. We use comments in the code to define the variables with their corresponding units.
5. Summary of Results
The transition from ReEDS to () has created a more concise and explicit mathematical documentation and led to a clearer model and programmatic structure. Without this clear view, results are more difficult to interpret, and alternative, more efficient formulations are harder to identify. Before this transition, a user was confronted with 100 pages of GAMS code that contained descriptive word notation choices for indices, sets, parameters, and variables in the model. By contrast, () presents an explicit mathematical formulation of the problem, including the use of best practices for model organization and notation for indices, sets, parameters, and variables. Whereas the mathematics of the model is implicit in the GAMS code, it is explicit in () and easily converted to code in any modeling language. Furthermore, this transparency in the mathematical expression of the model allows the user to easily identify structure, potentially leading to
Elimination of redundancies and subtle contradictions within the structure of the constraint sets.
Opportunities to detect algorithmic alternatives with which to solve the model.
Identification of tighter choices for “big-M” in the constraint sets.
The ability to eliminate numerical instability by precomputing model parameters and ensuring that machine zeros are treated as such.
More efficient specification of cuts when the model is reformulated with binary variables to model fixed costs.
Table 5 provides a summary of the reduction in the complexity of the expression of the model accomplished by reformulating the previous version of ReEDS mathematically as (). There is a significant enhancement in the expression of () as a result of carefully considering the mathematics; identifying efficient notation for indices, sets, parameters, and variables; and organizing the presentation of the model components. Recalling that the legacy version of the ReEDS model occupied more than 100 pages of inconsistently commented code in GAMS, the relative ease and interpretability of (), a version of which is detailed in its entirety in fewer than seven pages in Appendix C, is clear. The contrast serves as a useful example in the merits of using best practices for formulation and expression when approaching industrial-scale problems in mathematical optimization. The practitioner should make efforts to thoughtfully and efficiently define indices, sets, parameters, and variables such that the model is consistently organized and mathematical structures are visible to the user, redundancies can be eliminated, and cuts and algorithmic strategies aimed at finding solutions more efficiently can be identified.
|
Table 5. Reduction in Model Complexity from ReEDS to ()
| Model element | Previous version of ReEDS | () model |
|---|---|---|
| Distinctly named sets and associated indices | > 20 | 10 |
| Distinctly named parameters | 24 | |
| Distinctly named variables | 121 | 14 |
| Objective function terms | 139 | 6 |
| Constraint sets |
Not organized by variables or constraint type.
Organized by variables and constraint type.
Moreover, industrial-scale models require considerable resources to formulate and solve. Because of the large setup time necessary to construct a real-world optimization model, once built, it is often used in myriad manners, for example, by adding constraints and/or variables to consider different but closely related cases. To the extent possible, the original model should anticipate future modifications when making notational choices. In its totality, this reformulation exercise provides a tangible example for students in the mathematical programming courses offered at our university with opportunities to emphasize best practices in the formulation of industrial-scale optimization models that are much larger than those presented in standard texts.
Furthermore, now that the reformulation of the previous version of ReEDS as (.2) has been completed, users, both on the ReEDS team and those external to NREL, may exploit the more efficient representation of the model to perform additional analyses, such as (i) comparison of solution times for (.2) to those of the previous version of ReEDs using equivalent data though preprocessed for (.2); (ii) reformulation of the previous version of ReEDS to include binary variables to better model fixed costs associated with retirements of old infrastructure and construction of new infrastructure for generation, transmission, and storage of energy; (iii) identification of problem structures that lead to constructs such as cuts once the model is reformulated with binary variables; and (iv) introduction of new constraint sets to model implications of competing policies on the distribution mix for energy production and transmission, both regionally and nationally.
Finally, the example has been well-received in the courses into which it was introduced. Specifically, students have commented on (i) a strength of the course being the introduction of real-life examples with their implementation details, (ii) the presence of well-structured approaches to problem formulation and corresponding solution techniques, and (iii) an associated project that fosters cooperation between students and builds on optimization concepts in the context of both calculus III and future classes.
We recognize the contributions of all of the scientists and engineers at the National Renewable Energy Laboratory who played a role in developing ReEDS. In particular, we acknowledge David Bielen, Trieu Mai, Matthew Mowers, and Daniel Steinberg.
Appendix A. The First Four Terms of the ReEDS Objective Function
The mathematical expression of the first four terms of the ReEDS objective function, excluding the in-line comments, discussed in Section 4.4, is provided here. We have not explicitly defined all of the expressions in the objective. Rather, it is presented as an example of how complicated the mathematics expressed in the GAMS code for ReEDS is relative to the equivalent mathematics expressed in the reformulated model, ().
Appendix B. The ReEDS Operating Reserve Constraint
The mathematical expression of the ReEDS operating reserve constraint, discussed in Section 4.5, is provided here; we have not explicitly defined all of the expressions in this constraint. Rather, it is presented as an example of how complicated the mathematics expressed in the GAMS code for ReEDS is relative to the equivalent mathematics expressed in the reformulated model, ().
Appendix C. The Reformulated ReEDS Model ()
The mathematical expression of the essential elements of the ReEDS model reformulated as (), discussed throughout this paper, follows.
C.1. Sets
|
Table C.1. () Sets, Subsets, and Indexed Sets
| Set | Description |
|---|---|
| Generating technologies | |
| Aggregated sets of generating technologies | |
| Regions | |
| Model years | |
| Time slices within days in modeled years | |
| Ancillary services | |
| Transmission line types | |
| Seasons within modeled years | |
| Historical years | |
| Modeled years | |
| Daytime time slices (morning, afternoon, evening) | |
| Nighttime time slices | |
| Renewable supply curve technologies | |
| Generating technologies that are primary storage devices | |
| Wind generating technologies | |
| PV generating technologies | |
| Alternating current (AC) transmission lines | |
| Direct current (DC) transmission lines | |
| Years that have occurred prior to and including year t | |
| Years that a plant of technology i built in year t is operating | |
| Years that a plant of technology i built in the year t is expired | |
| Initial vintage classes for technology i | |
| Added vintage classes for technology i | |
| Vintage classes that are valid for technology i in year t | |
| Bins that are used to define the RSC for technology i in region r | |
| Technologies that belong in the aggregated technology type j | |
| Time slices that exist in the same day as h, excluding h | |
| Singleton time slice with the highest load during the day that includes h | |
| Pairs of regions for which valid transmission lines of type l exist |
|
Table C.2. () Model Parameters
| Parameters | Description |
|---|---|
| Discount factor applied to operational costs incurred in year t [present$/future$] | |
| Discount factor applied to investment costs incurred in year t [present$/future$] | |
| Variable operation and maintenance cost for technology i of vintage class v in region r in time slice h of year t [$/MW] | |
| Fuel cost for technology i in region r in time slice [$/MW] | |
| Ancillary service cost for technology i providing ancillary service product a in time slice h [$/MW] | |
| Fixed operation and maintenance cost for technology i of vintage class v in region r in year t [$/MW] | |
| Capital cost for investments in generating capacity from technology i in region r in year t (includes financing) [$/MW] | |
| Capital cost for spur-line transmission investments for generating capacity from technology i in region r in year t from RSC bin b (includes financing) [$/MW] | |
| Capital cost for bulk transmission line investments for line type l from region r to region in year t, accounting for distance between regions [$/MW] | |
| Prescribed maximum capacity level after planned retirements of generators installed prior to 2010 from technology i of “initial” vintage v in region r in all years through t [MW] | |
| Prescribed minimum capacity of generators installed beginning 2010 from technology aggregation category j in region r cumulative through year t [MW] | |
| Withdrawals/retirements of initial vintage capacity from technology i of vintage class v in region r in year t [MW] | |
| Prescribed transmission capacity level of type l between regions in year t, after planned retirements and additions of transmission installed prior to 2010 [MW] | |
| κirt | Average capacity value of technology i in region r in year t [%] |
| Capacity planning reserve margin requirement in region r in season s of year t [MW] | |
| Transmission loss rate between region , accounting for distance [%] | |
| Quantity of capacity from technology i available in region r for RSC bin b [MW] | |
| αivrht | Availability for technology i of vintage class v in region r during time slice h in year t for either (i) dispatchable technologies or (ii) variable resource renewable technologies [%] |
| Average load in region r in time slice h in year t [MW] | |
| μirh | Minimum loading requirement for technology i in region r in time slice h [%] |
| ρai | Percent of the committed capacity of technology i that can provide ancillary service product a [%] |
| Fraction of load that contributes to the requirements of ancillary service product a [MW/MW] | |
| Fraction of wind generation that contributes to the requirements of ancillary service product a [MW/MW] | |
| Fraction of PV capacity during daytime hours that contributes to the requirements of ancillary service product a [MW/MW] |
|
Table C.3. () Model Variables
| Variables | Description |
|---|---|
| Givrht | Average generation by technology i of vintage class v in region r in time slice h of year t [MW] |
| Saivrht | Ancillary service type a provided by technology i of vintage class v in region r during time slice h in year t [MW] |
| Kivrt | Generating capacity for technology i of vintage class v in region r in year t [MW] |
| Nivrt | Investment in capacity for technology i of vintage class v in region r in year t [MW] |
| Investment in refurbished capacity for technology i of vintage class v in region r in year t [MW] | |
| Investment in capacity for technology i of vintage class v in region r in year t from RSC bin b [MW] | |
| Investment in transmission capacity of line type l between regions in year t [MW] | |
| Investments in technology aggregation category j in region r in year t that exceeds the prescribed build requirement [MW] | |
| Transmission capacity online type l between regions in year t [MW] | |
| Flow of firm capacity from region r to region in season s of year t [MW] | |
| Average flow of energy along transmission line type l from region r to region in time slice h in year t [MW] | |
| Average charging of storage technology i of vintage class v in region r in time slice h in year t [MW] | |
| Average discharging of storage technology i of vintage class v in region r in time slice h in year t [MW] | |
| Average flow of ancillary service product a from region r to region in time slice h in year t [MW] |
C.2. Parameters
C.3. Variables
C.4. Objective: Minimize Operational and Investment Cost
C.5. Subject to the Following Constraints
Initial generating capacity for modeled years cannot exceed initial values, and capacity inventories are monotonically nonincreasing with respect to time:
Added generating capacity during the modeled years cannot exceed previously existing capacity plus aggregated investments in new and refurbished capacity, and capacity inventories are monotonically nonincreasing with respect to time:
Investments in new construction and refurbishments must equal prescribed minimum capacity plus overinvestment, and overinvestments are zero in the historical years:
RSC technology investments are differentiated by technology and region:
Cumulative refurbishments cannot exceed the sum of expired investments from added vintages and prescribed withdrawals (retirements) from initial vintages’ capacity:
Available transmission capacity equals the sum of prescribed initial values and transmission investments:
Flow of firm capacity cannot exceed transmission capacity:
Total firm capacity plus net imports of capacity must be greater than or equal to planning reserve margin requirements:
Total investments in RSC technologies cannot exceed maximum RSC bin size limits:
Average generation plus provision of ancillary services cannot exceed the available capacity:
The sum of total generation, net imports, and net storage discharge must equal the required load:
Generation in any time slice must be greater than or equal to a fraction of the generation from this plant during any time slices within the same modeled day (i.e., a plant cannot run at full loading and zero within the same day):
The ability to supply ancillary services cannot exceed a given fraction of the “committed” capacity, estimated as the average output from the plant during the time slice with the highest load during the day:
The total supply of ancillary service products plus net imports of that service must be greater than or equal to the reserve requirement:
The flow of energy plus the flow of all ancillary services cannot exceed capacity for the given line type:
All variables are also assumed to be continuous and nonnegative:
References
- (2006) Nonlinear Programming: Theory and Algorithms, 3rd ed. (John Wiley and Sons, Hoboken, NJ).Crossref, Google Scholar
- (2017) Modeling the value of integrated U.S. and Canadian power sector expansion. Electricity J. 30(2):47–59.Crossref, Google Scholar
- (2004) Top ten secrets to success with optimization. Phalanx 37(4):12–13, 25–26.Google Scholar
- (2007) Formulating integer linear programs: A rogues’ gallery. INFORMS Trans. Ed. 7(2):153–159.Link, Google Scholar
- (2008) Optimization tradecraft: Hard-won insights from real-world decision support. Interfaces 38(5):356–366.Link, Google Scholar
- (2017) 2017 standard scenarios report: A U.S. electricity sector outlook. Report NREL/TP-6A20–68548, National Renewable Energy Laboratory, Golden, CO.Google Scholar
- (2016) A view to the future of natural gas and electricity: An integrated modeling approach. Energy Economics 60:486–496.Crossref, Google Scholar
- (1991) Linear Programming and Extensions (Princeton University Press, Princeton, NJ).Google Scholar
- (2016) Regional energy deployment system model documentation: Version 2016. Report NREL/TP-6A20-67067, National Renewable Energy Laboratory, Golden, CO.Google Scholar
- (1970) How to write mathematics. L’Enseignement Mathematique 16(2):123–152.Google Scholar
- (2016) Modeling the integrated expansion of the Canadian and U.S. power sectors. Electricity J. 29(1):71–80.Crossref, Google Scholar
- (2013) Practical guidelines for solving difficult linear programs. Surveys Oper. Res. Management Sci. 18(1–2):1–17.Crossref, Google Scholar
- (2014) Renewable electricity futures for the United States. IEEE Trans. Sustainable Energy 5(2):372–378.Crossref, Google Scholar
National Renewable Energy Laboratory (2019) Now publicly available, NREL’s ReEDS model expands access to inform the power sector, Accessed December 18, 2022, https://www.nrel.gov/news/program/2019/now-publicly-available-nrels-reeds-model-expands-access-to-inform-the-power-sector.html.Google Scholar- (2013) A survey of linear and mixed-integer optimization tutorials. INFORMS Trans. Ed. 14(1):26–38.Link, Google Scholar
- (2001) Teaching modeling in management science. INFORMS Trans. Ed. 1(2):62–67.Link, Google Scholar
- (2017) Optimization in Operations Research, 2nd ed. (Pearson Higher Education, Hoboken, NJ).Google Scholar
- (2017) Assessing the impact of nuclear retirements on the U.S. power sector. Electricity J. 30(9):14–21.Crossref, Google Scholar
- (2014) Greenhouse gas mitigation options in the U.S. electric sector: A ReEDS analysis. Energy J. 35(SI1):101–114.Crossref, Google Scholar
- (2016) Consistent notation for presenting complex optimization models in technical writing. Surveys Oper. Res. Management Sci. 21(1):1–17.Crossref, Google Scholar
- (2016) Long-term implications of sustained wind power growth in the United States: Potential benefits and secondary impacts. Appl. Energy 179(1):146–158.Crossref, Google Scholar
- (2017) Assessing the costs and benefits of US renewable portfolio standards. Environ. Res. Lett. 12:094023.Crossref, Google Scholar

