June 24, 2024 in O.R. Toolbox

We Need to Modernize How We Introduce Students to Optimization

SHARE: PRINT ARTICLE:print this page https://doi.org/10.1287/orms.2024.02.08

Photo provided by Jeff Linderoth.

In the article, “Smarter Decisions for a Better World: Telling Our StORy” [1], INFORMS then-President Laura Albert celebrated the importance of using operations research (O.R.) to make better decisions, highlighting the INFORMS tagline: “Smarter decisions for a better world.” As I have often repeated:

If you want to run a better system, you have to make better decisions.” [2]

The science of making the best possible decisions is known as optimization, a field long associated with O.R. But while O.R. can claim many successes in cracking complex problems in optimization, O.R. tools are not in widespread use. In fact, although making decisions is a universal human activity, very few have even heard of O.R. tools and methods (in sharp contrast, for example, with the recent emergence of machine learning tools such as ChatGPT). In my experience, even our own students rarely use some of our best-known tools.

In this article, I am going to take on two titans of operations research: George  Dantzig and Richard Bellman. Those in the O.R. community will immediately recognize Dantzig’s contribution of the simplex algorithm for solving linear programs and Bellman’s contribution of “Bellman’s equation” for solving sequential decision problems that he called “dynamic programs.” However, very few people need to actually solve a linear program, and even fewer use Bellman’s equation to solve a sequential decision problem!

George Dantzig headshotRichard Bellman headshot

George Dantzig (left) and Richard Bellman (right). Source: Stanford University and IEEE History Center, respectively. 

Most of my own career has been spent standing on the shoulders of these two giants. I worked extensively on complex resource allocation problems arising in transportation and logistics, in which linear programming was a fundamental tool. In addition, all the problems consisted of decisions being made over time, which means they were a form of dynamic programming. (This work produced a book on approximate dynamic programming [3], which served as the foundation for a growing startup, Optimal Dynamics [4].)

Still, consider the many fields in which people make decisions: transportation and logistics, supply chains, manufacturing, energy, public health, medicine, finance, economics, the physical and social sciences, psychology, e-commerce and so on. Now go into any company or organization in any one of these fields where people are making decisions on a day-to-day basis, and ask how many of them have ever heard of a linear program or dynamic program (I guarantee it will be close to zero).

Next, ask the undergraduates who majored in O.R. courses and transitioned to any of these fields if they have ever solved a linear or dynamic program at work – I am pretty sure that the vast majority of even our own students will say no.

This does not mean that there are no applications of linear and dynamic programming – academic journals are full of papers using these tools for a wide range of problems. In addition, there are real applications of these tools in at least some of these fields, but only for very specialized decisions.

Yet, these organizations are full of people making decisions, and they presumably all aspire to make the best decisions they can. If they are not using the best-known tools, how are they making decisions? And is the field of operations research teaching anything that helps these people make their decisions?

The O.R. Toolbox

O.R. is a field that develops tools to solve problems. We are very proud of our tools, whether it is linear programming and the simplex algorithm or our latest algorithm for solving some specialized problem. Some of us then go on to teach these tools to students in courses with names like “Linear Optimization,” “Integer Programming,” “Nonlinear Programming” and “Dynamic Programming.” 

These courses are the proverbial hammer looking for a nail. We teach techniques and then assume that the students will find settings in which to apply them. Where we are failing is that we are not teaching students how to model and solve the decision problems that they actually encounter in real-world settings. We provide tools but not a toolbox. And although hammers are familiar to everyone and widely used, linear programming and dynamic programming are highly specialized tools familiar to only a few people and used by even fewer.

Area

Decision

Information

Health

Which drug to use

Patient response

 

Who to vaccinate

Number of cases

 

What dosage

Patient response

 

Approve a drug in a clinical trial?

Market success

 

Scheduling nurses

Schedule acceptance

Energy systems

Where/how many wind farms

Energy generated, prices

 

Location/size of battery storage

Grid prices

 

How much energy to produce or store

Grid prices

 

Which generators to use

Grid congestion

E-commerce

Which product to advertise

Sales

 

How much to bid for ad placement

Ad placement

 

Which web design to use

Ad clicks

Finance

Pricing a European option

Exercise price

 

Buying/selling an asset

Purchase/selling price

 

Optimizing portfolios

Prices/correlations

Supply chain mgmt.

Where to locate warehouses

Transportation flows/costs

 

Which supplier to use

Cost, product yield

 

Inventory planning

Inventory, sales

 

What price to charge

Sales, profit

 

Marketing, advertising, sales incentives

Sales

Trucking

Load pricing

Loads tendered

 

Load acceptance

Cost of coverage

 

Driver assignment

Did driver accept?

 

Where to hire drivers

Number of drivers hired

Laboratory sciences

Which catalyst, chemical, molecule

Nanotube length

 

Concentrations

Material strength

 

Temperature

Porosity

 

Time

Drug delivery

Table 1. Different decisions, along with some information that might arrive after the decision is made that can affect the performance of the decision.

Table 1 lists a sample of application settings, along with examples of decisions that have to be made within each setting. Some decisions (driver assignment, nurse scheduling, warehouse location, portfolio management) draw on the tools of linear, nonlinear and integer programming, whereas the rest of the decisions do not. In addition, not a single decision is static or deterministic – they are all made over time as new information arrives that affects the performance of the system.

What all the decisions in Table 1 have in common is that they are all sequential decision problems. That is, they are all described by the sequence: decision, information, decision, information …

After each decision is made, the system incurs a cost or contribution (or some other performance metric). The goal remains to make the best possible decision, but we need to make decisions that work well over time, usually (but not always) averaging across all the different outcomes of the information arriving to the system.

We are not talking about a small class of decision problems, as was the case with the classical optimization problems of linear, integer and nonlinear programming. Sequential decision problems represent a massive class that arises everywhere. This includes decisions that everyone makes on a day-to-day basis, as well as problems that arise throughout business, science, engineering, economics, politics, etc., effectively any process that involves people.

Now consider what is easily the pride and joy of the O.R. community: the linear program, which Dantzig conquered with his simplex algorithm. Whereas the simplex algorithm is (properly) viewed as a major breakthrough, what is often overlooked is the value of formulating a linear program that is universally written as:

  (1)

subject to

Ax = b, (2)

x ≤ u, (3)

x ≥ 0. (4)

Now consider the problem of modeling sequential decision problems. The most natural approach is to simply write out the problem, which we might write as: 

state, decision, information, state, decision, information …,

where the state variable (that captures what we know or believe before we make a decision) evolves according to a transition function that takes as input the state, decision and information variables and outputs the updated state variable. Transition functions can be simple (such as updating the inventory) or many thousands of lines of code.

deterministic assignment problem of truck routing
Figure 1. Deterministic assignment problem for assigning resources (truck drivers) to tasks (loads to be moved).

Challenging Bellman

Decisions are determined using a method that goes by many names, but the most popular is “policy,” which uses the information in the state variable to determine what decision is being made. Policies can be as simple as “if the inventory is below s, order up to S,” but policies come in different styles. There are entire research communities dedicated to different types of policies.

So how do we design policies? At this point, I can hear the O.R. community say: “Oh! Oh! I know! It’s a … dynamic program! We can solve it with Bellman’s equation!” – which might be written as:

  (5)

The standard approach to computing solves Equation (5) by stepping backward in time. If the state is a vector, then we encounter the well-known “curse of dimensionality” that is routinely cited as the reason why we cannot use Bellman’s equation. So we introduce Bellman’s equation and then explain why we cannot use it. This line of thinking can be found in thousands of papers.

We typically assume that the decision (or “action”) is scalar, such as what turn to take on a transportation network, the choice of medication for a patient or the next move in a game. But take another look at the applications of linear programming in Table 1, such as assigning truck drivers, scheduling nurses or planning which generators to use. Each of these problems are linear or integer programs in which the decision may have thousands, even hundreds of thousands of dimensions. Figure 1 illustrates a classic assignment problem that was an active topic of research many years ago but is still celebrated today [5].

Almost all assignment problems (I would also include most linear, nonlinear and integer programs) are actually approximations of sequential decision problems. Figure 2 illustrates the problem of assigning drivers to loads over time. We might solve these assignment problems every day, every hour or every minute. The solution of one problem impacts the next problem, which also depends on random information (such as new loads) that we don’t know in advance. In other words, these sequential assignment problems are examples of very high-dimensional dynamic programs.

map of USA with truck driver assignment loads
Figure 2. Assigning drivers to loads over time.

The challenge of solving Bellman’s equation has led to an extensive literature under names like approximate dynamic programming and reinforcement learning. These are advanced methods that require considerable sophistication, which means this material is traditionally limited to Ph.D. courses, with algorithms illustrated by what are often toy or highly stylized problems. Not surprisingly, it is almost impossible to find people who are actively using these methods in practice – they exist, but you have to know where to look.

Challenging Dantzig

We diligently teach our students about linear programming and the simplex method, which is used for a tiny number of static, deterministic, vector-valued decision problems that arise in a handful of (admittedly important) applications. When linear programs do arise in practice, we always use a package, which means we do not need to know the simplex algorithm. (Honestly, when are we going to stop teaching the simplex algorithm?)

On the other hand, we restrict the teaching of dynamic programming and Bellman’s equation to Ph.D. students, even though sequential decision problems (every sequential decision problem is a dynamic program) arise everywhere.

How do we solve sequential decision problems if we can’t use Bellman’s equation? The answer is to do what everyone doing deterministic optimization does: Model first, then solve.

“Model first” means we start by writing the objective function:

  (6)

given the transition function and the information process. Instead of searching for x (or xt in a time-dependent problem), we want to find an effective policy , which comes from four (meta)classes:

  1. Policy function approximations (PFAs) – Analytical functions that use the information in the state variable to determine the decision. Examples include order-up-to inventory policies, linear models and neural networks.
  2. Cost function approximations (CFAs) – Deterministic optimization models that are parameterized to produce better results over time under uncertainty.
  3. Policies based on value function approximations (VFAs) – These use Bellman’s equation, typically with approximate value functions.
  4. Direct lookahead approximations (DLAs) – Policies that optimize into the future to make a decision now, which we divide into two subclasses:
    1. Deterministic DLAs – Deterministic approximations that plan into the future.
    2. Stochastic DLAs – These explicitly model uncertainty when we plan into the future.

Note that CFAs, VFAs and DLAs all involve an imbedded optimization problem, which may be as simple as a sort or as complex as a high-dimensional integer program.

These four classes of policies are universal – they cover every method presented in the research literature or used in practice, including whatever is being used right now. The academic community tends to focus on VFAs (which depend on Bellman’s equation) or stochastic DLAs (such as decision trees or stochastic programming). However, by far the most popular classes in practice are PFAs, CFAs and deterministic DLAs, all of which typically involve the use of tunable parameters θ.

If we write our parameterized policy as , the tuning problem would be written:

  (7)

The objective function (6) (or (7)) along with the transition function and information process can be used to model any sequential decision problem. Note that whereas deterministic optimization focuses on finding , sequential decision problems require finding effective policies .

This way of thinking about methods for decision-making represents a fundamentally new approach for sequential decision problems. In addition, it changes how we approach familiar static problems, such as linear, nonlinear and integer programs, which are often policies (a form of CFA) for solving sequential decision problems (see [6] for a brief introduction to cost function approximations).

I have sketched an entirely new style for teaching introduction to optimization [7] that draws on a teach-by-example book that I wrote for an undergraduate course at Princeton University [8]. Supporting this work is an advanced graduate-level book [9] written for students whose primary interest is developing models and algorithms for a wide range of applications.

Additional information on sequential decision problems is available, including books, videos, instructional webpages and suggestions for how to teach this material [10].

References & Notes

  1. Laura Albert, 2023, “Smarter Decisions for a Better World: Telling Our StORy,” OR/MS Today, September 5, https://pubsonline.informs.org/do/10.1287/orms.2023.03.15/full/
  2. Warren Powell, “A Universal Framework for Sequential Decision Problems: The next generation of AI,” https://tinyurl.com/sdafieldyoutube/. This is a version of a talk I have given 50 times since 2020.
  3. Warren B. Powell, 2022, “Approximate Dynamic Programming: Solving the curses of dimensionality,” New York: Wiley.
  4. Optimal Dynamics, https://optimaldynamics.com/.
  5. Optym University, “Linear and Integer Programming,” https://www.youtube.com/watch?v=QGVpnHezymA.
  6. Cost function approximations: https://tinyurl.com/cfapolicy/.
  7. Warren Powell, 2024, “A Modern Approach to Teaching Optimization,” Boston, MA: NOW Publishing, https://tinyurl.com/TeachingOpt/.
  8. Warren B. Powell, 2022, “Sequential Decision Analytics and Modeling, Boston, MA: NOW Publishing, https://tinyurl.com/sdamodeling/ (available as a free download).
  9. Warren B. Powell, 2022, “Reinforcement Learning and Stochastic Optimization: A unified framework for sequential decisions,” New York: Wiley, https://tinyurl.com/RLandSO/.
  10. Resources webpage on sequential decision analytics: https://tinyurl.com/sdalinks/.

Warren B. Powell

SHARE:

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.