June 24, 2024 in O.R. Toolbox

We Need to Modernize How We Introduce Students to Optimization

Warren B. Powell

SHARE: PRINT ARTICLE:

https://doi.org/10.1287/orms.2024.02.08

In the article, “Smarter Decisions for a Better World: Telling Our StORy” [1], INFORMS then-President Laura Albert celebrated the importance of using operations research (O.R.) to make better decisions, highlighting the INFORMS tagline: “Smarter decisions for a better world.” As I have often repeated:

“If you want to run a better system, you have to make better decisions.” [2]

The science of making the best possible decisions is known as optimization, a field long associated with O.R. But while O.R. can claim many successes in cracking complex problems in optimization, O.R. tools are not in widespread use. In fact, although making decisions is a universal human activity, very few have even heard of O.R. tools and methods (in sharp contrast, for example, with the recent emergence of machine learning tools such as ChatGPT). In my experience, even our own students rarely use some of our best-known tools.

In this article, I am going to take on two titans of operations research: George Dantzig and Richard Bellman. Those in the O.R. community will immediately recognize Dantzig’s contribution of the simplex algorithm for solving linear programs and Bellman’s contribution of “Bellman’s equation” for solving sequential decision problems that he called “dynamic programs.” However, very few people need to actually solve a linear program, and even fewer use Bellman’s equation to solve a sequential decision problem!

George Dantzig headshot Richard Bellman headshot

George Dantzig (left) and Richard Bellman (right). Source: Stanford University and IEEE History Center, respectively.

Most of my own career has been spent standing on the shoulders of these two giants. I worked extensively on complex resource allocation problems arising in transportation and logistics, in which linear programming was a fundamental tool. In addition, all the problems consisted of decisions being made over time, which means they were a form of dynamic programming. (This work produced a book on approximate dynamic programming [3], which served as the foundation for a growing startup, Optimal Dynamics [4].)

Still, consider the many fields in which people make decisions: transportation and logistics, supply chains, manufacturing, energy, public health, medicine, finance, economics, the physical and social sciences, psychology, e-commerce and so on. Now go into any company or organization in any one of these fields where people are making decisions on a day-to-day basis, and ask how many of them have ever heard of a linear program or dynamic program (I guarantee it will be close to zero).

Next, ask the undergraduates who majored in O.R. courses and transitioned to any of these fields if they have ever solved a linear or dynamic program at work – I am pretty sure that the vast majority of even our own students will say no.

This does not mean that there are no applications of linear and dynamic programming – academic journals are full of papers using these tools for a wide range of problems. In addition, there are real applications of these tools in at least some of these fields, but only for very specialized decisions.

Yet, these organizations are full of people making decisions, and they presumably all aspire to make the best decisions they can. If they are not using the best-known tools, how are they making decisions? And is the field of operations research teaching anything that helps these people make their decisions?

The O.R. Toolbox

O.R. is a field that develops tools to solve problems. We are very proud of our tools, whether it is linear programming and the simplex algorithm or our latest algorithm for solving some specialized problem. Some of us then go on to teach these tools to students in courses with names like “Linear Optimization,” “Integer Programming,” “Nonlinear Programming” and “Dynamic Programming.”

These courses are the proverbial hammer looking for a nail. We teach techniques and then assume that the students will find settings in which to apply them. Where we are failing is that we are not teaching students how to model and solve the decision problems that they actually encounter in real-world settings. We provide tools but not a toolbox. And although hammers are familiar to everyone and widely used, linear programming and dynamic programming are highly specialized tools familiar to only a few people and used by even fewer.

Area	Decision	Information
Health	Which drug to use	Patient response
	Who to vaccinate	Number of cases
	What dosage	Patient response
	Approve a drug in a clinical trial?	Market success
	Scheduling nurses	Schedule acceptance
Energy systems	Where/how many wind farms	Energy generated, prices
	Location/size of battery storage	Grid prices
	How much energy to produce or store	Grid prices
	Which generators to use	Grid congestion
E-commerce	Which product to advertise	Sales
	How much to bid for ad placement	Ad placement
	Which web design to use	Ad clicks
Finance	Pricing a European option	Exercise price
	Buying/selling an asset	Purchase/selling price
	Optimizing portfolios	Prices/correlations
Supply chain mgmt.	Where to locate warehouses	Transportation flows/costs
	Which supplier to use	Cost, product yield
	Inventory planning	Inventory, sales
	What price to charge	Sales, profit
	Marketing, advertising, sales incentives	Sales
Trucking	Load pricing	Loads tendered
	Load acceptance	Cost of coverage
	Driver assignment	Did driver accept?
	Where to hire drivers	Number of drivers hired
Laboratory sciences	Which catalyst, chemical, molecule	Nanotube length
	Concentrations	Material strength
	Temperature	Porosity
	Time	Drug delivery

Table 1. Different decisions, along with some information that might arrive after the decision is made that can affect the performance of the decision.

Table 1 lists a sample of application settings, along with examples of decisions that have to be made within each setting. Some decisions (driver assignment, nurse scheduling, warehouse location, portfolio management) draw on the tools of linear, nonlinear and integer programming, whereas the rest of the decisions do not. In addition, not a single decision is static or deterministic – they are all made over time as new information arrives that affects the performance of the system.

What all the decisions in Table 1 have in common is that they are all sequential decision problems. That is, they are all described by the sequence: decision, information, decision, information …

After each decision is made, the system incurs a cost or contribution (or some other performance metric). The goal remains to make the best possible decision, but we need to make decisions that work well over time, usually (but not always) averaging across all the different outcomes of the information arriving to the system.

We are not talking about a small class of decision problems, as was the case with the classical optimization problems of linear, integer and nonlinear programming. Sequential decision problems represent a massive class that arises everywhere. This includes decisions that everyone makes on a day-to-day basis, as well as problems that arise throughout business, science, engineering, economics, politics, etc., effectively any process that involves people.

Now consider what is easily the pride and joy of the O.R. community: the linear program, which Dantzig conquered with his simplex algorithm. Whereas the simplex algorithm is (properly) viewed as a major breakthrough, what is often overlooked is the value of formulating a linear program that is universally written as:

$\displaystyle\min_{x}c^Tx,$ (1)

subject to

Ax = b, (2)

x ≤ u, (3)

x ≥ 0. (4)

Now consider the problem of modeling sequential decision problems. The most natural approach is to simply write out the problem, which we might write as:

state, decision, information, state, decision, information …,

where the state variable (that captures what we know or believe before we make a decision) evolves according to a transition function that takes as input the state, decision and information variables and outputs the updated state variable. Transition functions can be simple (such as updating the inventory) or many thousands of lines of code.

deterministic assignment problem of truck routing — Figure 1. Deterministic assignment problem for assigning resources (truck drivers) to tasks (loads to be moved).

Challenging Bellman

Decisions are determined using a method that goes by many names, but the most popular is “policy,” which uses the information in the state variable to determine what decision is being made. Policies can be as simple as “if the inventory is below s, order up to S,” but policies come in different styles. There are entire research communities dedicated to different types of policies.

So how do we design policies? At this point, I can hear the O.R. community say: “Oh! Oh! I know! It’s a … dynamic program! We can solve it with Bellman’s equation!” – which might be written as:

$V_t\left(S_t\right)=\displaystyle\max_{x_t}{\left(C\left(S_t,x_t\right)+EV_{t+1}\left(S_{t+1}\right)\middle|S_t,x_t\right)}.$ (5)

The standard approach to computing $v_t(s_t)$ solves Equation (5) by stepping backward in time. If the state is a vector, then we encounter the well-known “curse of dimensionality” that is routinely cited as the reason why we cannot use Bellman’s equation. So we introduce Bellman’s equation and then explain why we cannot use it. This line of thinking can be found in thousands of papers.

We typically assume that the decision (or “action”) is scalar, such as what turn to take on a transportation network, the choice of medication for a patient or the next move in a game. But take another look at the applications of linear programming in Table 1, such as assigning truck drivers, scheduling nurses or planning which generators to use. Each of these problems are linear or integer programs in which the decision may have thousands, even hundreds of thousands of dimensions. Figure 1 illustrates a classic assignment problem that was an active topic of research many years ago but is still celebrated today [5].

Almost all assignment problems (I would also include most linear, nonlinear and integer programs) are actually approximations of sequential decision problems. Figure 2 illustrates the problem of assigning drivers to loads over time. We might solve these assignment problems every day, every hour or every minute. The solution of one problem impacts the next problem, which also depends on random information (such as new loads) that we don’t know in advance. In other words, these sequential assignment problems are examples of very high-dimensional dynamic programs.

map of USA with truck driver assignment loads — Figure 2. Assigning drivers to loads over time.

The challenge of solving Bellman’s equation has led to an extensive literature under names like approximate dynamic programming and reinforcement learning. These are advanced methods that require considerable sophistication, which means this material is traditionally limited to Ph.D. courses, with algorithms illustrated by what are often toy or highly stylized problems. Not surprisingly, it is almost impossible to find people who are actively using these methods in practice – they exist, but you have to know where to look.

Challenging Dantzig

We diligently teach our students about linear programming and the simplex method, which is used for a tiny number of static, deterministic, vector-valued decision problems that arise in a handful of (admittedly important) applications. When linear programs do arise in practice, we always use a package, which means we do not need to know the simplex algorithm. (Honestly, when are we going to stop teaching the simplex algorithm?)

On the other hand, we restrict the teaching of dynamic programming and Bellman’s equation to Ph.D. students, even though sequential decision problems (every sequential decision problem is a dynamic program) arise everywhere.

How do we solve sequential decision problems if we can’t use Bellman’s equation? The answer is to do what everyone doing deterministic optimization does: Model first, then solve.

“Model first” means we start by writing the objective function:

$\displaystyle\max_{\pi=(f,\theta)}\mathbb{E}({\sum}_{t=0}^TC(S_t,X^\pi(S_t))\mid S_o),$ (6)

given the transition function and the information process. Instead of searching for x (or x_t in a time-dependent problem), we want to find an effective policy $X_t^\pi\left(S_t\right)$ , which comes from four (meta)classes:

Policy function approximations (PFAs) – Analytical functions that use the information in the state variable to determine the decision. Examples include order-up-to inventory policies, linear models and neural networks.
Cost function approximations (CFAs) – Deterministic optimization models that are parameterized to produce better results over time under uncertainty.
Policies based on value function approximations (VFAs) – These use Bellman’s equation, typically with approximate value functions.
Direct lookahead approximations (DLAs) – Policies that optimize into the future to make a decision now, which we divide into two subclasses:
1. Deterministic DLAs – Deterministic approximations that plan into the future.
2. Stochastic DLAs – These explicitly model uncertainty when we plan into the future.

Note that CFAs, VFAs and DLAs all involve an imbedded optimization problem, which may be as simple as a sort or as complex as a high-dimensional integer program.

These four classes of policies are universal – they cover every method presented in the research literature or used in practice, including whatever is being used right now. The academic community tends to focus on VFAs (which depend on Bellman’s equation) or stochastic DLAs (such as decision trees or stochastic programming). However, by far the most popular classes in practice are PFAs, CFAs and deterministic DLAs, all of which typically involve the use of tunable parameters θ.

If we write our parameterized policy as $X^\pi(S_t\mid\theta)$ , the tuning problem would be written:

$\displaystyle\max_{\theta}\mathbb{E}({\sum}_{t=0}^TC(S_t,X^\pi(S_t\mid\theta))\mid S_0).$ (7)

The objective function (6) (or (7)) along with the transition function and information process can be used to model any sequential decision problem. Note that whereas deterministic optimization focuses on finding , sequential decision problems require finding effective policies $X^\pi(S_t\mid\theta)$ .

This way of thinking about methods for decision-making represents a fundamentally new approach for sequential decision problems. In addition, it changes how we approach familiar static problems, such as linear, nonlinear and integer programs, which are often policies (a form of CFA) for solving sequential decision problems (see [6] for a brief introduction to cost function approximations).

I have sketched an entirely new style for teaching introduction to optimization [7] that draws on a teach-by-example book that I wrote for an undergraduate course at Princeton University [8]. Supporting this work is an advanced graduate-level book [9] written for students whose primary interest is developing models and algorithms for a wide range of applications.

Additional information on sequential decision problems is available, including books, videos, instructional webpages and suggestions for how to teach this material [10].

References & Notes

Laura Albert, 2023, “Smarter Decisions for a Better World: Telling Our StORy,” OR/MS Today, September 5, https://pubsonline.informs.org/do/10.1287/orms.2023.03.15/full/.
Warren Powell, “A Universal Framework for Sequential Decision Problems: The next generation of AI,” https://tinyurl.com/sdafieldyoutube/. This is a version of a talk I have given 50 times since 2020.
Warren B. Powell, 2022, “Approximate Dynamic Programming: Solving the curses of dimensionality,” New York: Wiley.
Optimal Dynamics, https://optimaldynamics.com/.
Optym University, “Linear and Integer Programming,” https://www.youtube.com/watch?v=QGVpnHezymA.
Cost function approximations: https://tinyurl.com/cfapolicy/.
Warren Powell, 2024, “A Modern Approach to Teaching Optimization,” Boston, MA: NOW Publishing, https://tinyurl.com/TeachingOpt/.
Warren B. Powell, 2022, “Sequential Decision Analytics and Modeling, Boston, MA: NOW Publishing, https://tinyurl.com/sdamodeling/ (available as a free download).
Warren B. Powell, 2022, “Reinforcement Learning and Stochastic Optimization: A unified framework for sequential decisions,” New York: Wiley, https://tinyurl.com/RLandSO/.
Resources webpage on sequential decision analytics: https://tinyurl.com/sdalinks/.

Warren B. Powell

Warren B. Powell is professor emeritus, Princeton University, and chief analytics officer, Optimal Dynamics.

Keywords:

We Need to Modernize How We Introduce Students to Optimization

The O.R. Toolbox

Challenging Bellman

Challenging Dantzig

References & Notes

Login to your account

Login or Register

Member Login

Nonmember Login