April 7, 2008 in Oracle

Napoleon's Parable

https://doi.org/10.1287/LYTX.2008.02.13

The junior O.R. analyst was troubled and frustrated. “I just don’t see why this model isn’t tracking right,” he sputtered. “I’ve been over the code carefully, half a dozen times. I ran some test cases that came out right. But some of the data points we’re trying to forecast in the holdout sample are just way off. There don’t appear to be coding errors in the data, either – I’ve gone back and checked the bad points against source documents. Help, Carl! What more can I do?”

“Well, let’s see, Bob,” the older analyst said soothingly. “You’re trying to model loan behavior, right? And your ‘bad’ data points are loans that did much better or worse than expected?”

Bob nodded.

“Then it’s probably not a mathematical error,” Carl told him. “More likely there’s some simplifying assumption, one you didn’t even realize you’d made, that is violated in these few cases. And are the bad points related in time, somehow?”

“I’m not sure,” Bob admitted. “We group them by cohorts, sets of loans that were made at the same time. The problems occur at different times since origination.”

“Aha,” Carl responded, “but that means they could be at the same time, couldn’t they?” Bob looked confused.

“I saw this in a military manpower model I worked on several years ago,” Carl explained. “We’d see these sets of similar misfits in different cohorts. But after a while we figured out that an error in month 24 since they joined for a group of people who joined in, say, May 1996, and an error in month 23 for a group that joined in June 1996, and so on, were really simultaneous. We’d go back and ask the sergeant major what had happened in that month, and it always turned out he’d say, ‘Oh! That’s the month we announced such-and-such a change in the bonus program.’ What we’d seen as a complicated modeling issue really was an event we didn’t know about, affecting whole sets of people at the same time in the same way. I’ll bet your problems in your loan analyzer are also coming from events you didn’t know about, like changes in the loan programs that were offered.”
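Carl's alignment check comes down to simple date arithmetic: a cohort's start month plus the months since joining gives a calendar month. The sketch below is a minimal illustration, not code from the manpower model he describes. The function names and the list of misfits are hypothetical; the first two entries are the May 1996 and June 1996 cohorts from his example.

```python
from collections import defaultdict


def calendar_month(cohort_year, cohort_month, months_since_start):
    """Return the (year, month) reached months_since_start months after the cohort start."""
    # Count months on a single running index so year rollovers are handled.
    index = cohort_year * 12 + (cohort_month - 1) + months_since_start
    return index // 12, index % 12 + 1


def group_by_calendar_month(misfits):
    """Group (cohort_year, cohort_month, months_since_start) misfits by calendar month."""
    groups = defaultdict(list)
    for year, month, age in misfits:
        groups[calendar_month(year, month, age)].append((year, month, age))
    return dict(groups)


# Hypothetical misfits: three cohorts that look unrelated by age, plus one stray point.
misfits = [(1996, 5, 24), (1996, 6, 23), (1996, 7, 22), (1997, 1, 3)]
aligned = group_by_calendar_month(misfits)

for when, points in sorted(aligned.items()):
    print(when, len(points))
```

With these sample points, the first three misfits share one calendar month even though they sit at different ages in their cohorts. That shared month is the one to ask about.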

“I’ll check,” Bob affirmed, but he still looked dubious.

“This sort of thing is more common than you realize,” Carl added. “A few years ago, I got to hear a seminar by Dennis Cook, who’s famous for one of the best books on applied regression analysis. He said he’d discovered, after years of work, that what looked like a complicated model often turned out to be a mixture of subpopulations, each with a simple model. The real, underlying trouble was that some analyst had decided to consider the whole population as more or less alike, and didn’t realize there was this mixing going on.”
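The mixing Carl describes can be seen in a toy case. The sketch below uses made-up data, with two hypothetical subpopulations that each follow an exact straight line, and compares one pooled least-squares fit against separate fits per group. It is an illustration of the general idea only, not a reconstruction of anything presented in the seminar.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sse(xs, ys):
    """Sum of squared residuals around the least-squares line for (xs, ys)."""
    intercept, slope = fit_line(xs, ys)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: group A rises, group B falls, both exactly linear.
xs = [0, 1, 2, 3, 4, 5]
group_a = [1 + 2 * x for x in xs]
group_b = [10 - x for x in xs]

pooled_sse = sse(xs + xs, group_a + group_b)     # one line for everyone
split_sse = sse(xs, group_a) + sse(xs, group_b)  # one line per subpopulation

print("pooled:", pooled_sse, "split:", split_sse)
```

The pooled line leaves large residuals that might tempt an analyst toward a more complicated model, while the per-group lines fit essentially perfectly.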

Now Bob was getting excited. “How come, with all the stuff that’s been written out there about statistical analysis, this sort of thing isn’t mentioned more often?” he asked.

“Oh, it is out there in the literature,” Carl said, “although admittedly not as often as the newest and fanciest technical add-ons to help build those more complicated models. But people often don’t see that they’ve overlooked something simple – and it’s hard to admit to a client when you find out that’s what you’ve done! And then, of course, there’s the additional problem of analysis plans.”

Bob looked puzzled again. “What do analysis plans have to do with it?” he inquired.

“That’s where the hidden assumptions get set in concrete,” Carl replied. “Most clients and bosses want a plan of analysis first, before you’ve had any chance to see what the data really look like. So it’s easy to fall into the trap of planning, let’s say, linear regressions, and then find out the data violate some of the assumptions of the technique you intended to use. They’re not linear, or they have relationships you didn’t expect among variables that are supposed to be independent, or they’re discrete when you need them to be continuous. And try explaining that sort of thing to a client!

“Napoleon wrote, ‘No plan of battle ever survives the first contact with the enemy,’” Carl concluded. “From my experience I’d add: No plan of analysis ever survives the first contact with the data.”

“I won’t argue,” Bob agreed, “from my own experience. But what can you do?”

Carl suggested, “Write your plan to start with an exploratory stage, where you’ve proposed some techniques depending on what the data will support, and your first step is to determine whether the data are what you expected and whether they fit the assumptions of the technique you wanted to use. Your next step, as you can probably guess, usually is to come up with different ways to analyze, depending on what nasty surprises you got. Build in some steps of going back and verifying whether important events have been left out and need to be included – like the changes in bonus programs in the manpower model. And, of course, don’t forget the most important – and least often stated – assumption in any analytical modeling method.”

“What’s that?” Bob asked.

Carl smiled and said, “The assumption that the future will be like the past. If what happens next isn’t, in the ways that count, pretty much like what you saw before and analyzed, you have no chance of getting it right, no matter what analytical method you use.”

Doug Samuelson
([email protected])
