April 7, 2023 in Principles for Successful Analytics Projects

Why Data Science Projects Fail: Part 1

What’s the problem (that you’re trying to solve)?

https://doi.org/10.1287/LYTX.2023.02.05

“If you try to tell someone else how to do their job better using sophisticated mathematics and computers without thoroughly understanding how they do their job today, including all of the problems and challenges they encounter, then you sir/madam are a fraud.” 

R.E.D. “Gene” Woolsey, Ph.D., Professor, Colorado School of Mines
Operations research academic, practitioner and consultant

In my experience, the no. 1 reason data science projects fail is that businesspeople and data scientists, individually and collectively, do not understand the real business problem at hand. Most often, a data scientist collects data and builds a model only to come up with, at best, the right answer to the wrong problem, i.e., a problem or question that the customer did not actually pose. Communication, which we will talk about later, is part of the challenge here, but there are several foundational steps that a data scientist must take before engaging on a project to help ensure that the real business problem being addressed is mutually and thoroughly understood.

First and foremost, data science fundamentally requires a high degree of intellectual curiosity to be done well. You cannot be a data scientist “at arm’s length.” You will need to take a deep dive into and “get dirty” with the details of the company’s industry and business. To be effective, a data scientist requires deep contextual understanding at three levels:

  • Industry and segment.
  • Corporation and department.
  • Domain problem space.

Data science applications vary greatly across industries and their respective segments, from energy (oil and gas, electric, wind, solar, generation, transmission) to transportation (airlines, railroads, trucking, rental cars), healthcare (providers, insurers, device manufacturers, pharmaceuticals), financial services (banking, credit cards, credit reporting, mutual funds, hedge funds, private equity, venture capital), manufacturing (automobiles, steel, consumer packaged goods, semiconductors, food) and retail (big box, hardware, clothing, housewares). Each of these industries has its own unique economics, operating models and competitive landscape. It behooves the data scientist to research and understand as much as possible about the industry in which he or she is working.

Each corporation within a given industry or segment has its own competitive and economic DNA (e.g., low-cost provider versus premium high-margin provider), culture and mode of operation. A data scientist must learn and understand the following:

  • What is the company’s business model?
  • What is the company’s strategic competitive advantage?
  • What is the company’s core product and/or service offering(s)?
  • How does the company make (or lose) money (e.g., order-to-cash cycle)?
  • What are the primary sources of sales, revenue and cost (operating and capital expenditure)?
  • What makes the company “tick”?

The company’s annual report and financial statements are a great source of in-depth, detailed information to learn about the above topics. (If you don’t have a BBA/MBA/CPA, then find a friend in accounting or finance to help you get started!)

Inside your company, many different departments may be using data science (or none, depending on the company’s data and analytical maturity). The approach to data science and the problems to be solved are as varied as the departments themselves:

  • Marketing
  • Sales
  • Manufacturing
  • Operations
  • Finance
  • Accounting
  • HR

To apply data science, you will need to understand the goals, objectives, business processes, metrics, operating plans and road maps of the organization with which you are working. You need to know how the work gets done, including budgets, data, data systems and software. You literally need to learn to speak the department’s language, and, yes, each department will have its own vocabulary, terminology and acronyms (corporate America loves acronyms). Being a data scientist requires deep immersion in your industry, your company and your department to understand the domain problem space and be able to contribute materially. Your goal is not to appear to be the “math geek with the fancy laptop” but rather to be a “team member who digs deep and helps solve problems using some really powerful, specialized skills.”

When I started working for the American Airlines Operations Research Department in 1987, fresh out of graduate school at Georgia Tech, all I knew about airlines was making a flight reservation, getting a boarding pass, finding my seat, ordering a drink and claiming my luggage. Over the subsequent six years, I learned about all of the relevant facets of airport operations, airline operations, maintenance/inventory operations and crew/flight academy operations. Whenever I had a new project, I physically parked myself in the problem area next to the people who did the actual work (i.e., the air traffic control tower and radar approach control center, network operations control center, maintenance hangar and office building), and I didn’t leave until I understood how they did their job and the current problem that we were trying to solve. Then, and only then, did I begin the data science modeling work.

There are several questions data scientists need answers to before beginning a project:

  • What is the problem that we are trying to solve, clearly and succinctly stated?
  • What is the key business question we are trying to answer?
  • What is the desired business outcome?
  • What is the end state of the model/system we build? How will it be utilized?
  • What is the “target” for improvement (e.g., cost reduction, conversion rate increase)?
  • What KPIs (key performance indicators) are relevant to measure economic impact?
  • What experiments can we run to measure the before-and-after effect of the model?

Multiple meetings and whiteboard sessions may be required to adequately answer these questions, but it will be time well spent for all parties involved. As the old software engineering adage goes, “An ounce of design is worth a pound of debugging.”
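
The last two questions in the list above (which KPI to track and how to measure the before-and-after effect) can be made concrete with a small sketch. The example below assumes a simple experiment in which a control group operates without the model and a treatment group operates with it, and the KPI is conversion rate. The function name, the counts and the choice of a two-proportion z-test are illustrative assumptions, not something prescribed by this article; a real project would agree on the test design with the business owner up front.

```python
import math


def conversion_lift(control_conv, control_n, treat_conv, treat_n):
    """Compare conversion rates for a control group (no model) and a
    treatment group (model in use) with a two-proportion z-test.

    All names here are hypothetical and for illustration only.
    """
    p_control = control_conv / control_n
    p_treat = treat_conv / treat_n

    # Pooled conversion rate under the null hypothesis of "no difference".
    p_pool = (control_conv + treat_conv) / (control_n + treat_n)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / treat_n))
    if std_err == 0:
        raise ValueError("Cannot test: pooled rate is 0 or 1.")

    z = (p_treat - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))

    return {
        "control_rate": p_control,
        "treatment_rate": p_treat,
        "absolute_lift": p_treat - p_control,
        "z": z,
        "p_value": p_value,
    }


# Made-up counts: 200 of 10,000 convert without the model, 260 of 10,000 with it.
result = conversion_lift(200, 10_000, 260, 10_000)
print(result)
```

The point of the sketch is less the statistics than the discipline: the KPI, the comparison groups and the decision rule are written down before any modeling starts, so everyone agrees in advance on what "the model worked" means.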

Gartner reported that “through 2022 only 20% of analytic insights will deliver business outcomes” [1]. Thoroughly understanding the true business problem at hand will help your project make the 20% cut line!

Reference

  1. K. Troyanos, 2020, “Use Data to Answer Your Key Business Questions,” Harvard Business Review, February 24, https://hbr.org/2020/02/use-data-to-answer-your-key-business-questions.

Douglas A. Gray
