July 18, 2024 in Principles for Successful Analytics Projects
Why Data Science Projects Fail: Conclusion
What is foundational to success?
https://doi.org/10.1287/LYTX.2024.03.07
The goal of data science, and related fields like analytics, is to help solve complex strategic, tactical and operational problems; support and better enable data-based, model-driven decision-making; and answer key business questions in such a manner that business value is created and economic impact is maximized. Tom Davenport set the bar necessarily high when he said, “Models make the enterprise smarter; models embedded in systems and business processes make the enterprise more economically efficient.” For data scientists, this is our desired end state and the end in mind with which we begin our endeavors.
As in any scientific endeavor, experimentation, data collection and analysis are part of the process, combined with the use of advanced mathematics and sophisticated software and computer technology. Notwithstanding all of this science and technology, data science is practiced in the private sector (e.g., business, industry, research) and the public sector (e.g., government, military, law enforcement), both of which are inhabited and operated by human beings.
Human involvement in data science is substantial and materially significant at every step of the process, requiring the development and application of many “soft skills” that are necessary for the successful execution and completion of data science projects.
Of all the soft skills required, I believe that communication is by far the most critical and foundational to successfully executing data science projects. That means communication in all its forms, specifically:
- Listening, to understand.
- Being heard, to be understood.
- Speaking and writing concisely, impactfully and with clarity for the target audience.
- Gaining a deep level of mutual understanding in all aspects of a project.
Communication is critical for all parties involved, especially the data scientist and the business manager (“customer”), as well as data engineers, software engineers, business analysts, data governance analysts and potentially others. Many different individuals, each with a unique skill set, are often required to complete a data science project. Clear, concise communication, along with mutual understanding and agreement among all stakeholders on all critical facets of the project, is essential for success. If there are n people on a project, the number of possible pairwise communication links between them is n(n-1)/2, which grows on the order of n² (O(n²)). It’s no wonder Brooks’ law states that “adding manpower to a late software project makes it later” – communication alone is half the battle.
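The quadratic growth is easy to see with a few numbers. Counting each pair of people once, n team members have n(n-1)/2 distinct communication channels. The short Python sketch below tabulates that count; the function name `communication_links` is purely illustrative.

```python
def communication_links(n: int) -> int:
    """Number of distinct pairwise communication channels among n people."""
    if n < 0:
        raise ValueError("team size must be non-negative")
    # Each of the n people can talk to n - 1 others; dividing by 2
    # avoids counting the channel A-B and the channel B-A twice.
    return n * (n - 1) // 2


if __name__ == "__main__":
    for size in (2, 5, 10, 20):
        print(f"{size:>2} people -> {communication_links(size):>3} links")
```

Doubling a team from 10 to 20 people raises the number of links from 45 to 190, slightly more than a fourfold increase. That is why adding people to a project adds coordination overhead far faster than it adds capacity.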
Project Challenges
Understanding the business problem that you are trying to solve is often the point at which data science projects go awry. Sometimes the business folks themselves are not completely clear on what the real problem is, so we should not be surprised that the data scientist may need to do quite a bit of investigating, alongside the business folks, to determine the problem to be solved. Other times the business folks do understand the problem, but a breakdown in communication – such as the lack of a clear explanation from the business, or the data scientist’s failure to listen adequately and ask clarifying questions – inhibits mutual understanding of the problem. Getting to a clearly stated, mutually understood problem definition, along with the associated business process flows, data flows and decision-making processes, is foundational to initiating and successfully completing a data science project.
The challenges associated with data are many and will continue to hinder data science projects. Historically, the challenges have included not having enough data or not having it in one place for analysis; going forward, the challenge will be having too much data, in too many forms and in too many locations. Great strides are being made in the fields of data engineering and data governance, and in the technology platforms that support these endeavors. Even so, myriad enterprise systems, e-commerce and social media platforms, IoT devices, etc., will continue to generate data in greater volume and at a faster pace than most enterprises can realistically manage, let alone manage easily. The key for successful data science projects is to focus on the data that you must have to get your project to MVP/M (minimum viable product or model). You can always add relevant data as it becomes available down the road.
Misapplying a model often occurs when faulty or improper assumptions are made about the applicability of a particular model form, or its usage, to the problem at hand. Experimental design is a critically important skill that is often lost on citizen data scientists, and even some professional data scientists, usually because of a lack of training and education in the subject. Although its techniques can be applied quantitatively, there is also an artfulness to a well-designed, statistically valid experiment. Predictive model bias and overfitting are also common errors that produce invalid results, but they can be detected and mitigated with properly applied techniques, e.g., k-fold cross-validation. When in doubt, consult a professor or a more experienced colleague, and check your textbooks and online references to ensure that the model you are employing is valid and the experiment you are running is suitable for the problem at hand. (Google is your friend, because it is highly unlikely that you are the first person to encounter a given type of problem, or one similar to it. A thorough literature search before model development is encouraged.)
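To make the k-fold idea concrete, here is a minimal, dependency-free Python sketch. The data are shuffled once and split into k folds; a one-feature least-squares line is fit on k-1 folds and scored on the held-out fold, and the fold errors are averaged. The helper names, the choice of simple linear regression and the use of mean squared error are illustrative assumptions. In practice, a library such as scikit-learn provides this machinery (e.g., `KFold` and `cross_val_score`).

```python
import random
from statistics import mean


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k roughly equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread any remainder over the first (n_samples % k) folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def cross_validated_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k, seed):
        # Fit only on the training folds...
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # ...and score only on the fold the model never saw.
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test_idx]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [float(i) for i in range(50)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 3) for x in xs]  # synthetic data
    print(f"5-fold CV MSE: {cross_validated_mse(xs, ys, k=5):.2f}")
```

Because every observation is scored only by a model that never saw it during fitting, a large gap between training error and cross-validated error is a warning sign of overfitting.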
It should go without saying that business folks and data scientists should focus on problems and data science projects that represent an agreed-upon (high) business priority, i.e., ones that will realistically generate significant business value and economic impact, however you measure it. Unfortunately, that is not always the case. Sometimes business folks do not have a clear set of projects prioritized by true business value. Even when they do, data scientists – and, yes, even business folks – sometimes get distracted by other initiatives that consume time and resources. The Key Business Question Grid and the Project Valuation Ranking tool in Part 4 can help focus effort on the highest-priority, highest-potential problems and projects.
Managing Change
Data science projects induce inordinately large amounts of change. Data science fundamentally and even radically changes the way that problems are solved, questions are answered and decisions are made. In general, the transition to becoming a fact-based, data-driven enterprise is transformational and fraught with many dimensions of change, including moving away from gut instinct and Excel-based heuristics and rules of thumb to more rigorously rational model-based approaches to complex problem solving and decision-making. Data scientists may lead the way, but everyone must go on the journey together. Data scientists may inform and teach others how these advanced techniques and technologies function, but everyone, from analysts to managers to executives, must “buy in” to be successful in both individual projects and the overall transformation driven by data science methods and stakeholders.
Communication plays a critical role in managing change and “winning hearts and minds.” Storytelling that uses before-and-after comparisons, with plenty of data visualization highlighting the business impact of data science model-based solutions, is crucial to demonstrating and proving the efficacy of these scientific approaches to management. (Most people love stories, with lots of pictures to help them understand complex topics.) Opting in to augmentation-based AI methods and iterative, interactive optimization approaches can ease the transition from an exclusively human- and Excel-centered approach to problem solving and decision-making to the compelling alternative founded on the greater analytical rigor of data science. Everyone in the stakeholder/constituent group must be convinced beyond a reasonable doubt that all of the (sometimes painful) changes driven by data science are made worthwhile by the business value and economic impact achieved. (“The juice is worth the squeeze!”)
No one likes disappointment, or the letdown that follows. Everyone wants a big win on their data science project to help the company and stakeholders and to further their own career progression. All the more reason to set realistic expectations for all of the relevant KPIs for the project – i.e., scope, timing, budget and business value targets. Leaning toward conservatism is the best approach and usually wins the day. Stretch goals are fine, as are BHAGs (big hairy audacious goals). Epic fails, caused by overpromising and underdelivering, could ruin a career (or even get someone fired). Sandbagging, on the other hand, draws skepticism and leaves constituents with a lack of trust.
Project management (PM) has evolved to be considered both a science and an art. Techniques such as PERT/CPM and Agile burn-down charts attempt to quantify and measure how a project is progressing and how well a team is performing. These tools are invaluable to a project manager, but they primarily inform. A disproportionate amount of judgment must be applied to managing the scientific modeling aspect of a data science project, as well as the systems development activity. Gauging the complexity of a task, which drives resource consumption and timeline, is a skill that comes from working on numerous projects of widely varying difficulty and learning how to approach and solve them. Gauging a team’s output and productivity as they rise and wane over the course of a long project is a skill developed both through observation informed by data and through active interaction with team members as they climb learning curves and struggle to overcome a series of challenges with data, changes in scope, infrastructure issues and more. Project managers who know when to press or ease off and by how much, and when to challenge or relent, are a rare and skilled breed of professional who develop through experience over time, not from PM certification courses alone.
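PERT’s three-point estimate and CPM’s critical path are simple enough to sketch. The Python below assumes each task has optimistic (o), most likely (m) and pessimistic (p) durations and uses the standard PERT approximation t_e = (o + 4m + p)/6. It then finds the longest chain of dependent tasks, which sets the minimum expected project duration. The `Task` class, the task names and the durations are hypothetical, and the result only informs the project manager’s judgment rather than replacing it.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    optimistic: float
    most_likely: float
    pessimistic: float
    depends_on: list[str] = field(default_factory=list)

    @property
    def expected(self) -> float:
        # PERT three-point estimate: (o + 4m + p) / 6
        return (self.optimistic + 4 * self.most_likely + self.pessimistic) / 6

    @property
    def variance(self) -> float:
        # Conventional PERT variance: ((p - o) / 6) ** 2
        return ((self.pessimistic - self.optimistic) / 6) ** 2


def critical_path(tasks):
    """Return (expected project duration, critical path) using expected durations.

    Assumes the dependency graph is acyclic and every dependency name exists.
    """
    by_name = {t.name: t for t in tasks}
    finish = {}  # earliest expected finish time per task
    prev = {}    # predecessor on the longest path into each task

    def earliest_finish(name):
        if name not in finish:
            task = by_name[name]
            start, best = 0.0, None
            # A task can start only after its latest-finishing dependency.
            for dep in task.depends_on:
                if earliest_finish(dep) > start:
                    start, best = finish[dep], dep
            finish[name] = start + task.expected
            prev[name] = best
        return finish[name]

    end = max(by_name, key=earliest_finish)
    path, node = [], end
    while node is not None:  # walk predecessors back to the start
        path.append(node)
        node = prev[node]
    return finish[end], path[::-1]


if __name__ == "__main__":
    plan = [  # hypothetical durations in weeks
        Task("define problem", 2, 3, 10),
        Task("acquire data", 5, 10, 21, ["define problem"]),
        Task("build MVP model", 4, 6, 14, ["acquire data"]),
        Task("build pipeline", 3, 5, 7, ["define problem"]),
        Task("deploy", 1, 2, 3, ["build MVP model", "build pipeline"]),
    ]
    duration, path = critical_path(plan)
    print(f"Expected duration: {duration:.1f} weeks")  # Expected duration: 24.0 weeks
    print(" -> ".join(path))  # define problem -> acquire data -> build MVP model -> deploy
```

Note how the pessimistic estimate for data acquisition pulls its expected duration well above the most likely value. Slack on the pipeline work, by contrast, does not affect the end date.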
As painful as it is to admit, after spending years and years studying mathematics of far more than modest rigor and learning to write and test code, no one other than you, your data science colleagues and your professors cares about the model, techniques or technology. People in business, and exponentially more so the higher up the leadership chain they are, care about the business value and top- and bottom-line economic impact of your data science far more than the math or code. They trust that you “did the math,” but they don’t want to hear about it. Sorry.
My advice is to not let yourself get “wrapped around the axle” with a lot of nuanced, overly sophisticated mathematics for its own sake on corporate data science projects. Please, do yourself and your constituents and stakeholders a BIG favor, and save the math and code for the appendix of your presentation, for industry conferences and symposia, refereed academic journal publications, and your Data Science Center of Excellence and community of practice meetings. Always remember the Pareto principle (deliver 80% of the value for 20% of the effort), MVP/M, and that perfection is the enemy of done. The model and code need to be verified and validated, but not perfect.
The final and highest hurdle to achieving data science project success is getting your model from the sandbox (of your desktop or cloud-based work area) to a full-fledged production system (e.g., microservice or stand-alone) embedded in a high-value business process. Availability, reliability and repeatability are necessary for your model/system to achieve the “flywheel” of ongoing business value creation without regular human intervention. This process/journey requires a team to realize the endgame: business people (i.e., executives for funding and political “air cover,” line managers to drive change, and individual contributors to help design, develop, test, validate and implement the solution), technology people (e.g., software, cloud, security), data people and test/QA people. It may take months, years or even a decade, and may cost hundreds of thousands, millions or hundreds of millions of dollars, to deploy and implement completely, depending on the scope and complexity of the problem and the level of sophistication and operational criticality of the model/system solution. (It is advisable to make sure that the benefits delivered by your solution justify the costs to build and implement it, by whatever measures and metrics the finance department/board of directors uses, e.g., NPV, IRR, ROI.)
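The financial yardsticks named here are straightforward to compute once the cash flows have been estimated. The Python sketch below assumes one cash flow per period, with the initial investment at period 0. It computes NPV at a given discount rate and finds IRR by simple bisection, which assumes NPV changes sign exactly once on the search interval. It also computes a simple, undiscounted ROI. The function names and the cash-flow figures in the usage example are hypothetical, and a finance department may define these metrics differently (e.g., timing conventions or which costs are counted).

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value; cash_flows[0] occurs now (t = 0), then one per period."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.99, high=10.0, tol=1e-10, max_iter=200):
    """Internal rate of return by bisection: the rate at which NPV = 0.

    Assumes NPV changes sign exactly once between low and high
    (e.g., an up-front cost followed by positive returns).
    """
    f_low, f_high = npv(low, cash_flows), npv(high, cash_flows)
    if f_low * f_high > 0:
        raise ValueError("NPV does not change sign on [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (high - low) / 2 < tol:
            return mid
        # Keep the half-interval that still brackets the root.
        if f_low * f_mid < 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return (low + high) / 2


def simple_roi(total_benefit: float, total_cost: float) -> float:
    """Undiscounted return on investment: (benefit - cost) / cost."""
    return (total_benefit - total_cost) / total_cost


if __name__ == "__main__":
    # Hypothetical: $1M to build and deploy, then $400K/year in benefits for 4 years.
    flows = [-1_000_000, 400_000, 400_000, 400_000, 400_000]
    print(f"NPV @ 10%: {npv(0.10, flows):,.0f}")
    print(f"IRR: {irr(flows):.1%}")
    print(f"Simple ROI: {simple_roi(sum(flows[1:]), -flows[0]):.0%}")
```

NPV and IRR account for when the benefits arrive, which matters for multi-year deployments. Simple ROI ignores timing, so a project that looks attractive on ROI alone can still fail an NPV test at the company’s discount rate.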
As with any human endeavor involving teams of people, whether it is a co-ed softball team, a data science project or an expedition climbing Mount Everest, empathy is the most important quality to embody, regardless of how incredibly difficult things get along the journey. And trust me, as worthwhile as data science projects are in every respect, things will get difficult at many, many points along the way, and you cannot afford to alienate any constituents, partners, teammates or stakeholders. People never forget heroes, and they never forget jerks. You may (barely) get through one project, but you will never get through another one by treating anyone who matters with anything less than the Golden Rule.
A few more pieces of advice that go along with empathy when times get tough include:
- Assume positive intent.
- Give people the benefit of the doubt.
- Put yourself in the other person’s position.
- Trust, but verify, until people prove themselves unworthy of your trust.
- Delete the angry email before you hit SEND (better yet, don’t even write the email – go have a diplomatic conversation).
- Think, then breathe deeply, before you speak.
- Work the problem; don’t blame the person who created or uncovered it.
- Read “Emotional Intelligence” by Daniel Goleman for great advice on your EQ relative to your IQ, and the attributes of great senior executive leaders.
To engineer is human. Failure is feedback. Through failure, we learn to succeed.
I sincerely hope that this 10-part series on why data science projects fail will help you and your company to be more successful in all of your future data science endeavors.
Douglas A. Gray, MSOR, MBA, is a practitioner, leader and educator. He is currently director of data science at Walmart Global Tech and an adjunct professor of business analytics and data science at Southern Methodist University. Connect with him on LinkedIn at: https://www.linkedin.com/in/doug-gray-06bb4a4/.