October 27, 2022 in Viewpoint

The MLOps Stack is Missing a Layer

The current technology stack is aimed at streamlining the logistics of machine learning processes but misses the importance of model quality.

https://doi.org/10.1287/LYTX.2022.06.03

Machine learning (ML) increasingly drives business-critical applications such as credit decisions, hiring, drug discovery, predictive maintenance and supply chain management. As ML enters new domains, the underlying technology needs are also rapidly evolving. The ability to transition models developed by data scientists into production-grade systems – in an efficient and scalable manner – has emerged as a fundamental necessity.

Productionizing ML models is complex, and MLOps (machine learning operations) practitioners have rightly looked to software engineering processes as a blueprint, seeking to adapt traditional development and IT operations (DevOps) thinking to the particular needs of ML. This has led to an explosion of tools that streamline and automate various parts of the ML life cycle. Collectively, this vibrant, evolving and sometimes confusing set of tools constitutes the MLOps technology stack.

Making Sense of the Evolving MLOps Stack

MLOps started with the problem of taking models from notebooks into production at scale, but it has since grown into a collection of tools that handle the tasks data scientists generally prefer not to do or worry about. A wide variety of companies, tools and open-source projects have emerged in areas such as data labeling, model and data versioning, experiment tracking, scaled-out training and model execution.

One way to organize the seemingly disparate set of tools is to ask two questions: (1) What problem is being solved? (2) Where in the life cycle of a model is it being solved? Figure 1 maps the various tools in the market along these dimensions: the stage of the life cycle (data, training or postdeployment) and whether the tool is part of the compute layer or the system of record layer.

Figure 1. Typical MLOps technology stack today.

The Missing MLOps Layer: Model Quality

Unfortunately, for most organizations, the “influx of innovation” in MLOps has not yet translated into large-scale, real-life adoption of ML. A key reason has been that most MLOps tools so far have focused almost exclusively on the logistics of operations, creating automated workflows to streamline the process of moving ML models from development to robust production environments.

This type of process automation is necessary but not sufficient. It increases throughput and repeatability of the ML life cycle but does little to ensure model quality. If we draw an analogy to software development, MLOps today is where software engineering was a few decades ago: very few tools for testing, debugging and meaningful monitoring.

This is the missing quality layer in the MLOps stack. Without it, data scientists build ad hoc tests, spending significant amounts of time trying to figure out what really happened when a model goes wrong at any part of the life cycle, and end up using trial-and-error methods to mitigate identified issues. Organizations can end up with highly automated “assembly lines” that retrain and productionize ML models at pace but with little sense of the quality of those models and their ability to survive messy, real-life situations during their lifetime.

None of this will come as a surprise to the ML community. And yet, the strategy for building high-quality models is often limited to hope and prayer. Why is that?

  • First, industry thinking on the definition of ML quality – and how to test/monitor/debug for it – is still in its early stages. Traditionally, the focus has been on the predictive accuracy of models, measured on held-out test or out-of-time data. More recently, other symptoms of poor model quality, such as unjust bias, have received interest from regulators and practitioners. However, as we have argued in a separate blog, a holistic approach is needed to cover the full set of factors that contribute to ML quality, including transparency, justifiability/conceptual soundness, stability, reliability, security, privacy and, of course, the underlying data quality.
  • Second, compared with traditional software, ML systems are characterized by greater uncertainty because models make predictions based on patterns learned from potentially incorrect, incomplete or unrepresentative real-world data. The ML life cycle also involves more iterations and greater interdependencies between stages; for example, the detection of poor performance in one segment may necessitate incremental data sourcing and/or resampling further back in the cycle.
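To make the segment example concrete, here is a minimal sketch of one possible quality test: it compares accuracy within each segment against overall accuracy and flags segments that lag behind. The function names, the record layout and the 0.10 gap threshold are all illustrative assumptions, not part of any particular MLOps tool.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """Return accuracy per segment for (segment, y_true, y_pred) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for segment, y_true, y_pred in records:
        total[segment] += 1
        correct[segment] += int(y_true == y_pred)
    return {segment: correct[segment] / total[segment] for segment in total}


def weak_segments(records, max_gap=0.10):
    """Flag segments whose accuracy trails overall accuracy by more than max_gap.

    max_gap is a hypothetical tolerance chosen for illustration only.
    """
    records = list(records)  # allow generators; we iterate more than once
    overall = sum(int(y_true == y_pred) for _, y_true, y_pred in records) / len(records)
    per_segment = accuracy_by_segment(records)
    return {s: acc for s, acc in per_segment.items() if acc < overall - max_gap}


# Hypothetical labelled predictions: segment "A" is served well, "B" is not.
sample = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 1), ("B", 1, 1), ("B", 0, 1),
]
flagged = weak_segments(sample)
```

A flagged segment would then be a prompt to revisit earlier stages of the cycle, such as data sourcing or resampling, rather than a verdict on the model by itself.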

Not surprisingly, it has been easier to focus MLOps efforts to date on the simpler problem of automating the mechanical process steps involved in building and deploying a model. There are good reasons for this: Well-established precedents are in place from the world of traditional software engineering, and outcomes are easily measurable – e.g., time taken to move from one stage in the life cycle to the next.

What Does a More Effective MLOps Stack Look Like?

A more complete and effective version of the MLOps tech stack would have a “system of intelligence” layer at the top, dedicated to ensuring ML quality throughout the life cycle (see Figure 2).

Figure 2. Assuring ML quality throughout the life cycle.

Organizations that build with an ML quality-first approach can achieve greater predictability in their efforts to productionize models that are effective, reliable and stable. Data scientists can follow a more streamlined development workflow for building, evaluating and refining models. Teams with operational responsibility for the models in production will have the tools to track changes in a model’s behavior and accuracy over time, supported by holistic ML quality tests, extensive root cause analysis and rapid debugging capabilities.
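As one illustration of what monitoring a model's behavior over time could involve, the sketch below computes a population stability index (PSI) between a baseline sample of model scores and a more recent one. PSI is a swapped-in example of a drift measure, not a method this article prescribes. The binning scheme, the smoothing constant and the sample data are assumptions, and the 0.25 alert level is only a commonly quoted rule of thumb.

```python
import math


def psi(baseline, current, bins=10, eps=1e-6):
    """Population stability index between two samples of numeric scores.

    Bins are equal-width over the baseline range; values outside that range
    are clipped into the first or last bin. eps avoids log(0) for empty bins.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back when all baseline values match

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    base_shares, cur_shares = shares(baseline), shares(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base_shares, cur_shares))


# Hypothetical score samples: the recent scores have shifted upward.
baseline_scores = [i / 100 for i in range(100)]
recent_scores = [0.5 + i / 200 for i in range(100)]

drift = psi(baseline_scores, recent_scores)
needs_review = drift > 0.25  # rule-of-thumb alert level, an assumption here
```

A drift signal like this says only that something changed; root cause analysis is still needed to decide whether the cause lies in the data, the model or the world it operates in.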

To enable large-scale positive impact in real-life use cases, MLOps must focus as much on model quality as it does on process efficiency. The more serious the financial and human cost of incorrect predictions – e.g., if the outputs from an ML model can result in denying someone a job, insurance coverage, a mortgage or social security benefits – the higher the bar is for the quality of that model. The good news is that the MLOps stack is evolving to meet this challenge, and solutions that fit this quality layer are being increasingly deployed.

Shameek Kundu
Shayak Sen
