March 21, 2023 in Big Data
Big Data to Big Decisions: Techniques to Build Data Backbone and Decisioning Systems for High-Growth Organizations
https://doi.org/10.1287/LYTX.2023.02.06
The primary purpose of data in the business context is to help drive decisions that create a competitive advantage in the marketplace. Yet organizations repeatedly look at data through a single lens: technology. As a result, they keep failing to extract value from their data. Although data and analytics are underpinned by technology, it is the harmonization of technology with people and processes that leads to successful data-driven decision-making. This is a complex process, especially in a high-growth environment in which existing capabilities need to scale while new ones are being built to replace them. This is compounded by the fact that many of the high-growth organizations today have vast troves of data, and big data especially accentuates technology challenges.
However, this is not an insurmountable challenge. With the combination of a robust data backbone and an organizational decisioning system, decision-making capabilities can be built successfully. At the highest level, having a robust data backbone means having scalable systems and processes in place such that they enable the capture, flow, processing and dissemination of accurate and complete data with as little friction as possible. A decisioning system is a combination of people, tools and processes that enables everyone to make data-driven decisions. In this article, we will explore the key techniques for building these processes and systems.
Data Backbone
A data backbone needs to provide the organization with a robust infrastructure and a way of working with data. There are three main levers to building a robust data backbone.
- Creating a platform that enables “data as a product.” The concept of productizing data has been discussed for some years now. The Achilles’ heel is typically the lack of infrastructure that enables the whole organization, both the data producers and consumers, to work in this archetype. But there are some critical constructs that can speed up this journey when executed well:
- Creating and providing primitives that get used by every team for data processing and storage. This ensures stack consistency across different teams and short-circuits interoperability challenges.
- The primitives should enable support for diverse workloads and data types. This helps support varied needs in the organization, for example, machine learning, business analytics, natural language processing and image processing.
- These systems also need to have capabilities and tools to enable data producers to publish data into a centralized data framework. This framework should provide the requisite publishing mechanisms, such as application programming interfaces (APIs), and support capabilities for the users. This is critical to the success of “data as a product” efforts. We will go into why this is important later.
- The infrastructure should also enable data contracts and service-level agreements (SLAs) between teams, including the ability to form and maintain contracts and monitor those contracts for violations. In high-growth organizations, multiple teams are trying to solve problems independently without being slowed down. In such an environment, it is critical to have norms around which team provides what data, how and when it provides that data, and the recourse mechanisms for when that does not happen.
- Developing capabilities to produce and maintain high-quality data. The most common failure mode in being able to use data for decisions is a lack of reliable data. This is a broad topic that entails data governance and processes, and because of the vastness of this topic, most companies fail at data governance execution. However, a prioritized approach can help make this journey much better with an initial focus on three aspects:
- Schema enforcement and governance. The data infrastructure must be equipped with a mechanism that facilitates the enforcement of schemas. Further, the schemas invariably evolve over time, and systems should be able to track them and alert the governance owners. Schemas are a bedrock of high-quality data, and incongruent ones are the root cause for not being able to join and use different data sets together to do powerful analysis.
- Data producer independence with publishing core data as a first-class construct. This allows each team to optimize its operations and use primitives in a way that makes sense for them. They can have their own data model and architecture. But they must publish the core models and schemas into a centralized directory that points to the group repository. This solves for both agility and coherence. The teams have independence but also access to data from different teams.
- Data lineage and quality monitoring. Most analytics teams spend a lot of time identifying why the data is incorrect. More often than not, it is because of some logic or data transition error between two systems. This makes data lineage an immensely powerful tool for fault identification. Similarly, quality monitoring is an essential aspect of governance. Being able to set accuracy, completeness and business logic rules on data is critical. These capabilities can help deliver early alerts to the right data owners and prevent wrong data from reaching end users. We have come a long way in terms of tools available to achieve these goals, yet most companies do not have these capabilities today because of a lack of prioritization.
- Democratize data use. This is a hybrid topic with aspects of technology, organization and processes. For most companies, the following four-step journey can help set up robust foundations.
- Create a single source of truth. Although many companies have data lakes that make data available to everyone, it is not uncommon to find dozens of data tables with similar data. To compound the issue, there is no indication of which table is the most reliable or how the data it contains is defined. This challenge increases with big data because data lakes support more open publishing. Whether it uses a data lake or a data warehouse, the company needs to maintain a set of golden data tables for the core data sets.
- Maintain a robust data dictionary for the golden data tables. The dictionary should contain the metric definitions, creation logic and data owners. This makes it easy for anyone in the organization to understand how to correctly use and interpret the data and who to reach with any questions.
- Make it easy to find the “right” data. Data discoverability is a much-overlooked topic. It is rare to find an easy-to-use tool in an organization that lets people search all available data, understand which data fits their purpose, and learn the mechanics available to access and use the data in their own workflows. But there are several tools on the market today and startups working in this area, so this is something that can be solved rather quickly.
- Infrastructure to enable reliable access to single sources of truth. The access should ideally be about the ability to use the data well, not just extracting the data. To enable access, the best practice is to have the business intelligence (BI) stack running on source data because it removes data duplication. Further, using open formats and API access allows teams to use tools and engines as they see fit for their use cases.
All of these steps set up functional analytics teams for success. They can create end-user dashboards and enable decision-making.
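To make the ideas of data contracts, schema enforcement and quality monitoring more concrete, the following is a minimal Python sketch of how a producer team's contract might be checked before a batch reaches consumers. It is an illustration under stated assumptions, not a reference implementation: the `DataContract` class, the `check_batch` function, the `orders` data set and the `commerce-data` owner are all hypothetical names invented for this example, and a real platform would typically rely on dedicated schema-registry and monitoring tools.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract a producer team publishes for one data set."""
    dataset: str
    owner: str                # team to alert when the contract is violated
    schema: dict              # column name -> expected Python type
    required: frozenset       # columns that must not be null (completeness)
    max_staleness: timedelta  # freshness SLA for each published batch


def check_batch(contract, rows, published_at, now):
    """Return a list of human-readable violations for one published batch."""
    violations = []

    # Freshness SLA: the batch must have been published recently enough.
    if now - published_at > contract.max_staleness:
        violations.append(
            f"{contract.dataset}: SLA breach, data older than {contract.max_staleness}"
        )

    for i, row in enumerate(rows):
        # Schema enforcement: no missing or unexpected columns.
        if set(row) != set(contract.schema):
            violations.append(f"row {i}: columns differ from contract")
            continue
        for col, expected in contract.schema.items():
            value = row[col]
            if value is None:
                # Completeness rule: only required columns may not be null.
                if col in contract.required:
                    violations.append(f"row {i}: {col} is required but null")
            elif not isinstance(value, expected):
                violations.append(f"row {i}: {col} should be {expected.__name__}")
    return violations


# Hypothetical contract and batch, for illustration only.
orders = DataContract(
    dataset="orders",
    owner="commerce-data",
    schema={"order_id": str, "amount": float, "country": str},
    required=frozenset({"order_id", "amount"}),
    max_staleness=timedelta(hours=24),
)

now = datetime(2023, 3, 21, 12, 0, tzinfo=timezone.utc)
batch = [
    {"order_id": "A1", "amount": 19.99, "country": "US"},
    {"order_id": "A2", "amount": None, "country": None},  # null in a required column
    {"order_id": "A3", "amount": "12", "country": "DE"},  # wrong type for amount
]

violations = check_batch(orders, batch, published_at=now - timedelta(hours=30), now=now)
for v in violations:
    # In practice this would route an early alert to the data owner.
    print(f"alert {orders.owner}: {v}")
```

The design choice worth noting is that the contract lives with the data set and names an owner, so a violation can be routed to the responsible team before the data reaches end users, rather than being discovered later by an analytics team.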
Decisioning System
The decisioning system complements the data backbone and is the organizational aspect to ensure success. Elements such as organizational culture, operating cadences, leadership models, organization structure and clear charters make the system efficient. There are two elements that can fundamentally shift the success path and are most important to focus on to create a data-driven decisioning organization.
- Organizational semantic layer to bring business and technology together. Quite often, the ineffectiveness of the operating model between business and data teams leads to the inability of organizations to make decisions based on data. The operating model breakdown can usually be traced back to a few reasons:
- Lack of communication between data and business teams.
- Business team’s lack of time and understanding of how to effectively leverage data teams.
- Data team’s limited domain understanding.
- Lack of contextualization of data, i.e., interpreting the analysis in the broader business and strategic context.
These challenges can be solved by utilizing an organizational semantic layer, which is composed of business-oriented individuals who are data savvy. These individuals are part of the data team and perform the role of a product manager with a few key differences.
- They are not just gathering requirements from internal customers; they are working with them to codesign the solution.
- They play the role of a proxy customer and speed up the development cycle, e.g., by helping test and taking the product to 90% completion without having the internal customer teams do the testing. This solves a critical problem of the internal customer teams having day jobs and lacking time to engage with and support the data teams in development.
- Most importantly, they do a critical job of contextualization. A lot of the work done by the data team is not building a product but conducting an analysis or creating a tool that helps surface insights. Hence, just acting like a traditional product manager does not yield results. The semantic layer also connects dots; applies the lens of business strategy, competitive dynamics, organization dynamics and prioritization of business questions; and then shapes the requirements and the work to be done. Oftentimes, this is very different from just gathering requirements from internal customers and developing the product. This layer acts as the glue that brings together business, technology and data analytics.
In general, folks with strong business backgrounds and experience in strategy, operations, management consulting, etc., who have become very skilled in data are great candidates to build this semantic layer.
- Bidirectional processes to create a culture of continuous improvement. Although data teams need to take the onus of making sure that their work is well understood and used in the organization, it is not entirely a one-way street. Executives and leadership should create operating cadences and processes in which the business and functional teams can engage with data teams and work with them effectively. Some of the most effective ways to do so include:
- A dashboard culture in the organization, in which the central dashboards created by data teams are used in every executive review to understand the metrics and help make decisions.
- Creation of an operating model in which there are dedicated rituals that enable a two-way flow of information – for example, design sessions in which the business and functional teams explain to the data teams their priorities, market considerations, decision-making mechanisms and requirements from their executive leaders. Conversely, data teams explain the technological possibilities and challenges, development cycle and other constructs that help the business teams understand how to better leverage data teams.
- Externships, which are a great way to help develop this connective tissue in the organization – for example, a data analyst doing a short stint as a product marketing manager or an account executive doing a stint as a data analyst.
- The business and data teams that work together presenting retrospectives, just as in agile sprints, to understand how to improve their operational model and each other’s effectiveness.
Building a robust data backbone and decisioning system is an iterative, multiyear journey, but one does not need to wait long to see results. A company can pick a few needle-moving use cases, e.g., revenue forecasting or churn prediction, and build the end-to-end system for those use cases. This starts to deliver business value and sets a template for scaling up. The journey can be very rewarding because the company can then truly make use of its data to drive decisions more quickly and create a competitive advantage.
Shreshth Sharma is a technology strategy and operations executive specializing in human-machine teaming and data-driven decision-making. He has 15 years of experience across management consulting, technology and media industries in leading firms such as BCG, Sony Pictures and Twilio. Currently, he is senior director of strategy and operations at Twilio and leads Twilio’s Enterprise Data team.