June 13, 2023 in Data Management

How Using Multiple Data Sources Can Unlock the Power of Network Analytics

https://doi.org/10.1287/LYTX.2023.03.04

Organizations today have more data at their fingertips than ever before, gathered in data sets from a variety of sources. The World Economic Forum estimates that more than 460 exabytes of data will be created globally each day by 2025 (1 exabyte is equivalent to 1 million terabytes). Every step of a customer’s journey produces data unique to the customer. This data can be collected and analyzed by organizations to provide a variety of benefits, including improving customer experience, refining a company’s marketing strategy, and helping an organization make informed decisions and create a unified customer profile. The data may include personal information such as gender, social security number or address; web application data such as IP addresses or device ID; product or application usage info; or sales and purchase history. Data can be sourced from and stored in different IT systems such as enterprise resource planning, customer relationship management, procurement or manufacturing systems. This has been a traditional approach for operational processes, but when companies want to make the data available for other purposes, including analysis, machine learning, enterprise reporting or fraud detection, doing so can be challenging, time-consuming and complex. When data from heterogeneous sources is properly sorted and analyzed, businesses and organizations can use it to drive meaningful insights, anticipate challenges and support proactive or reactive decision-making.

Business Context

The vast amount and variety of data available can create a challenge for businesses, which may struggle both to process the data and to maintain it. With such a deluge of information (and more arriving every day), how can businesses efficiently manage data from multiple sources and unlock the hidden insights within the data to address the ever-growing demands of customers?

Although the specifics of each organization’s data and its potential uses vary, combining and analyzing data gathered from many sources can offer companies significant insights and allow leadership to proactively respond to potential challenges and risks. For example, a telecom company installing new optical fiber cable to improve or offer new services to its customers requires coordination between various teams and stakeholders, including engineers, vendors and legal and government authorities for approval. Through the planning, construction, testing and launch process, data and information are generated by various teams and in various formats. The data can then be analyzed, processed and reported to track the project’s milestones, address potential challenges and make timely decisions moving forward. Organizations can and should invest in analytics teams and solutions that can unite data from different sources, bring it to a common platform and provide useful analysis.

The Challenges of Multiple Data Sources

The primary challenge when working with diverse data sources is how to combine data from sources such as application program interfaces (APIs), flat files, application logs, device logs and various databases. Sources may store data using different formats, and data may be structured, semistructured or unstructured, adding complexity to the challenge of consuming, processing and utilizing the data meaningfully.

Extracting data from multiple sources can be time-consuming and complicated, especially when there is a high volume of data or when sources use different storage methods or formats. For example, Oracle may store data in one format and Microsoft’s SQL Server in another. Once data of disparate types has been extracted, it needs to be transformed into a common format, and the requirements of the destination systems need to be taken into account.
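As a rough illustration of transforming extracted data into a common format, the sketch below maps records from a CSV flat file and a JSON API response onto one shared record layout. The field names (`CustID`, `Amt`, `customerId`, `total`) and the target layout are hypothetical, chosen only for the example; a real integration would take them from the actual source and destination systems.

```python
import csv
import io
import json

# Hypothetical mappings: source field name -> common field name.
CSV_FIELDS = {"CustID": "customer_id", "Amt": "amount"}
JSON_FIELDS = {"customerId": "customer_id", "total": "amount"}


def to_common(record: dict, mapping: dict) -> dict:
    """Rename source fields and coerce values to the common types."""
    row = {common: record[source] for source, common in mapping.items()}
    row["customer_id"] = str(row["customer_id"]).strip()
    row["amount"] = float(row["amount"])
    return row


def from_csv(text: str) -> list:
    """Read a CSV flat file (passed as text) into common-format records."""
    return [to_common(r, CSV_FIELDS) for r in csv.DictReader(io.StringIO(text))]


def from_json(text: str) -> list:
    """Read a JSON array of objects into common-format records."""
    return [to_common(r, JSON_FIELDS) for r in json.loads(text)]


if __name__ == "__main__":
    combined = from_csv("CustID,Amt\n42,19.90\n") + from_json(
        '[{"customerId": 7, "total": 5}]'
    )
    print(combined)
```

Keeping the per-source mapping in data rather than in code means that adding a new source is mostly a matter of adding one more mapping.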

Data integrity and quality are also significant challenges when dealing with multiple data sources. A classic example of this issue is date formatting: some sources may store dates in the DD/MM/YY format, whereas others may use MM/DD/YYYY or even YYYY/MM/DD. For example, 03/04/23 means April 3 in the first format but March 4 in the second. For effective analysis, data from these disparate sources must be combined consistently, and without proper validation, discrepancies such as differences in date formatting can impact the entire integration cycle.
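One way to handle the date-format problem is to record which format each source uses and parse every incoming value explicitly, rejecting anything that does not match. The sketch below is a minimal version of that idea; the source names (`billing`, `crm`, `warehouse`) are hypothetical and stand in for whatever systems an organization actually has.

```python
from datetime import date, datetime

# Hypothetical registry: source system -> the date format it is known to emit.
SOURCE_DATE_FORMATS = {
    "billing": "%d/%m/%y",    # DD/MM/YY
    "crm": "%m/%d/%Y",        # MM/DD/YYYY
    "warehouse": "%Y/%m/%d",  # YYYY/MM/DD
}


def normalize_date(raw: str, source: str) -> date:
    """Parse a date string using the format registered for its source.

    Raises ValueError if the source is unknown or the value does not
    match the registered format, so bad data is caught at load time.
    """
    fmt = SOURCE_DATE_FORMATS.get(source)
    if fmt is None:
        raise ValueError(f"no date format registered for source {source!r}")
    return datetime.strptime(raw.strip(), fmt).date()


if __name__ == "__main__":
    # The same string means different days depending on the source.
    print(normalize_date("03/04/23", "billing"))    # April 3, 2023
    print(normalize_date("03/04/2023", "crm"))      # March 4, 2023
```

Parsing by declared format, rather than guessing from the value, avoids silently misreading ambiguous dates such as 03/04/23.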

When bringing data from diverse source systems into a unified system for analytics purposes, the volume of data to be stored and analyzed will be enormous and continuously growing. In fact, some sources may collect a near-constant stream of data. Organizations may find addressing this growth challenging, so it is vital to plan ahead and ensure that integration solutions are able to handle the current and expected future volume of data.

Of course, data is the backbone of all analytics processes. When working with multiple heterogeneous data sources, the goal is to unify the data into one system for analysis, and understanding the data quality and availability is key to harmonizing disparate data sources.

Avoiding Common Pitfalls

Even with the ubiquity of multiple data sources, there are several mistakes businesses make that can hamper data effectiveness – most of which relate to how the business itself approaches data analytics. With the sheer amount of data available, it may be tempting to think that more data is better, but this can lead to sending more data to the target than can be effectively processed and analyzed. On the other hand, bringing too little data to the target is not useful for analysis; bringing the appropriate amount of data for effective analysis is key.

Another frequent mistake organizations make is neglecting to plan for scalability. As the amount of data that companies gather and analyze and the number of source systems continue to grow, the complexity of unifying data from multiple sources increases. If platforms are underprovisioned or not designed for efficiency and scalability, businesses risk impacting the entire system, leading to untrustworthy data and limiting their analysis capabilities and the value that can be gained from their data sources.

The lack of a clear data management approach can also present challenges. Appropriately informed decisions cannot be made regarding the data and its potential applications without an understanding of how each system sources data, how the data is passed between systems and how each system operates. Similarly, if the structure of the underlying data source changes, it can have ripple effects on the upstream or downstream applications, which means each data source must be integrated within the larger workflow to ensure that it can be processed across systems.
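A lightweight guard against structural changes in a source is to compare each incoming record with the structure the downstream systems expect and to report any difference before the data moves on. The sketch below assumes a hypothetical expected structure (`EXPECTED_COLUMNS`); it illustrates the idea and is not a full schema-validation tool.

```python
# Hypothetical structure that downstream systems expect: field name -> type.
EXPECTED_COLUMNS = {"customer_id": str, "order_date": str, "amount": float}


def schema_drift(record: dict, expected: dict = EXPECTED_COLUMNS) -> dict:
    """Report fields that are missing, unexpected or of the wrong type."""
    missing = sorted(set(expected) - set(record))
    unexpected = sorted(set(record) - set(expected))
    wrong_type = sorted(
        name
        for name, typ in expected.items()
        if name in record and not isinstance(record[name], typ)
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


if __name__ == "__main__":
    changed = {"customer_id": 42, "amount": 19.9, "channel": "web"}
    print(schema_drift(changed))
```

Running a check like this at the boundary between systems turns a silent structural change into an explicit report that the owning team can act on.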

Finally, companies often limit themselves through a lack of communication and common expectations between the business and data analysis teams. Data and analytics teams strive to create meaningful and impactful analytics platforms, using data to tell a story. The possible analysis, however, relies on the quality of the data and clear goals, which requires alignment between the business and analytics teams.

Best Practices for Utilizing Multiple Data Sources

Understanding the data landscape is a foundational requirement for businesses seeking to combine data from a variety of sources and leverage that data to drive insights. It is vital that companies have a comprehensive knowledge of how each system is operated, how the data is sourced and, for downstream systems, what kind and quality of data can be expected and how frequently. Only with this knowledge can companies decide where and how to use their data.

Similarly, it’s essential to understand how data passes between systems, including data transformation and cleansing. Cleansing removes unnecessary or meaningless data to improve the data set’s consistency, and transformation converts data into a form that the receiving system can handle. Data heterogeneity should also be monitored to ensure that when data is passed between systems, the structure is retained. Understanding how data passes between systems and through these processes helps ensure data quality – that the data is accurate and free of redundancies once it has been brought to the target system.
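A cleansing step along these lines can be sketched as follows: records with missing required values are set aside with a reason, and exact repeats are dropped. The required fields are hypothetical, and treating two records as duplicates when all required fields match is an assumption made for the example; a real pipeline would use whatever business key applies.

```python
# Hypothetical fields every record must carry to be useful for analysis.
REQUIRED_FIELDS = ("customer_id", "order_date", "amount")


def cleanse(records: list) -> tuple:
    """Split records into clean rows and rejected rows.

    A record is rejected if any required field is absent, None or empty.
    A record is treated as a duplicate (and dropped) if an earlier record
    had the same values for all required fields.
    """
    seen = set()
    clean, rejected = [], []
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            rejected.append((rec, missing))
            continue
        key = tuple(rec[f] for f in REQUIRED_FIELDS)
        if key in seen:
            continue
        seen.add(key)
        clean.append(rec)
    return clean, rejected


if __name__ == "__main__":
    rows = [
        {"customer_id": "42", "order_date": "2023-03-04", "amount": 19.9},
        {"customer_id": "42", "order_date": "2023-03-04", "amount": 19.9},
        {"customer_id": "7", "order_date": "", "amount": 5.0},
    ]
    clean, rejected = cleanse(rows)
    print(len(clean), "clean,", len(rejected), "rejected")
```

Keeping the rejected rows, along with the reason for rejection, makes it possible to trace quality problems back to the source system instead of losing them.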

Given the amount of data that companies work with, it’s important to regularly check data system scalability. Although new cloud-based technologies are making scaling and storage easier, proper planning will ensure any required growth can occur with minimal effort and impact on existing systems. Ideally, the architecture, system design and analysis capabilities should be set up during the design phase to address future vertical and horizontal growth with agility.

Organizations also must be prepared to control potentially lengthy merging processes. A merging process that runs continuously for several days before the data can be made available makes it difficult to develop up-to-date insights, and often is not what the business teams are looking for. Instead, organizations can use incremental data sets, processing only the records added or changed since the previous run, to reduce the volume of data handled in each merge.
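A common way to build incremental data sets is to keep a "watermark", the latest modification time already processed, and to pick up only records changed after it. The sketch below assumes each record carries a hypothetical `updated_at` timestamp; sources that lack such a field would need a different change-detection method.

```python
from datetime import datetime


def incremental_batch(rows: list, last_watermark: datetime) -> tuple:
    """Return rows changed after the watermark, plus the new watermark.

    Assumes every row has an 'updated_at' datetime (hypothetical field).
    If nothing changed, the watermark is returned unchanged.
    """
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max(
        (r["updated_at"] for r in new_rows), default=last_watermark
    )
    return new_rows, new_watermark


if __name__ == "__main__":
    source = [
        {"id": 1, "updated_at": datetime(2023, 6, 1, 9, 0)},
        {"id": 2, "updated_at": datetime(2023, 6, 12, 17, 30)},
        {"id": 3, "updated_at": datetime(2023, 6, 13, 8, 15)},
    ]
    batch, watermark = incremental_batch(source, datetime(2023, 6, 10))
    print([r["id"] for r in batch], watermark)
```

Each run then stores the returned watermark and passes it to the next run, so the merge touches only what changed in between.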

More and more businesses are making use of the many different sources of data available to them to drive the decision-making process and gain meaningful insights. To live up to its potential, however, the ever-growing amount of data that organizations have access to needs to be processed, analyzed and combined into a single, unified system. For modern organizations, data analytics forms the basis on which most decisions are made. With so many disparate sources providing data within an organization, it’s vital that the processes and practices for handling this data are properly designed and executed. As this space continues to evolve, businesses that understand the value and best practices of data analytics will be better positioned to leverage the benefits of their data.

Srikanthudu (Srikanth) Avancha
