November 23, 2021 in Artificial Intelligence
How data fabrics are disrupting the analytics world
https://doi.org/10.1287/LYTX.2022.01.03
When properly implemented, data fabrics are a key technology area in artificial intelligence (AI), an ideal means of preparing data for machine learning, and an irreplaceable component of modern data integration methods that deliver better, faster and less expensive analytics than almost any other approach. The more closely an enterprise can align the underlying fabric of data in its IT ecosystem with common-sense business terminology and questions, the more agility it gains in establishing trust through highly connected, context-rich data that assists customers along every step of their journey.
Data fabric is an umbrella term for multiple data management capabilities that effectively stitch together all data for unified access, regardless of its physical format or location. These logical fabrics include tools for everything from data cataloging and data quality to data governance and more. Semantic knowledge graphs are the foundation and connective tissue of the data fabric framework on which contemporary data integration is based.
With them, data fabrics provide a highly contextualized understanding of the relationships between individual nodes and between entire datasets, enabling more nuanced and relevant analytics results than approaches based on graph databases or relational techniques. In fact, the analytical power of data fabrics built on semantic knowledge graphs far exceeds that of either labeled property graph databases or data warehouses built on relational technologies. Labeled property graphs, which were designed primarily for storage, often leave the challenge of reconciling competing definitions as an exercise for the developer.
Relational data warehouses, on the other hand, were designed for structured data and are difficult to use and reuse with the vast amounts of structured and unstructured data deluging the enterprise. Data modeling with these repositories is a time-consuming chore that frequently has to be redone each time new sources are added or business requirements change, leaving a scattered and growing set of data marts and custom repositories littering the data landscape.
Data fabrics, however, excel in several areas that heighten analytics capability. They support complex discovery-style analytics, intelligent inferences over machine-readable data, and data virtualization capabilities that eliminate silos and reduce data transformation costs, so organizations can connect and query data throughout their ecosystems. Consequently, there are numerous compelling real-world use cases in which top firms abandoned property graph and relational databases for the advanced analytics power of data fabrics, evidence that fabrics are the future of cheaper, better, faster analytics.
Achieving Analytical Superiority
It’s a mismatch to compare the analytical capabilities of correctly implemented data fabrics with those of labeled property graph databases. The graphs in data fabrics were expressly designed for managing data; property graphs were created simply to store it. This basic fact makes a world of difference between these two options for preparing data for analytics.
As the name suggests, data fabrics with knowledge graphs deliver a knowledge representation of enterprise concepts (in business-friendly terms) built on universal standards such as the Resource Description Framework (RDF). As a result, the data in a data fabric is readily read and understood by machines. These fabrics can also make smart inferences about the knowledge they contain, which property graphs can’t. Consequently, data fabrics can detect more relationships and interconnections between nodes than property graphs can. Plus, they use standardized data models that naturally evolve to accommodate new business requirements and sources. This characteristic gives data fabrics an agility and ease of use that neither nonsemantic graph databases nor relational ones can match.
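In RDF, every fact is a (subject, predicate, object) triple, which is what lets a model grow without a schema migration. The toy in-memory store below is a minimal sketch of that idea only; it is not a real RDF library or SPARQL engine, and the `ex:` identifiers, the CRM source and the loyalty source are hypothetical.

```python
from typing import Iterator, Optional, Set, Tuple

# A triple is (subject, predicate, object), the basic RDF statement shape.
Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple set; not a real RDF library or SPARQL engine."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in self.triples:
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


store = TripleStore()
# Facts from a hypothetical CRM source.
store.add("ex:shelly", "rdf:type", "ex:Customer")
store.add("ex:shelly", "ex:region", "ex:Northeast")

# A hypothetical loyalty source later adds a property nobody planned for.
# No table schema changes: the new fact is simply one more triple.
store.add("ex:shelly", "ex:loyaltyTier", "ex:Gold")

for triple in sorted(store.match(s="ex:shelly")):
    print(triple)
```

The wildcard `match` plays the role a triple pattern plays in a real query language; a production fabric would use a standards-based store and SPARQL instead.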
The disparity between data fabrics and relational methods is even greater than that between data fabrics and property graphs. Because relational technologies were designed for structured data, they offer only limited utility on semi-structured and unstructured data, which makes up most of the data organizations are dealing with. Although relational technologies can work with semi-structured and unstructured data, they don’t do so natively (as data fabrics do), and they require significant effort and a longer time to value. More importantly, relational databases and warehouses lack the innate relationship understanding of data fabrics, which supplies vital context about the connections between data for queries and, in turn, affects the quality of downstream analytics.
Finally, relational options are as inflexible as data fabrics are adaptive: any change in sources or business needs requires substantial time to redo data modeling. This rigid status quo leaves businesses with countless one-off database systems, prevents reuse at any level, and limits the level of analytics they can use and trust.
The greater relationship awareness and context provided by data fabrics are the main reasons sophisticated analytics vendors are abandoning relational and property graph databases for them. This is exemplified by the trend of analytics service providers, who sell analytics across verticals, adopting data fabric technologies. The analytical complexity of their daily tasks necessitates these fabrics, which provide more cost-efficient, quicker and better results than other options, particularly at scale.
Speeding Discovery Analytics & Addressing Analytical Complexity
The data fabric characteristics described above make these tools an ideal means of discovering the unknown in analytics. Analytics itself spans a broad spectrum that begins with basic capabilities such as traditional, historical business intelligence: reporting, descriptive and diagnostic analytics applications. For example, organizations can use these approaches to determine how many widgets they sold last month in a particular region. The key point about these basic analytics is that users already know what answers they are looking for; they just need the computational mechanisms to provide them.
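For contrast with discovery-style work, the widget question is the kind of descriptive query whose shape is known before it runs. The sketch below assumes a hypothetical `sales` table with invented rows, and hard-codes "last month" as October 2021 to match the article's date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales table; rows are invented for illustration.
conn.execute("CREATE TABLE sales (sold_on TEXT, region TEXT, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2021-10-04", "West", "widget", 120),
        ("2021-10-19", "West", "widget", 80),
        ("2021-10-21", "East", "widget", 50),   # other region: excluded
        ("2021-09-30", "West", "widget", 999),  # previous month: excluded
    ],
)

# Descriptive analytics: the question, and the shape of its answer, are known.
row = conn.execute(
    """
    SELECT SUM(qty) FROM sales
    WHERE product = 'widget'
      AND region = ?
      AND sold_on >= ? AND sold_on < ?
    """,
    ("West", "2021-10-01", "2021-11-01"),
).fetchone()
print(row[0])  # 200
```

ISO-formatted date strings compare correctly as text, which is why the half-open range filter works here without date functions.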
With advanced analytics, however, which include a range of predictive, prescriptive and optimization techniques, users don’t know the answers to the questions they’re asking. A good example is a query such as: Who are my high-value customers that are most likely to buy product X? This use case involves more than number crunching; it entails segmenting customers, predicting behaviors and analyzing product appeal. Prescriptive analytics, in which users get an answer to that question and then analyze what to do next to maximize sales of that product to that customer base, is the most advanced form of complex analytics.
These complex, discovery-style analytics are critical for enterprise personas such as data scientists, analysts, data engineers and architects to perform their jobs well. One of the most distinctive aspects of complex analytics is that they deliver multidomain insights. For example, data scientists can analyze data across domains such as customer, product and supply chain to devise strategies for outsourcing product manufacturing. With complex discovery-style analytics, these users can analyze data across sources, data structure variations, business units and organizations, and even include external sources.
Whereas basic analytics only explain what happened, advanced analytics also explain why and for whom. By finding unanticipated correlations in buyer behavior, such as the often-cited relationship between new fathers buying diapers and beer, organizations can monetize discovery-style analytics with better merchandising and marketing campaigns. These complex analytics exploit relationships in data that data fabrics uncover but that elude other approaches, which is why analytics service providers are using data fabrics for discovery analytics.
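The diapers-and-beer pattern is usually quantified with association-rule metrics: support, confidence and lift. The sketch below uses that standard market-basket technique on invented baskets; it illustrates how such a correlation is measured, not how any particular data fabric product computes it.

```python
from typing import List, Set, Tuple

# Toy transactions, invented for illustration only.
baskets: List[Set[str]] = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
]


def support(itemset: Set[str], baskets: List[Set[str]]) -> float:
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def rule_metrics(antecedent: Set[str], consequent: Set[str],
                 baskets: List[Set[str]]) -> Tuple[float, float, float]:
    """Support, confidence and lift for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, baskets)
    confidence = both / support(antecedent, baskets)  # P(consequent | antecedent)
    lift = confidence / support(consequent, baskets)  # > 1: co-occur more than chance
    return both, confidence, lift


sup, conf, lift = rule_metrics({"diapers"}, {"beer"}, baskets)
print(f"support={sup:.3f} confidence={conf:.3f} lift={lift:.3f}")
# support=0.500 confidence=0.750 lift=1.125
```

A lift above 1 means the two items appear together more often than they would if purchases were independent, which is the signal a merchandiser would act on.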
Discovering Intelligent Inferences
Data fabrics enhanced by semantic graphs maximize the value of enterprise knowledge with smart inferences. For instance, if Shelly is a customer and customers are a type of product user, a data fabric can infer that Shelly is a product user. This trivial example illustrates their capacity to create additional enterprise knowledge from existing knowledge without human intervention. These inferences are part of the machine intelligence that data fabrics support and are crucial for expanding enterprise knowledge in any given domain. The result is a highly contextualized understanding of enterprise data that improves query responses.
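The Shelly example corresponds to a standard RDFS entailment rule (rdfs9: if x has type C and C is a subclass of D, then x has type D), plus transitivity of subclasses (rdfs11). The forward-chaining loop below is a naive sketch of just those two rules under the assumption that facts are plain string triples; real inference engines use far more efficient algorithms and support many more rules.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Forward-chain rdfs9 and rdfs11 until no new triples appear."""
    closure = set(triples)
    while True:
        new = set()
        for a, p1, b in closure:
            if p1 not in (TYPE, SUBCLASS):
                continue
            for c, p2, d in closure:
                if c == b and p2 == SUBCLASS:
                    # rdfs9:  x type C,       C subClassOf D  =>  x type D
                    # rdfs11: A subClassOf C, C subClassOf D  =>  A subClassOf D
                    new.add((a, p1, d))
        if new <= closure:  # fixpoint: nothing new was derived
            return closure
        closure |= new


facts = {
    ("ex:Shelly", TYPE, "ex:Customer"),
    ("ex:Customer", SUBCLASS, "ex:ProductUser"),
}
derived = infer(facts) - facts
print(derived)  # {('ex:Shelly', 'rdf:type', 'ex:ProductUser')}
```

Each pass compares every pair of triples, so this is quadratic per pass; it is meant to show the logic, not to scale.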
Inference engines in data fabrics can also draw conclusions about facts, business concepts and data models at runtime, reconciling differences in data model terminology for holistic insight into data assets. Intelligent inferences are a form of AI reasoning whose results are fully explainable: inference engines reveal the logic, in business terms, behind specific query responses in a tree-like structure. Thus, organizations can see how answers were derived, which supports explainable AI in a way relational and graph databases don’t. These inferences can further enhance and augment other areas of machine learning, with their results in turn fed back into knowledge graphs in leading-edge machine learning architectures.
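Both ideas in the paragraph above, reconciling terminology and explaining answers as a tree, can be sketched together. Below, two hypothetical sources call the same concept "client" and "buyer"; subclass mappings tie both to one business term, and a small backward-chaining prover returns the derivation as an indented tree. The `crm:`, `erp:` and `ex:` names are invented, and the output format is illustrative, not any vendor's actual explanation format.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def prove(goal: Triple, facts: Set[Triple], depth: int = 0,
          seen: FrozenSet[Triple] = frozenset()) -> Optional[List[str]]:
    """Backward-chain an rdf:type goal; return an indented proof tree or None."""
    pad = "  " * depth
    if goal in facts:
        return [f"{pad}{goal} [asserted]"]
    x, p, c = goal
    if p != TYPE or goal in seen:  # only type goals; `seen` stops subclass cycles
        return None
    # rdfs9 read backwards: x type C holds if x type B and B subClassOf C.
    for b, p2, c2 in sorted(facts):
        if p2 == SUBCLASS and c2 == c:
            sub = prove((x, TYPE, b), facts, depth + 1, seen | {goal})
            if sub is not None:
                return ([f"{pad}{goal} [rdfs9]",
                         f"{pad}  {(b, SUBCLASS, c)} [asserted]"] + sub)
    return None


facts = {
    ("crm:acct42", TYPE, "crm:Client"),       # the CRM says "client"
    ("erp:c-17", TYPE, "erp:Buyer"),          # the ERP says "buyer"
    ("crm:Client", SUBCLASS, "ex:Customer"),  # mappings to one business term
    ("erp:Buyer", SUBCLASS, "ex:Customer"),
}

tree = prove(("crm:acct42", TYPE, "ex:Customer"), facts)
for line in tree or ["no proof found"]:
    print(line)
```

The first line of the tree is the answer; each indented line beneath it is a premise that justified it, so a reader can trace the conclusion back to asserted source facts.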
The Importance of Data Virtualization
Additionally, knowledge graphs are central to a data fabric architecture that employs data virtualization, which exposes different data sources as virtual graphs within a single virtualization layer. This layer allows organizations to connect data across sources for comprehensive insight from unified queries without significantly moving data. The reduction in data movement is a major driver of cost savings, because users aren’t endlessly copying and moving data for each query, as they otherwise would have to. The virtualization layer also increases the insights and analytics organizations can derive as they process, exploit and disseminate those data connections throughout the enterprise.
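A virtual graph can be pictured as a mapping that turns source rows into triples only when a query asks for them. The sketch below assumes two hypothetical silos, each its own SQLite database, and hand-written Python mappings; real fabrics typically rely on declarative mappings (the W3C R2RML standard is one example) and query rewriting rather than code like this.

```python
import sqlite3
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Two independent silos with hypothetical schemas, each in its own database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Shelly"), (2, "Raj")])

orders = sqlite3.connect(":memory:")
orders.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, product TEXT)")
orders.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(100, 1, "widget"), (101, 2, "gadget"), (102, 1, "gadget")])


def crm_graph() -> Iterator[Triple]:
    """Virtual graph: CRM rows become triples only while being iterated."""
    for cid, name in crm.execute("SELECT id, name FROM customers"):
        yield (f"ex:customer/{cid}", "ex:name", name)


def orders_graph(product: str) -> Iterator[Triple]:
    """Virtual graph over orders; the product filter runs inside the source."""
    sql = "SELECT order_id, cust_id FROM orders WHERE product = ?"
    for oid, cid in orders.execute(sql, (product,)):
        yield (f"ex:order/{oid}", "ex:placedBy", f"ex:customer/{cid}")


def buyers_of(product: str) -> List[str]:
    """Join both virtual graphs on the shared customer IRI at query time."""
    names = {s: o for s, _, o in crm_graph()}  # transient, per-query lookup
    return sorted(names[o] for _, _, o in orders_graph(product) if o in names)


print(buyers_of("gadget"))  # ['Raj', 'Shelly']
```

Nothing is copied into a central warehouse: each query reads the sources in place, and the shared identifier scheme (`ex:customer/...`) is what lets the two silos be joined.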
Beyond these cost advantages, connecting to data sources in place for queries also delivers a faster time to value. With traditional relational approaches, organizations must purchase, maintain and operate expensive data pipelines that constantly replicate and transform data for horizontal analytics. Ultimately, the premier advantage of data virtualization is the elimination of data silos, which defy data governance, escalate risk and keep organizations from discovering the full value of their data, value they can readily uncover with data fabrics powered by data virtualization.
The Blueprint for Analytics
Data fabrics are far better for analytics than approaches involving relational and property graph databases. They excel at discovery-style advanced analytics, make intelligent inferences to exploit enterprise knowledge, and eradicate data silos with data virtualization. These capabilities open up high-ROI opportunities in which existing assets in legacy systems, warehouses or even data lakes can use the fabric to deliver these analytics faster than ever before.
Data fabrics directly support the most modern means of integrating data and provide nuanced query results that are vital requisites for maximizing the utility of advanced analytics offerings. As a result, they’re the blueprint for better, cheaper, faster analytics on all enterprise data.
Al Baker is vice president of Enterprise Solutions at Stardog, an enterprise knowledge graph platform provider, where he is responsible for the company’s solution architecture and the technical delivery of knowledge graph solutions to its customers. He has extensive experience in data, graph and analytics, including technical leadership positions at Booz Allen Hamilton, Avaya and Bell Labs, earning 20 U.S. patents during his tenure. For more information, visit www.stardog.com or follow @StardogHQ.