January 17, 2024 in Data Science

Data Science: Connecting the Past and Pioneering the Future of Analytics

Zhiwei Zhu


https://doi.org/10.1287/LYTX.2024.01.05

We are fortunate to be living in an exhilarating era witnessing the birth and flourishing of an extraordinary scientific discipline: data science. Much like a dazzling celestial spectacle, data science has risen to prominence, captivating both academia and business with its profound interdisciplinary impact and limitless possibilities.

However, as we embrace this prodigious addition to the pantheon of analytical sciences, uncertainty and misconceptions abound. A panel discussion I attended at a large professional association’s annual conference, featuring two distinguished panelists leading analytics divisions in world-class pharmaceutical companies, reflected a not uncommon viewpoint. The panelists emphasized the need for data scientists to step out and make their programming skills more applicable to real problem-solving, equating data science with data analysis performed in programming languages like Python and R.

As data science propels forward with the rapid unfolding of the artificial intelligence (AI)-driven Fourth Industrial Revolution, now is a moment ripe for historical reflection. Understanding the factors that distinguish data science from other analytics-intensive contenders is crucial for comprehending its ascent to the forefront of the scientific frontier.

Two Founding Members of the Analytics Family

At the core of analytics sciences are its venerable ancestors, the two eldest and most esteemed members of the data analysis family: mathematics and statistics. Over the centuries, these two siblings have flourished, nurtured by the inquisitive minds of philosophers, astronomers, economists and scientists from various fields, adapting and evolving to meet the diverse needs of humanity.

Mathematics, the profound science of abstract concepts, delves into the very essence of numbers, quantity, space and structure, revealing the logical supremacy that distinguishes human beings and provides the firm bedrock upon which data analysis proudly stands. Serving as the guiding light since the dawn of civilization, mathematics continues to be an essential companion in our pursuit of knowledge.

Accompanying mathematics on this extraordinary journey is its younger sibling, statistics. Statistics expands the influence of mathematics into the daily activities of humans by deftly incorporating uncertainty, quantified through probabilities, into the domain of certain theories and formulas known as equations or models, an area explored by mathematics but not exclusively owned by it. In the formative years of statistics, data was a precious and scarce commodity, and computational tasks demanded arduous labor. Yet, these early experiences ignited a burning dedication within statistics to innovate methodologies that could extend valuable insights (inferences) from limited data (the sample) to the vast and unattainable terrain labeled the population. The visionary efforts and achievements of statistics have gradually become its hallmark, establishing it as the gold standard for drawing conclusions with some level of uncertainty in the ever-expanding world of science.

As we marvel at the captivating advancements in data science today, let us not forget the enduring contributions of mathematics and statistics. Their wisdom, ingenuity and adaptability continue to shape the landscape of sciences, enriching our understanding of the world and empowering us to make informed decisions in the face of complexity.

Other Siblings Within the Analytics Family

In the analytics family tree, another esteemed sibling stands out: actuarial science, one of the oldest and most established data-driven sciences. Its origins date back to the 17th century when visionaries including Blaise Pascal and Pierre de Fermat utilized probability theory to solve intricate problems related to gambling and life annuities within the insurance family. Growing up alongside the insurance family, actuarial science matured into a distinct profession with its own accreditation and professional institutions. As the insurance industry grew, actuaries became indispensable figures armed with specialized methods to estimate the likelihood of life-altering events and devise strategies to manage risk.

Although actuarial science was once considered a branch of statistics, its dealings with unique challenges within the insurance family, and the growing demand for expertise, defined its own identity. Tasks such as projecting human mortality decades into the future, assembling profitable business portfolios with varying types of risk products, and developing and implementing adequate regulations to protect both consumers and businesses became complex undertakings. These challenges involve analyzing not only historical data but also human behaviors, unpredictable risks and uncertainty of returns. Actuarial science proved its mettle, carving a distinct niche by adapting statistical analysis to optimize insurance business operations.

Members of the analytics family extend well beyond the field of insurance. Biometrics, a science born in the 19th century through the minds of Francis Galton and Alphonse Bertillon, amalgamated mathematical and statistical techniques with biological studies. It peers into the very fabric of human identity, using fingerprints, DNA and facial recognition for purposes ranging from identification to security.

Econometrics, a robust branch of statistical analysis, wields its prowess to test economic theories and forecast trends. Pioneers like Ragnar Frisch and Jan Tinbergen paved the way for the integration of mathematics into economic analysis, crafting models to describe and quantify the intricate workings of economic systems.

Epidemiology, the sentinel of public health, uses mathematical and statistical tools to study the distribution and determinants of diseases in populations. From John Snow’s legendary work on cholera to modern survival analysis and meta-analysis, epidemiologists have been at the forefront of safeguarding our well-being.

Operations research (O.R.), a master conductor orchestrating optimization in complex systems, was born in the crucible of World War II. O.R. was the brainchild of visionaries George Dantzig and John von Neumann, who employed mathematical techniques to improve military efficiency. Today, it embraces linear programming, dynamic programming and other mathematical tools to harmonize decisions, resources and logistics across myriad domains.

As the list of analytics family members grows, one might notice a common characteristic: most members are married to another influential application family. Their major devotions and achievements largely revolve around innovating problem-solving approaches in a niche field by adapting and customizing the data analytics methods and tools originated by mathematics and statistics. In other words, the primary focus of these branches is on applying that analytics arsenal rather than revolutionizing it.

Data Science: A Sibling or Trailblazer?

In this grand tapestry of scientific disciplines, data science stands unique not only because it draws strength from mathematics and statistics like other members of the analytics family but also because it integrates the best offerings from its cousin, computer science, and the emerging realm of the digital era, such as computing advancements and big data. Consequently, data science can leverage advantages and tackle challenges on a scale beyond what mathematics and statistics were initially designed to achieve. Moreover, big data and computing technology are not exclusive to any single science or business segment; they are essential and applicable across various fields. Therefore, compared with other analytics sciences, data science not only stands on the shoulders of giants but also possesses two unparalleled potentials:

Expanding foundational capabilities: Data science can expand foundational capabilities by harnessing the power of modern-era big data and computing technology that was unavailable during the infancy of mathematics and statistics. This includes the development and implementation of machine learning and artificial intelligence (AI) technologies, serving as the differentiator of data science from other data-driven sciences.
Harnessing big data and enabling AI applications: Data science can extend the benefits that mathematics and statistics have already brought to a diverse range of academic and business domains by employing its distinctive ability to harness big data and enable AI applications. This is the specialty or hallmark of data science.

In reality, data science has already ventured into these potentials and achieved remarkable successes. For instance, in the field of healthcare, predicting patient readmissions is of paramount importance for enhancing patient experience and reducing healthcare cost. With an expansive variety of patient data sources, traditional statistical models such as logistic regression may struggle to handle the intricate relationship between patient characteristics, medical history, and social and environmental factors influencing readmission risks. This is because their design is not explicitly suited for analyzing high-dimensional and complex data or handling forms of nonlinearity and data imbalance. Data science approaches, such as machine learning and deep learning, are better equipped to overcome the constraints of logistic regression and automatically uncover intricate patterns affecting readmission risk. These approaches are designed to handle and extract insights from big data.
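The limitation described above can be illustrated with a deliberately tiny toy, offered as a sketch rather than a clinical model. It assumes two hypothetical binary risk flags whose effect on the outcome is purely an interaction (the outcome is 1 only when exactly one flag is set). A logistic regression on the raw flags cannot represent that pattern, while the same model given extra capacity for the interaction can. Here that capacity is added by hand as a product term to keep the example dependency-free; tree ensembles and neural networks aim to learn such terms from data. The data, the function names `fit_logistic` and `accuracy`, and the learning settings are all assumptions made for illustration.

```python
import math

# Toy XOR-style data: two hypothetical binary risk flags; the outcome is 1
# only when exactly one flag is set. Purely illustrative, not patient data.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=1.0, steps=5000):
    """Fit logistic regression by batch gradient descent.

    Returns the weight list with the bias term first.
    """
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            features = (1.0,) + tuple(x)
            error = sigmoid(sum(w * f for w, f in zip(weights, features))) - y
            for j, f in enumerate(features):
                grad[j] += error * f / len(rows)  # mean gradient of the log loss
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def accuracy(weights, rows, labels):
    """Share of rows whose predicted class (score > 0) matches the label."""
    hits = 0
    for x, y in zip(rows, labels):
        score = sum(w * f for w, f in zip(weights, (1.0,) + tuple(x)))
        hits += int((score > 0) == (y == 1))
    return hits / len(rows)


labels = [y for _, y in DATA]
raw_rows = [x for x, _ in DATA]
# Same inputs plus a hand-built interaction term x1 * x2.
interaction_rows = [(x1, x2, x1 * x2) for (x1, x2), _ in DATA]

acc_raw = accuracy(fit_logistic(raw_rows, labels), raw_rows, labels)
acc_interaction = accuracy(
    fit_logistic(interaction_rows, labels), interaction_rows, labels
)
print(f"raw features: {acc_raw:.2f}, with interaction term: {acc_interaction:.2f}")
```

On raw flags no linear boundary can classify all four cases, so the first accuracy stays below 1.0, whereas the model with the interaction term separates them. Real readmission data is far messier, and the point of the toy is only the direction of the gap.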

In the retail industry, data science has contributed to transformative changes in the way companies comprehend consumer behavior and optimize their operations. Dynamic pricing, for instance, has undergone a revolution. Airlines once adhered to fixed pricing models, in which ticket costs were solely determined by flying distance and applied uniformly to all passengers. However, with advanced technology and abundant data accumulation, airlines began to perceive the potential of personalized pricing strategies based on individual passengers’ willingness to pay.

The age of uniform pricing gave way to meticulous data collection and analysis. Historical booking patterns, seasonal trends, passenger preferences, competitor fares and external variables like weather and local events were scrutinized. Business analysts, data scientists and pricing experts collaborated to craft models capturing the interplay between “willingness to pay” and various influencing factors. These insights translated into recommendations for price adjustments that accounted for factors such as booking lead time, flight popularity, historical trends and traveler browsing behaviors.

Innovative pricing strategies, informed by data-driven insights, struck a balance between competitive fares and ensuring perceived value for passengers. The application of predictive analytics and machine learning algorithms enabled airlines to anticipate fluctuations in demand and adjust prices dynamically. Rooted in both traditional statistical modeling and modern data science methodologies, this empowered airlines to optimize revenue by tailoring fares to diverse passengers. This dynamic and adaptable approach resonates with travelers’ preferences and is empowered by insights from a new level of comprehensive data analysis.
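To make the idea of pricing against "willingness to pay" concrete, here is a minimal sketch under loudly stated assumptions. It supposes that the probability of purchase falls along a logistic curve as the fare rises, and it picks the candidate fare with the highest expected revenue per shopper. The curve shape, the `midpoint` and `sensitivity` parameters, the fare grid, and the two traveler segments are all hypothetical; they are not drawn from any airline's actual system.

```python
import math


def purchase_probability(price, midpoint, sensitivity=0.02):
    """Assumed logistic willingness-to-pay curve.

    `midpoint` is the fare at which half of the shoppers would buy;
    `sensitivity` controls how quickly demand drops as the fare rises.
    """
    return 1.0 / (1.0 + math.exp(sensitivity * (price - midpoint)))


def best_fare(candidate_fares, midpoint):
    """Pick the candidate fare with the highest expected revenue per shopper."""
    return max(
        candidate_fares,
        key=lambda price: price * purchase_probability(price, midpoint),
    )


fares = range(100, 501, 50)  # hypothetical fare grid in dollars
leisure_fare = best_fare(fares, midpoint=200)   # more price-sensitive segment
business_fare = best_fare(fares, midpoint=400)  # less price-sensitive segment
print(f"leisure: ${leisure_fare}, business: ${business_fare}")
```

In practice the willingness-to-pay curve would be estimated from booking histories, lead times and browsing behavior rather than assumed, and capacity constraints would enter the optimization. The sketch only shows why segments with different curves end up with different fares.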

“Visualizing” the Potentials of Data Science

In search of an analogy, I am drawn to the family of automobile manufacturing. If we compare Benz and Ford Motor companies to mathematics and statistics, then Benz, in the late 19th century, is renowned for building the world’s first practical automobile, whereas Ford, a few decades later, introduced assembly-line production, making cars more affordable and accessible to the general public. As time progressed, numerous automakers such as General Motors, Chrysler, Toyota, Honda, Braun Industries, Rosenbauer, etc., joined the family, innovating a wide variety of automobiles to meet society’s ever-expanding needs: vans, pick-up trucks, SUVs, 18-wheeler trucks, ambulances, fire engines and more.

Fast-forward to today, and we encounter a rising star in the automaker family called Tesla, analogous to data science in the analytics family, capturing the imagination and expectations of the world. According to a Yahoo Finance report, as of February 16, 2023, Tesla produced an annualized 1.3 million vehicles, whereas Ford produced 3.9 million vehicles. This implies that Tesla’s production by vehicle count was about one-third of Ford’s. However, when it comes to financial results, Tesla’s market value was $623 billion, whereas Ford’s market value was $50.7 billion. This makes Tesla’s market value more than 12 times Ford’s.
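The comparison can be checked with a few lines of arithmetic. The sketch below uses only the four figures quoted above; the market value per annualized vehicle is a derived number added here for illustration and does not appear in the cited report.

```python
# Figures as quoted in the text (Yahoo Finance report, February 16, 2023).
tesla_vehicles_m, ford_vehicles_m = 1.3, 3.9   # annualized production, millions
tesla_value_bn, ford_value_bn = 623.0, 50.7    # market value, USD billions

# Tesla's output relative to Ford's, and its market value relative to Ford's.
production_ratio = tesla_vehicles_m / ford_vehicles_m
value_ratio = tesla_value_bn / ford_value_bn

# Derived for illustration: market value per annualized vehicle, in USD.
tesla_value_per_vehicle = tesla_value_bn * 1e9 / (tesla_vehicles_m * 1e6)
ford_value_per_vehicle = ford_value_bn * 1e9 / (ford_vehicles_m * 1e6)

print(f"production ratio: {production_ratio:.2f}")
print(f"market value ratio: {value_ratio:.1f}")
print(f"value per vehicle: {tesla_value_per_vehicle:,.0f} vs {ford_value_per_vehicle:,.0f}")
```

With roughly one-third of the output and more than 12 times the market value, the market was valuing each Tesla vehicle produced at several dozen times a Ford vehicle, which is the gap the analogy rests on.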

Among the many reasons for Tesla to attract such exuberance from investors and enthusiasts are indications of data science’s allure, as outlined in parentheses in the following:

Although Tesla is undoubtedly a member of the automaker family, it is also seen as a key figure in the tech family, driving global innovation.
Tesla not only transformed automobiles, shifting away from designs that use a little electricity (sample data) to start a vehicle and then rely on fossil energy (inference) to propel it, but also utilizes vast amounts of electric power (big data) to drive nearly every component of the vehicle, from its brakes to its motors.
Tesla patented the automobile’s power transmission and management system (data analytics principles) for electrified vehicles (big data-driven).
Tesla revolutionized vehicle production (analytics framework) by applying Giga Press technology (machine learning) to build car bodies in large pieces (modulating analytics).
Tesla innovates hardware (data storage), software (Python, R, TensorFlow, etc.) and framework (analytics platform) to create an industry-leading Full Self-Driving system (AI).

The question arises: Should we count Tesla as a sibling or a trailblazer of the automaker family? Regardless of the answer, one thing is clear: Tesla represents a new breed of automakers that is propelling the industry into a future aligned with innovations across many industries. Similarly, data science’s integration of mathematics, statistics, computer science and big data allows it to embrace its ancestry while charting a new course that stretches the boundaries of possibilities in the world of analytics and beyond.

The Role of Data Science in Analytics

In its adolescence, data science bears the responsibility of discovering and enabling big data- and AI-driven automation and creation, combining academic integrity with technological sophistication. Currently, many businesses are still in the early stages of incorporating data into decision-making processes, either by establishing their data collection systems or by relying heavily on human intelligence and sample data, as observed in new market exploration or clinical trials. In such contexts, the urgency and relevance of expanding data science beyond conventional statistics are less pronounced. Consequently, terms like data science, statistics and business analytics may be somewhat interchangeable for some companies or operations, and this phenomenon is likely to persist for a considerable time.

However, similar to Tesla, data science must prioritize innovation within its own unique specialty and showcase it to both present and future users. This involves deriving insights and automation by leveraging big data, modern computing technology and AI. Through this fusion, data science acts as a bridge between the traditional foundations of data analytics and its promising future driven by innovation.

This perspective holds true from an academic standpoint as well. To establish itself as a distinct scientific field within analytics and the broader scientific families, data science needs to root itself in the foundations of mathematics and statistics while expanding its application scope, without seeking to replace or overshadow them. It carries the responsibility of harnessing the value of big data and modern technology. Mathematics and statistics have historically driven various academic and business fields, albeit constrained by data limitations. By focusing on its unique strengths, data science can navigate its course as a transformative force, illuminating the path toward a new era of big data analytics and discovery.

Zhiwei Zhu

Zhiwei Zhu is a clinical assistant professor and academic director of the undergraduate Business Analytics and Information Management (BAIM) program at the Mitchell E. Daniels, Jr. School of Business at Purdue University. In 2023, he was recognized as one of the best undergraduate business school professors by Poets&Quants. With more than a decade of industry experience, he played pivotal roles in establishing and managing business analytics operations in global companies.
