September 8, 2023 in Analytics Skill Sets

Business Analytics and Data Science Processes, Personas and Learning Paths

SHARE: PRINT ARTICLE:print this page https://doi.org/10.1287/orms.2023.03.11

Business analytics and data science are gaining popularity due to several key factors. First, there is a growing need to make quick and informed decisions in a highly competitive global market. Second, the availability and affordability of data have increased with advancements in technology, such as big data and related tools for data capture, storage and retrieval as well as hardware platforms and software/algorithms. These advancements enable organizations to leverage large and complex data sets for analysis. Third, the adoption of standardized methodologies, known as standard processes, has helped transform analytics and data science from subjective art forms into structured scientific practices. This has allowed for the replication of best practices and increased effectiveness as well as efficiency. Finally, there has been a cultural shift in organizations moving away from traditional experience-based decision-making toward evidence- and data-driven decision-making [1].

As the demand for analytics and data science professionals has grown, the terminology, roles and job titles associated with these fields have also expanded. Initially, the term “data scientist” encompassed individuals with broad skills, capable of handling all aspects of analytics projects, from problem identification to model deployment. However, it became apparent that this definition was too broad, and very few individuals could fulfill all the requirements (akin to unicorns). Consequently, the concept of a data scientist evolved into a team-based approach, with individuals possessing complementary skills and experiences. This shift led to the emergence of various personas and job titles associated with specific tasks within analytics and data science processes. Although the following list is not exhaustive, it provides definitions for some prominent personas.

Business Analyst: These individuals possess extensive business/domain experience and excel at identifying problems, opportunities and continuous improvement initiatives. They primarily focus on management and may not possess strong technical/programming skills. They usually rely on others for analytics tasks. Intuitive, no-code/low-code analytics platforms can greatly help these individuals excel in the workplace as citizen data scientists.

Business Intelligence Analyst/Data Analyst: These professionals have been active for nearly three decades, originating from the business intelligence and data warehousing field. Their responsibilities typically involve utilizing structured query language (SQL) to extract data from transactional and data repositories/warehouses, generating standard or ad hoc reports and creating dashboard-like information portals for managers. They rely on tools such as Excel and other data visualization/reporting tools. They primarily operate within the descriptive analytics phase of the analytics process.

Data Wrangler/Data Engineer/Data Architect: Often referred to as the “people of data” or even “data whisperers,” these individuals specialize in acquiring and preparing data. Data wranglers locate relevant data sources, often from the internet, and retrieve, standardize and store it. Data engineers handle large volumes of diverse data repositories, employing engineering design and experimentation skills to combine and prepare the data for analytics and data science projects. Data architects design the infrastructure for acquiring, storing and retrieving various types of data. They contribute to both the descriptive analytics and predictive analytics phases as enablers/supporters rather than key knowledge creators.

Data Scientist/Machine Learning Engineers: Traditionally, these professionals were considered experts capable of handling all aspects of business analytics and data science. Their skill set encompassed the roles mentioned previously. However, now, data scientists are known for their advanced machine learning and knowledge discovery skills, making them well suited for the predictive analytics phase.

Operations Research (O.R.) Analyst: Although the popularity of this role is relatively new in the analytics continuum as an enabler of prescriptive analytics, the methods and techniques employed date back to the 1940s, when optimal use of constrained resources was crucial. O.R. analysts are proficient in utilizing information and alternative solutions produced by the descriptive and predictive analytics phases to derive optimal decisions for pressing problems. They employ optimization, simulation and multicriteria decision modeling techniques, fitting seamlessly into the prescriptive analytics phase.

Additional job titles/roles in analytics and data science include the following:

1. Functional areas or specializations within the organization

  • Operations analyst
  • Marketing analyst
  • Logistics analyst
  • Product/design analyst
  • Customer analyst
  • Supplier analyst
  • Decision analysts (for strategic decision support)

2. Managerial and administrative supporting roles

  • IT systems analyst/database administrator
  • Data analytics consultant/data analytics manager
  • Analytics project manager
  • MLOps Engineer

3. Other roles

  • Professors/educators
  • Students/programmers
  • Research scientists

The Standardized Process for Analytics

As has been the case in many other computational paradigms, extracting knowledge from large and varied data repositories began as trial-and-error experimental processes. Many practitioners have approached the problem from the perspective of characterizing what works and what doesn’t. For quite some time, data mining (or more recently, data analytics and data science) projects were carried out as artistic, ad hoc, experimental endeavors. However, to methodically conduct analytics projects, a standardized process needed to be developed. Based on best practices, in the early days, data mining researchers in academia and industry have proposed several processes as simple step-by-step methodologies to standardize and streamline data analytics projects. These efforts have led to several standardized processes, including (Knowledge Discovery in Databases), SEMMA (Sample, Explore, Modify, Model and Assess), DMAIC (Define, Measure, Analyze, Improve and Control) of Six Sigma and CRISP-DM (Cross-Industry Standard Process for Data Mining). Among these standard processes, the one that has been dominating the practice is CRISP-DM [2].

CRISP-DM is a widely used and well-established methodology for conducting data analytics projects. It provides a structured approach using a six-step iterative process [3] (see Figure 1).

six-step CRISP-DM process
Figure 1. The Six-Step CRISP-DM Process for Business Analytics and Data Science

Connecting Analytics/Data Science Personas to CRISP-DM Process

As previously mentioned, business analytics and data science projects are complex undertakings that necessitate the use of established, standardized and methodologies. The steps and phases in these methodologies are executed by individuals with specific skills, known as analytics personas. In Figure 2, we attempt to illustrate the mapping of the personas in analytics and data science to the CRISP-DM process steps as follows:

  • Business Analysts: These personas play a crucial role in the “business understanding” phase of CRISP-DM. They are responsible for identifying business problems and opportunities, defining project goals and establishing success criteria. Their domain expertise and understanding of the business context guide the entire data mining process. They also play a supporting role in evaluation and deployment phases to ensure proper assessment and utilization of the outcomes.
  • Business Intelligence Analyst/Data Analysts: These personas are closely involved in the “business understanding” phases of CRISP-DM. Focusing on descriptive analytics in which the goal is to define what happened, they extract and analyze data descriptively from various sources, including transactional and data warehouse repositories, to gain insights and develop reports and dashboards based on key performance indicators (KPIs) and scorecards.
  • Data Wrangler/Data Engineer/Data Architects: These personas contribute greatly to the “data understanding” and “data preparation” phases of CRISP-DM. They play a critical role in acquiring, cleaning, integrating and structuring the data for analysis. Data engineers and data architects design the infrastructure and processes for data acquisition, storage and retrieval, ensuring that data is ready and accessible for modeling.
  • Data Scientist/Machine Learning Engineers: These personas are involved in multiple phases of the CRISP-DM process, and their main contribution is during the model building and evaluation phases. Data scientists and machine learning engineers apply advanced analytical techniques, develop models and assess their performance against defined objectives. O.R. analysts contribute to the evaluation and deployment phases, utilizing optimization, simulation and decision modeling techniques to support the decision-making process.
  • Other Roles: Various other personas, such as functional department analysts, IT systems analysts/professionals, database administrators and analytics project managers, can be connected to different phases of CRISP-DM depending on the analytics project’s specific requirements. They provide support and coordination throughout the data analytics project, ensuring successful execution of the methodology.

Overall, the personas in analytics and data science align with different stages of the CRISP-DM process, working collaboratively to identify and address business needs and objectives by understanding the data, preparing it for analysis, developing various models, evaluating their performance and deploying the results in a real-world context. It is evident that successful execution of analytics and data science projects relies heavily on the coordination and collaboration of multiple professionals encompassing various data science personas.

mapping analytics personas to CRISP-DM phases
Figure 2. Mapping of Analytics Personas to CRISP-DM Phases

How Do People Obtain These Skills? What Are the Learning Paths?

There are two main paths: programming languages (e.g., Python and R) and visual modeling/programming tools (e.g., IBM Modeler, SAS Enterprise Miner, KNIME, Weka, Orange, etc.). Learning paths are easier and shorter if they are free of syntactic details of coding and debugging, allowing the analyst to spend more time and focused effort on considering the foundational concepts and applying best practices. Visual modeling tools (e.g., KNIME Analytics Platform) allow this to happen [4].

Obtaining the skills necessary for the personas in analytics and data science typically involves a combination of formal education, practical experience and continuous learning. Here are some relatively common learning paths:

  • Academic Education: Many professionals acquire foundational knowledge through specialized certificate programs or, more substantially, from undergraduate or graduate (master’s or doctorate) degree programs in fields such as business analytics, data science, data analytics, applied artificial intelligence, computer science, industrial engineering, statistics, mathematics or other related disciplines. These programs provide a theoretical understanding of key concepts, algorithms and analytical techniques.
  • Online Courses and MOOCs: Numerous online platforms offer courses and massive open online courses (MOOCs) focused specifically on analytics and data science. These courses cover a wide range of topics and cater to different skill levels, allowing individuals to learn at their own pace.
  • Bootcamps and Training Programs: Bootcamps and intensive training programs provide immersive, hands-on learning experiences. They often focus on practical skills and real-world projects, allowing participants to develop expertise in a shorter time frame.
  • Self-Study and Practice: Many professionals acquire skills through self-study, utilizing online resources, books, tutorials and practice exercises. They explore programming languages, data manipulation tools, visual modeling platforms, statistical techniques and machine learning algorithms on their own. It is a great idea to join some of the data science competitions at online platforms such as Kaggle to gauge your skills against others in the field.
  • Industry Experience and On-the-Job Learning: Practical experience working on analytics and data science projects is invaluable for skill development. Individuals learn through real-world applications, collaborating with colleagues and tackling complex challenges encountered in their professional roles.

To expedite learning and skill building, visual modeling and programming can help. They provide a visual framework for understanding the flow of processes and relationships between different elements. Visual programming environments that employ drag-and-drop functionality with graphical user interfaces make it easier for individuals to build and manipulate models without needing to write extensive lines of code. Therefore, as opposed to spending ample time on figuring out the syntactic details of the code, one can spend that time understanding the foundational concepts and best practices of analytics and data science modeling. This simplification can greatly accelerate the learning process and allow analysts to quickly prototype and experiment.

Visual modeling tools can also facilitate easy collaboration among team members by providing a visual representation of ideas, data flows and algorithms. This improves communication, enabling team members to understand and contribute to the project more effectively. These tools often provide capabilities for rapid prototyping and iteration, allowing individuals and teams to visualize and quickly refine and optimize their models. This iterative process accelerates team-based learning, experimentation and optimization of models to meet and exceed the expectation of the analytics projects.

In summary, learning paths for analytics and data science skills involve a combination of academic education, online courses, practical experience, self-study and on-the-job learning. Visual modeling and programming tools such as KNIME Analytics Platform can enhance learning (of the fundamental concepts and best practices) and skill development by providing intuitive and collaborative environments for better internalization, experimentation, prototyping and communication of complex concepts, processes and end results.

References

  1. Delen, D., 2020, “Predictive Analytics: Data Mining, Machine Learning and Data Science for Practitioners,” Upper Saddle River, NJ: FT Press.
  2. Delen, D., 2018, “Data analytics process: An application case on predicting student attrition,” Analytics and Knowledge Management, Boca Raton, FL: Auerbach Publications, pp. 31-65.
  3. Sharda, R., Delen, D. & Turban, E., 2023, “Business Intelligence, Analytics, Data Science, and AI,” 5th edition, London: Pearson.
  4. Delen, D., 2021, “KNIME ANALYTICS PLATFORM - Open-source business analytics and data science tool provides comprehensive capabilities in the classroom,” OR/MS Today, Vol. 48, No. 5, pp. 22-23.

Dursun Delen

SHARE:

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.