June 17, 2019 in Data Science
Searching for the Perfect Unicorn
Data scientists come in many flavors and skill sets. How to find the one that meets your needs.
https://doi.org/10.1287/LYTX.2019.04.02
Data scientists, sometimes called “unicorns” because of their unique blend of skills from multiple fields combined with the fact that they are hard to find (at least ones who perfectly match a particular need), are in increasingly high demand throughout industry [1, 2]. A quick Google Trends search for “data scientist” shows dramatic growth in web searches since 2004. In terms of web-search popularity, the term “data scientist” far surpasses traditional terms such as “statistician” and “operations researcher,” two careers that have been around much longer [3] (see Figure 1).
Do data scientists really need to have multiple skills? Many companies expect them to come with: 1) advanced computer programming skills for gathering and processing data (structured or unstructured) from multiple sources and for deploying models, possibly in collaboration with data or software engineers; 2) advanced analytics skills (statistics and applied mathematics) required to develop models or algorithms; and 3) some subject matter knowledge of the application area (e.g., marketing knowledge is required to handle marketing problems, biomedical knowledge is needed to deal with biomedical data, risk management knowledge is required to address risk analytics).
On top of all this, corporations require some degree of soft skills, including strong communication skills (oral and written presentations) coupled with data visualization, because data scientists often work with business executives, managers and analysts who speak in business language. See Figure 2 for the Venn diagram of data science skill sets.
Figure 1: Google Trends of search words for data scientist, operations researcher and statistician.

Figure 2: Venn diagram of data scientist skills.
Three Types of Data Scientists
The mainstream media tends to report news of data scientists who created innovative algorithms; see, for example, Quach [4] and Kahn [5]. Many of them are university researchers, professors or doctoral students engaged in publishable research projects. Some are in technology firms, innovating algorithms for new products or taking current products to the next level. These researchers, who have had years of advanced programming and analytics training, typically use open source tools and develop new functions or packages on top of existing ones.
On the other hand, there are a number of relatively user-friendly commercial software products on the market that do not require programming or highly sophisticated analytics skills to operate, for example DataRobot, RapidMiner, KNIME, Alteryx and BigML [6, 7]. Analysts who use these types of commercial software may come from strong business backgrounds with analytics skills and are sometimes known as “citizen data scientists” [8, 9]. With that as a backdrop, I’m often asked the following questions at industry conferences:
- Now that we have these easy-to-use tools, do we still need to hire data scientists?
- Why do we (or anyone) need to learn machine learning and deep learning when we can just push a button to create a model?
- Do we really need Ph.D.s or could MBAs with analytics skills be sufficient or even better?
To answer these questions, let’s examine the pool of data scientists. First of all, they are not a homogeneous group of professionals with identical skills and qualifications. Indeed, they range from quantitative BAs or MBAs to MS graduates to Ph.D.s, not to mention various disciplines within each type of degree (e.g., analytics/data science, statistics, operations research, computer science, mathematics, economics, engineering, physics, operations management, finance, psychology and epidemiology). Following is a categorization scheme for three types of data scientists:
Type 1: Citizen Data Scientist. These are analysts who can take on advanced analytics. They generally have a business background with a BS in finance or economics, or an MBA with a concentration in analytics areas such as business analytics, finance, information systems or operations management. Many also have prior experience as a business analyst or data analyst, or on the business side running a product, before joining data science. Given that they tend to have more business experience, some employers believe this is a group that can help them scale up the resources to meet the growing demand for artificial intelligence (AI).
Type 2: Business Data Scientist. Since the 1990s, this group has been the mainstream professionals responsible for advanced analytics and developing predictive models in many organizations. They tend to be the go-to experts when business managers have an analytics need. These people used to have (and some may still have) job titles such as statistical modeler, management scientist, data miner, decision scientist, econometrician or simply statistician. They tend to have a master’s degree or higher in a quantitative field such as economics, operations research, statistics, computer science or applied mathematics. While many have been in this career for some time, others may have transitioned from other quantitative, IT or STEM fields (e.g., risk management, actuarial science, data engineering, data architecture, physics and meteorology).
Type 3: Data Science Innovator. This group has perhaps gained the most attention in conferences and in the media as they are responsible for developing the latest and greatest algorithms. They often have a Ph.D. in computer science, machine learning or statistics, where publications are their top interest whether they work in academia or industry. In addition to deep technical knowledge, they often come with high creativity and innovative skills. Many tech companies cannot find enough of them and are pushed to pay high salaries in order to compete for such talent. Many of them are directly hired from universities or R&D departments in large corporations.
Notice that each of these three types can grow from a beginning level career to a senior level, ultimately managing other data scientists and/or leading projects. Additionally, data scientists can be specialized in one or multiple areas of applications, e.g., marketing/sales, genomics, biomedicine, risk management, insurance claims, transportation and logistics, investment finance and operations management.
Table 1 summarizes the major channels to identify data scientists. There is an increasing trend of quantitative professionals, equipped with new tools and learning, joining this relatively newly branded field, which integrates knowledge from various fields and takes advantage of the explosion of available data that is ready to be mined for business insights.
| Data Scientist Type | Skills/Responsibilities | Typical Qualification |
|---|---|---|
| 1) Citizen Data Scientist | Apply commercial software with solid interpretation; strong domain knowledge & business skills | Quantitative MBA/Master/BA/BS/BEng; or grown from Business Analyst or Business Intelligence Analyst |
| 2) Business Data Scientist | Ability to write advanced code, process complex data, acquire & apply algorithms, integrate algorithms, explain findings (correlation vs. causation) | MS in Statistics, Computer Science, Economics, Management Science, Analytics, Data Science; PhD in Economics, Marketing, Social Sciences |
| 3) Data Science Innovator | Developer of new algorithms; innovation & research mindset to create new solutions and identify new problems | PhD/ABD in Computer Science, Statistics, Applied Mathematics, Engineering, Econometrics, Operations Research, Analytics, with research & publication records |
Table 1: Three types of data scientists.
Data Scientists for Your Business Needs
With the three types of data scientists described above, which type do you really need? That depends on what business problems you have.
Citizen data scientists can solve relatively standard or structured problems with commercial software. Since many commercial software packages include sophisticated techniques such as gradient boosted trees, random forests or deep learning (with default input parameters), citizen data scientists can apply their strong business and data knowledge along with the powerful software. Citizen data scientists usually have strong communication skills along with good presentation, data visualization and descriptive analytics skills for interacting with business partners. If you have many standard and structured problems, this group may be the best fit given their scale and business acumen.
However, less structured or more complex problems that cannot be solved by off-the-shelf algorithms (e.g., segmentation with uncertain input variables, sample size requirements for predictive modeling, development of uplift models, uncovering causal relationships) are where business data scientists come in.
Business data scientists have typically been specialists for a long time, and they tend to have experience with deep and broad arrays of analytics techniques for standard or nonstandard problems. They also have the ability to integrate various algorithms to solve difficult problems, a skill that requires solid understanding of the methodologies behind these algorithms. Many have a strong ability to explain or interpret model results, and some would be capable of answering causality questions using scientific methods such as randomized experiments and causal inference to uncover insights and improve the ability to optimize the current process.
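As a rough illustration of the kind of causality question mentioned above, the following minimal sketch estimates uplift from a randomized experiment as the difference in response rates between a treated group and a control group. The segment names, the counts and the `Arm` and `uplift` helpers are all hypothetical, made up for illustration; a real uplift model would involve considerably more than this.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """Outcome counts for one arm of a randomized experiment (hypothetical data)."""
    customers: int   # customers randomly assigned to this arm
    responders: int  # customers who responded (e.g., made a purchase)

    @property
    def rate(self) -> float:
        return self.responders / self.customers


def uplift(treatment: Arm, control: Arm) -> float:
    """Estimated incremental response rate attributable to the treatment."""
    return treatment.rate - control.rate


# Made-up campaign results per segment: (treatment arm, control arm).
segments = {
    "new_customers": (Arm(1000, 120), Arm(1000, 80)),
    "loyal_customers": (Arm(1000, 300), Arm(1000, 290)),
}

uplift_by_segment = {name: uplift(t, c) for name, (t, c) in segments.items()}
best_segment = max(uplift_by_segment, key=uplift_by_segment.get)

for name, value in uplift_by_segment.items():
    print(f"{name}: estimated uplift = {value:.3f}")
print("Largest estimated uplift:", best_segment)
```

In this made-up data the loyal segment responds far more often, yet the campaign changes its behavior the least. Ranking segments by raw response rate (correlation) would therefore point to a different target than ranking them by estimated uplift (causation).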
Those who also possess strong communication and presentation skills can serve as key analytics consultants interacting with business clients and with data and technology suppliers. Senior members of this type can lead or guide other business data scientists and citizen data scientists on projects. Some may also lead data science innovators given their longer business experience.
If your mission is to create an AI-driven product that is better than competitive products from tech firms, research institutes or universities, you will need data science innovators. For example, if you want to develop the latest speech recognition system, the best voice- or text-based chatbot, a sophisticated digital health prescriptive application, the latest home robots or the most accurate medical imaging system, hire a data science innovator. If you need to outcompete other tech firms, you definitely would need these specialized experts.
Data science innovators would also likely need to publish articles (quite often that is also their interest), and sometimes file for patents to help establish or improve the reputation of your firm in the industry. Since many of these talented people come from academia and may not have been in the corporate environment very long, they may need some time to learn the corporate culture and develop business acumen, although this does not apply to everyone. Collaboration between business organizations and academia to solve various real-life problems may offer productive solutions. See Figure 3 for their business need mapping.
Your AI or data science department may support multiple business areas such as marketing, sales, product management, risk management, operations management, quality management, finance, image recognition, text analytics and so on. If so, you will need to align the appropriate data scientists from Type 1, 2 and 3 with these business lines along with data engineers [10] or business analysts from different departments; Figure 4 serves as a hypothetical example. Support to each business line will ideally be managed by a team leader who serves as a relationship manager working closely with business partners and a product owner or project manager overseeing a portfolio of projects.
Such a leader can be a business professional who has interest and experience in managing both analytics projects and business relationships, or a data science leader who has strong business acumen and knowledge about the business line. For instance, a citizen data scientist with many years of business and analytical experience could be a good fit. It could also be an opportunity for Type 2 and 3 data scientists who have extensive business exposure and interest in business relationship building. Exactly which type of data scientists should be aligned with which business line depends on your business need for each business area.
Type 1: Citizen Data Scientist
- Balance of Descriptive and Predictive Analytics
- Strong Domain Knowledge and other business skills
Type 2: Business Data Scientist
- Manage clients and suppliers
- Technical skills + Domain Knowledge + Strong Communication
- Integration and Creativity
Type 3: Data Science Innovator
- Research oriented, publications, patents
- Compete with tech firms and universities
- Develop specific powerful and world-class algorithms
Figure 3: What do these talents do for your business?
With this framework, let us revisit the three questions previously asked:
- Now that we have these easy tools, do we still need to hire data scientists?
Even with user-friendly tools, it would be less risky and more effective to have data scientists around. While these tools are more popular among Type 1 data scientists, they could be used by data scientists of all types. Depending on your business objectives, mastering these tools, with an understanding of the methodologies behind them at the appropriate depth, is essential to ensure that the right tools are used to solve the right problems.
- Why do we (or anyone) need to learn machine learning and deep learning as we can just push the button to create a model?
As George Box said, “All models are wrong, but some are useful” [11, 12]. Having these powerful and user-friendly tools can be both convenient and risky. Depending on the complexity of the problems at hand, developing the right solutions requires different depths of understanding of the mechanics behind these sets of “buttons,” such as choosing the appropriate algorithms and the input parameters for each algorithm [13]. For more complex and unstructured business problems, Type 2 and 3 data scientists would be needed to integrate existing algorithms or develop new tools to meet the business objectives.
- Do we really need the Ph.D.s or could MBAs with analytics skills be sufficient or even better?
This largely depends on your business needs and the individual skill sets. While there are exceptional talents who are savvy in solving various problems without a formal degree, demonstration of advanced training in analytics and R&D experience is typically required if you are in the business of algorithm competition or innovation for AI-based products.
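To make the point about default input parameters in the second answer concrete, here is a minimal sketch that compares a default setting with a small grid search over learning rate and momentum, two of the parameters named in note [13]. Everything in it is an assumption for illustration: the toy loss, the `train` function, the `DEFAULT` values and the grid are hypothetical and do not come from any particular software product.

```python
def train(learning_rate: float, momentum: float, steps: int = 20) -> float:
    """Minimize the toy loss (w - 3)^2 by gradient descent with momentum.

    Returns the loss reached after a fixed number of steps.
    """
    w, velocity = 0.0, 0.0
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)
        velocity = momentum * velocity - learning_rate * gradient
        w += velocity
    return (w - 3.0) ** 2


# Hypothetical "push the button" defaults.
DEFAULT = {"learning_rate": 0.01, "momentum": 0.0}
default_loss = train(**DEFAULT)

# A small, hand-picked grid of candidate settings (includes the defaults).
grid = [(lr, m) for lr in (0.01, 0.05, 0.1, 0.3) for m in (0.0, 0.5, 0.9)]
best_lr, best_m = min(grid, key=lambda params: train(*params))
tuned_loss = train(best_lr, best_m)

print(f"default loss: {default_loss:.6f}")
print(f"tuned loss:   {tuned_loss:.6f} (learning_rate={best_lr}, momentum={best_m})")
```

The defaults are a reasonable starting point, but within the same training budget a tuned setting reaches a much lower loss on this toy problem. Knowing which parameters matter, and over what range to search, is where an understanding of the underlying method pays off.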

Figure 4: Illustrative model of business line alignment.
Conclusion
Given the increased accessibility of commercial software for data science and AI, many general business analysts are able to perform advanced analytics without having to write code or learn the complex details of the background methodologies. These Type 1 data scientists are able to integrate their business experience with analytics tools to solve many problems. Type 2 and 3 data scientists, on the other hand, bring in deeper and broader knowledge of analytics and algorithmic skills to handle more complex situations. Type 3 data scientists, who are more R&D focused, would be needed to create AI-based products to compete with others on algorithms or innovate new solutions.
Not all companies need all three types of talents. The data scientists of various types can complement each other based on business objectives, communication and technical skills, as well as creativity. This article provides a framework on the categorization of data scientists and their impact. Ultimately, what type of talents you need depends on your business objectives.
Note. The views expressed are as of the date indicated, and unless otherwise noted, the opinions provided are those of the author and not necessarily those of Fidelity Investments.
References & Notes
1. Marr, Bernard, 2017, “6 Business Concepts You Need to Become a Data Science Unicorn,” KDnuggets, https://www.kdnuggets.com/2017/03/6-business-concepts-data-science-unicorn.html.
2. Quesada, Joel, 2019, “The Data Scientist Unicorn,” Towards Data Science, https://towardsdatascience.com/the-data-scientist-unicorn-8c86cb712dde.
3. Google Trends indexes data to 100 (where 100 is the maximum search interest), so data scientist reached the maximum search volume most recently in this chart compared to the past data and the other two job titles.
4. Quach, Katyanna, 2017, “How DeepMind’s AlphaGo Zero Learned All by Itself to Trash World Champ AI AlphaGo,” The Register, https://www.theregister.co.uk/2017/10/18/deepminds_latest_alphago_software_doesnt_need_human_data_to_win/.
5. Kahn, Jeremy, 2019, “Three ‘Godfathers of Deep Learning’ Selected for Turing Award,” Bloomberg, https://www.bloomberg.com/news/articles/2019-03-27/three-godfathers-of-deep-learning-selected-for-turing-award.
6. AhmedKhan, Irfan, 2018, “7 Data Science & Machine Learning Tools for People Who Don’t Know Programming,” Big Data Made Simple, https://bigdata-madesimple.com/top-data-science-and-machine-learning-tools-for-people-who-dont-know-programming/.
7. Jain, Aarshay, 2018, “19 Data Science and Machine Learning Tools for People Who Don’t Know Programming,” Analytics Vidhya, https://www.analyticsvidhya.com/blog/2018/05/19-data-science-tools-for-people-dont-understand-coding/.
8. Banker, Steve, 2018, “The Citizen Data Scientist,” Forbes, https://www.forbes.com/sites/stevebanker/2018/01/19/the-citizen-data-scientist/#6bd125d02702.
9. Boulton, Clint, 2018, “The Age of the Citizen Data Scientist Has Arrived,” CIO, https://www.cio.com/article/3322940/the-age-of-the-citizen-data-scientist-has-arrived.html.
10. Data engineers generally work with data scientists on tasks such as data wrangling, programming and model deployment [14, 15].
11. Box, G.E.P., 1976, “Science and Statistics,” Journal of the American Statistical Association, Vol. 71, pp. 791-799.
12. Box, G.E.P., 1979, “Robustness in the strategy of scientific model building,” in Launer, R.L.; Wilkinson, G.N. (eds.), “Robustness in Statistics,” Cambridge, Mass.: Academic Press, pp. 201-236.
13. Algorithms such as deep learning neural networks have many input parameters to choose, such as the number of layers, number of neurons in each layer, learning rate and momentum rate. While the default options of user-friendly software can be a good starting point, experienced data scientists can fine-tune these parameters to achieve a better result [16].
14. Green-Lerman, Hillary, 2018, “What is Data Engineering?” DataCamp, https://www.datacamp.com/community/blog/data-engineering.
15. White, Sarah K., 2018, “What is a Data Engineer? An Analytics Role in High Demand,” CIO, https://www.cio.com/article/3292983/what-is-a-data-engineer.html.
16. Goodfellow, Ian, Yoshua Bengio, and Aaron Courville, 2016, “Deep Learning,” MIT Press.
Victor S.Y. Lo, Ph.D., is the Center of Excellence leader, Data Science and Artificial Intelligence, Workplace Investing, Fidelity Investments. He has 25 years of consulting and corporate experience employing data-driven solutions in a wide variety of business areas. An active INFORMS member, he has served on the steering committee of the Boston Chapter of INFORMS and is an elected board member for the National Institute of Statistical Sciences (NISS). In addition, he has co-authored a graduate level econometrics book, published numerous articles in data mining, marketing, statistics and management science literature, and is finishing a graduate-level textbook on causal inference in business.