April 13, 2021 in Analytics Conference

Data Science at The New York Times

Plenary session at the Virtual 2021 INFORMS Business Analytics Conference

https://doi.org/10.1287/LYTX.2021.02.26n

In 2001, William Cleveland, then a statistician at Bell Labs, published an article detailing an action plan for expanding the technical areas of statistics into the emerging “new field of data science.” Fast forward 20 years and “data scientist” is now at the top of several “best jobs” lists.

Let’s focus on the year 2013 for now – the year that Chris Wiggins took a sabbatical from Columbia University to work at The New York Times, and the year he first introduced data science to the well-known publication, ultimately creating the role of data scientist and an entire data science team at the Times. For someone with a Ph.D. in theoretical physics, becoming chief data scientist at a newspaper might seem like a leap. What’s the connection? How does a newspaper use data science?

Wiggins connected all the dots and more in his plenary talk during the Virtual 2021 INFORMS Business Analytics Conference. According to Wiggins, physics and biology both have a mindset and a toolset that, when applied to data, become data science or, more specifically, machine learning. For example, virology and genetics use data science models, such as in 2012 when a flu strain emerged and scientists needed to find out whether it was swine flu or bird flu. They had to use an abundant dataset to reframe the question as a modeling problem, which then became applied analytics. “We need to know,” Wiggins said, “do we kill all the birds or all the pigs?”

Data science has its own mindset and toolset; it is a craft to reframe a problem as a machine learning (ML) task and then reframe the result again for real-world understanding. How is data science used at the Times? Wiggins and his team develop and deploy ML solutions to newsroom and business problems to help editors do their jobs better, increase revenue and help readers engage – applied tasks. Machine learning can be used to create new products and processes, enhance news judgment and find better ways to let readers speak.

There are three types of analytics problems – descriptive, predictive and prescriptive – and Wiggins provided an example of each and how it is tackled at The New York Times.

Descriptive: How to compress large datasets into something digestible. The New York Times wanted to know who is reading what, and where. The answer to this descriptive problem is Readerscope, an audience insights tool with a graphical user interface that, in essence, summarizes a large volume of data.

Predictive: This is the type of problem that occurs most often at the Times. Wiggins and his team have created several predictive models to help with things such as reader retention (using observational data and trends that are interpretable). The well-known newsvendor problem has also been solved using machine learning models (how many Times copies should be sent to individual stores selling single copies to optimize profit?), which in turn helps ad sales. The Times has also surveyed readers about how they feel when reading certain articles; advertisers then use that data to decide placement and target a feeling rather than choosing a topic or section of the newspaper.
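For readers unfamiliar with the newsvendor problem, the textbook version can be sketched in a few lines. This is not the machine learning approach used at the Times, which the talk did not detail; it is the classic critical-ratio solution, assuming demand at a store is normally distributed. The function name, parameters and figures below are all hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def newsvendor_quantity(mean_demand, sd_demand, price, cost, salvage=0.0):
    """Textbook newsvendor order quantity (illustrative, not the Times' model).

    Assumes demand is normally distributed with the given mean and
    standard deviation, and that salvage < cost < price.
    """
    if not (salvage < cost < price):
        raise ValueError("expected salvage < cost < price")
    underage = price - cost    # profit lost for each copy of unmet demand
    overage = cost - salvage   # loss on each copy left unsold
    critical_ratio = underage / (underage + overage)
    # Stock up to the demand quantile equal to the critical ratio.
    return NormalDist(mean_demand, sd_demand).inv_cdf(critical_ratio)


# Hypothetical store: mean demand of 100 copies, standard deviation of 20.
balanced = newsvendor_quantity(100, 20, price=2.0, cost=1.0)
high_margin = newsvendor_quantity(100, 20, price=3.0, cost=1.0)
```

When the cost of running short equals the cost of an unsold copy, the sketch stocks exactly the mean demand; as the margin on each sale grows, it stocks more than the mean. In practice, a predictive model would supply the demand distribution for each store.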

Prescriptive: Helps with decision support. Or, from a 2007 Innovation Report, “the art and science of getting journalism to readers.” The data science team at the Times built a product in Slack called Blossom that uses machine learning to predict when to promote certain articles for social media engagement. The Blossom bot advises editors on when and where (on which social media platform) to promote stories. These decisions then become data-informed, not data-driven, which makes this a prescriptive solution. Another example of prescriptive analytics is the recommendation engine at the Times, which uses algorithms for “highly editorially curated” content in very controlled ways.

Wiggins concluded his talk with some advice. To put data science to work on real-world industry problems, you need the right support from the right people, including the higher-ups, and you need to hire people to do the “people stuff” (or soft skills), because a main component of data science is still communication. Wiggins noted that not all problems are machine learning problems and not all answers come from data, so be willing to reframe the problem or your solution. Follow the steps: explore → learn → test → optimize → report. Finally, an organization needs good policies for data privacy and sharing, as The New York Times has. “Data is the new oil and data privacy is the new plutonium,” Wiggins said.
