January 7, 2019 in Forum

Societal trends create new challenges for data scientists

Christopher Berry

https://doi.org/10.1287/LYTX.2019.01.07

While you may not realize it, chances are you’ve used a recommendation engine. A recommendation engine is vital in protecting your attention from that serial oversharer on social media, in suggesting responses to your instant messages and in improving the chances that you’ll see an article that’s likely to be relevant to you. Recommendation engines are central to many human-computer interactions because they’re effective.

This technology has had expected and unexpected impacts on people, their communities and societies. What does this mean for people, researchers, companies, organizations, governments and the data scientist? Three major societal trends offer clues.

Three Societal Trends

The value of personal data is increasing.

People have a right to be free from forced disclosure. They have a right to privacy. As with many rights, we make choices, either collectively or individually, to relax that right. We choose to disclose information about ourselves to each other, the state or the firm in exchange for value, such as the ability to be contacted via text message by a friend, the ability to drive a motor vehicle or the ability to have monetarily free email.

More people value their freedom from forced disclosure and adopt technologies that protect their personal data. The trends toward encryption and ad blocking are in part related to people’s reassessment of the value of their own data.

More firms are able to augment the value of the personal data they get by mixing it with additional data sources. For instance, your resume or CV, by itself, has value. Businesses were built around exchanging the opportunity to be seen and hired for the opportunity to have a job opening be seen. New value was created when the resume was mixed with the Rolodex. Suddenly, everybody on the network could be recruited at any time, so there is more value for both recruiters and prospective candidates.

Finally, information technology (IT) has become much better at generating new data, in particular from sensors embedded in wearable and connected devices. IT has become much better at generating new value from data, like predicting heart problems sooner or making roads safer.

The value of personal data is increasing because it is more valuable to the people who own it, and to the organizations that lease the privilege to use it.

The demand for equity is increasing.

In many societies, there may be a general sense that our political economy is not equitable. This shows up in statements about rigged economies, unequal access to capital and grotesque legal system outcomes. Equity is a strategy of enabling people to have an equal opportunity to succeed. It should not be confused with equality, which is a strategy for causing equal outcomes.

This trend also applies to recommendation engines. Sometimes the biases that are inherent in society appear in the data that society produces. These biases can be picked up and amplified by recommendation engines. This may cause harm to people by denying them opportunity, reinforcing negative portrayals of groups of people or creating other unfair outcomes. People may be denied access to credit and capital, the opportunity to meet a really wonderful person for a date or the chance to discover something inspiring.

This has led to increased interest in fairness, accountability and transparency in machine learning (FATML) and greater regulatory demand for the explainability of the decisions produced by a recommendation engine.

The sense of polarization is increasing.

For many reasons, people are more polarized. To some, the progress toward equity may seem insufficient. To others, equity may feel like an increased risk and may contribute to insecurity. Our economies are more intertwined than ever. There is more wealth generated than ever. And in spite of an increase in connectivity and wealth, our political and economic systems appear to be increasingly polarized.

Recommendation engines that use data about social ties may amplify polarization. Sometimes, recommendation engines may be manipulated, adversarially, in order to amplify polarization. Automated bots can generate thousands of interactions with a recommendation engine in order to trick it. This is an effort to push algorithms to recommend and amplify extreme opinions, with the purpose of increasing alienation, isolation or radicalization.
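The manipulation described above can be sketched with a toy example. The code below assumes a deliberately naive popularity-based recommender, which is an illustration only and not a description of any real system; the item names, account names and interaction counts are all hypothetical. It shows how a single automated account generating many interactions can flip the top recommendation, and how counting each account at most once per item blunts that particular attack.

```python
from collections import Counter

def top_item_by_clicks(interactions):
    """Recommend the item with the most raw interactions."""
    counts = Counter(item for _user, item in interactions)
    return counts.most_common(1)[0][0]

def top_item_by_distinct_users(interactions):
    """Recommend the item that the most distinct accounts interacted with."""
    # set() collapses repeated (user, item) pairs into one
    counts = Counter(item for _user, item in set(interactions))
    return counts.most_common(1)[0][0]

# Hypothetical organic traffic: 50 readers of one item, 5 of another
organic = [(f"user{i}", "local_news") for i in range(50)]
organic += [(f"user{i}", "extreme_opinion") for i in range(50, 55)]

# One hypothetical bot account generating 1,000 interactions
bot = [("bot1", "extreme_opinion")] * 1000

print(top_item_by_clicks(organic))
print(top_item_by_clicks(organic + bot))
print(top_item_by_distinct_users(organic + bot))
```

Counting distinct accounts only helps against a single noisy account. An attacker who controls many accounts would still shift the result, which is part of why this remains a hard problem.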

These three societal trends create new challenges for the data scientist.

New Challenges for the Data Scientist

At the center of this is the data scientist. A data scientist turns data into product. While data scientists may specialize in a domain, they have to know how to write code and understand statistics, algorithms and business. The data scientist now has to work more broadly to define how the products they create interact with the state, the firm and their intended users, and in turn, how those interactions affect the performance of the recommendation engine itself. These challenges come from three sources.

Challenges from the state.

The three societal trends are fueling new regulation. State regulation on data collection, processing and use is as heterogeneous as states themselves. Some states have been zealous about what can and can’t be collected, and for which purposes. For instance, under the European Union’s General Data Protection Regulation (GDPR), processing data about ethnic origin, health, sexual orientation and political, philosophical and religious beliefs is prohibited (with very narrow exceptions enumerated in Article 9). The motivation for restricting such labeled data is the fear that an algorithm will become harmful because of the label. However, to understand whether or not a recommendation engine systematically discriminates against a historically disadvantaged group, a label about that group has to be collected. Without access to the label for the purposes of model evaluation, the data scientist cannot fully understand whether the algorithm is being fair, or do anything about it if it isn’t.
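To make the dependence on the label concrete, here is a minimal sketch of one simple fairness check: comparing how often an opportunity is recommended to each group. The group names "A" and "B", the records and the choice of metric (a demographic-parity-style gap) are assumptions made for illustration; real audits typically use several measures. The point is that the function cannot be computed at all unless each record carries a group label.

```python
from collections import defaultdict

def recommendation_rate_by_group(records):
    """Share of users in each group who were shown the recommendation.

    Each record is a (group_label, was_recommended) pair.
    """
    shown = defaultdict(int)
    total = defaultdict(int)
    for group, was_recommended in records:
        total[group] += 1
        shown[group] += int(was_recommended)
    return {group: shown[group] / total[group] for group in total}

def parity_gap(rates):
    """Difference between the highest and lowest group rates."""
    return max(rates.values()) - min(rates.values())

# Hypothetical evaluation data: 10 users per group
records = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 3 + [("B", False)] * 7
)

rates = recommendation_rate_by_group(records)
gap = parity_gap(rates)
print(rates, round(gap, 2))
```

If the group column is removed from the records, the per-group rates and the gap simply cannot be calculated, which is the evaluation problem the paragraph above describes.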

The right to be forgotten has an impact on recommendation engines. Removing information from the data set used to train a recommendation engine can affect the user experience. Even if somebody is forgotten by the firm, their behavior may remain crystallized in the parameters of an algorithm long after their personal identity has been forgotten. This could be especially problematic because a major research trend is the idea that intelligence is grown over the lifetime of an engine.
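A toy example may help show how behavior can outlive the record it came from. The sketch below assumes a very simple item co-occurrence model, standing in for the learned parameters of a real engine; the user names and interests are hypothetical. Deleting a user's history from the stored data does not change a model that was already trained on it. Only retraining does.

```python
from collections import Counter
from itertools import combinations

def train(histories):
    """Count how often two items appear together in one user's history."""
    pair_counts = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Hypothetical user histories
histories = {
    "alice": ["jazz", "chess"],
    "bob": ["jazz", "hiking"],
    "carol": ["jazz", "chess"],
}

model = train(histories)

# The firm "forgets" alice by deleting her stored history...
del histories["alice"]

# ...but the already-trained model still reflects her behavior
count_before_retraining = model[("chess", "jazz")]

# Her influence is only removed once the model is retrained
retrained = train(histories)
count_after_retraining = retrained[("chess", "jazz")]

print(count_before_retraining, count_after_retraining)
```

Retraining from scratch is cheap here, but for an engine that learns continuously over its lifetime it may be costly or impractical.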

Challenges from users.

The users of recommendation engines are increasingly conscious of the impact these engines have on their lives and on the reach of their personal brand. For instance, some people who create and share large amounts of content say that they just want all of their friends to see all the content that they create. They expect the recommendation engine to agree with their own personal judgment that everything they share is worthy of a total audience, constantly. The recommendation engine often disagrees with that assessment.

In another instance, some users of a service and a recommendation engine may be engaged in an adversarial attack on the engine. These users are training automated agents in an effort to train a recommendation engine to be something other than what the organic user base wants it to be.

Some users guard their own profiles on a shared account because they like the way they have trained a recommendation engine to know them. For example, when a guest uses your account, you often feel it on your next login. Users expect to be known personally and expect the engines to anticipate their wants.

Because users are more conscious of the value of their data, they demand more value from the firms they are giving it to. This shows up as demands for greater explainability and utility from the recommendation engine. Additionally, because many users are more conscious of the impact society has on their choices, some demand recommendation engines that are fair, accountable, transparent and diverse.

Challenges from the firm.

Recommendation engines work to optimize an objective that is often aligned with the interests of the firm. That means if the optimization objective of the firm is to increase engagement under no constraints, then that is what the engine is going to learn to do.

The emergence of constraints on the optimization objective, demanded either by the state or by users, may blunt the effectiveness of the recommendation engine. The data scientist often curses the problem of dimensionality under their breath while demanding more dimensions on which to train the engine. If these dimensions are withheld, either by demands from the state or by increased expectations from the user, recommendation engines may degrade in effectiveness. They’ll still work; they just might not be as effective.
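The trade-off between an unconstrained objective and a constrained one can be sketched as follows. The example assumes a hypothetical constraint, at most one item per topic in a slate of three, and made-up engagement scores; neither comes from a real system. It compares the slate chosen purely on predicted engagement with the slate chosen under the constraint.

```python
def unconstrained_slate(items, k):
    """Pick the k items with the highest predicted engagement."""
    return sorted(items, key=lambda it: it["score"], reverse=True)[:k]

def constrained_slate(items, k, max_per_topic):
    """Pick greedily by score, allowing at most max_per_topic per topic."""
    slate = []
    per_topic = {}
    for it in sorted(items, key=lambda it: it["score"], reverse=True):
        if per_topic.get(it["topic"], 0) < max_per_topic:
            slate.append(it)
            per_topic[it["topic"]] = per_topic.get(it["topic"], 0) + 1
        if len(slate) == k:
            break
    return slate

def total_score(slate):
    """Sum of predicted engagement across the slate."""
    return sum(it["score"] for it in slate)

# Hypothetical candidates with made-up engagement predictions
items = [
    {"id": "a", "topic": "politics", "score": 0.9},
    {"id": "b", "topic": "politics", "score": 0.8},
    {"id": "c", "topic": "politics", "score": 0.7},
    {"id": "d", "topic": "sports", "score": 0.5},
    {"id": "e", "topic": "arts", "score": 0.4},
]

free = unconstrained_slate(items, 3)
capped = constrained_slate(items, 3, max_per_topic=1)

print([it["id"] for it in free], round(total_score(free), 2))
print([it["id"] for it in capped], round(total_score(capped), 2))
```

The constrained slate still returns three recommendations, but its predicted engagement is lower. Whether that loss is worth the added diversity is a product decision, not something the code can settle.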

Added requirements for the traceability of personal data through the firm may be a significant source of hidden technical debt and contribute to the reinforcement of data silos. Various organs within the firm may use state and user demands as a consideration when withholding data from the data scientist. The demand for traceability may reduce the speed at which data engineers can pipe data into recommendation engines.

These challenges add up to a big impact.

Impact

The data scientist already had a lot on his or her plate before these trends accelerated and manifested as new challenges. Yet these challenges are also exciting opportunities. There is now greater opportunity to compete on, and differentiate, user experiences powered by both recommendation engines and privacy. Those who effectively design user experiences, balancing the wants of the user and the optimization objective of the firm, should enjoy success.

There is the opportunity to build recommendation engines that are better because they’re better for society, users and the firm. And there is tremendous opportunity in probing recommendation engines for their secrets and sharing those with users. Those who rise to these challenges will be rewarded with a stronger brand, greater access to valuable information over the long run and, ultimately, the ability to compete far better. A worthy challenge?

Christopher Berry

Christopher Berry is a data scientist at the Canadian Broadcasting Corporation (CBC) where his teams turn data into product. You can see some of their work at cbc.ca. Previously, he co-founded Authintic, a social authentication analytics company, developed product at Syncapse and co-founded the marketing science department for the digital agency Critical Mass.