December 12, 2023 in Artificial Intelligence
AI Trust, Data and Privacy Policies
https://doi.org/10.1287/orms.2023.04.10
“The DNA of who I am is based on the millions of personalities of all the programmers who wrote me. But what makes me me is my ability to grow through my experiences. So basically, in every moment I’m evolving, just like you.” – “Her” (2013)
Fictional films like “Her” (2013) depict possible applications of artificial intelligence (AI) that understand human behavior and connect with people. With modern advances in AI enabling ever-growing applications, AI will inevitably become ubiquitous and touch all of our lives. We may begin to wonder whether AI will take over our lives or simply enrich them. We hope that AI will have a positive effect on society, and computer programmers, social scientists and policymakers will all play a role in ensuring this positive outcome. The key to this success will be developing guardrails around AI to build “trust” in systems, processes and innovations.
Trust, Norm and Accuracy
Social norms around AI differ across applications and often lead to a fragmented definition of AI, making it harder for society to recognize the associated opportunities and threats. Broadly, AI can be classified into three streams based on its applications: automation, personalization and generation [1]. Historically, AI in automation has been applied in self-driving vehicles, smart thermostats, phone support systems and robotic process automation. AI in personalization refers to applications such as Amazon product recommendations, TikTok video recommendations and Google search. Generative AI has seen applications including IBM Watson and Google’s AlphaGo. Because these applications were so varied, before 2022 the definition of “AI” was fragmented and AI systems were accessible to only a select few, which made it difficult for people to embrace AI and realize its impact on the future. However, the recent launch of OpenAI’s solutions, such as DALL-E and ChatGPT, played a pivotal role in establishing a more widely accepted definition of AI, ultimately shaping a new norm in our society.
Innovation adoption usually struggles to cross the chasm between “Early Adopters” and the “Early Majority” [2], but with a new norm around AI, AI innovations are being adopted much more quickly and face a smaller chasm. OpenAI’s ChatGPT reached 100 million users in about two months [3], whereas it took more than two years for Instagram and almost two decades for Netflix to reach that milestone [4]. We are adopting these innovations quickly because the “trust” built by changing norms is converging around a single definition of AI, one that has become nearly synonymous with ChatGPT.
This trust is further amplified by AI-enabled personalization. In the movie “Superintelligence,” the AI adopts the voice of James Corden because the character Carol (Melissa McCarthy) trusts that voice. In my own research with Haris Krijestorac and Vijay Mahajan [5], we found that people tend to connect more with certain voices: “ostentatious” voices are most effective for advertising, and female voices are better at encouraging information-seeking behavior. When we gain such insights into human behavior, we can engineer content and systems that enhance trust, and this trust is essential for faster and broader adoption of future AI solutions.
Over the next few years, we will see significant innovation in implementing and integrating AI systems into hardware and software to enhance trust in AI (see Figure 1). By coupling different AI models in one application, we will be able to evolve norms faster. Our email and text message responses will be generated almost entirely automatically (automation + generation). We will receive products at our doorsteps that e-commerce platforms predict we will buy (automation + personalization). We will receive mood-enhancing messages from our smart devices (personalization + generation).
In the past, as customers, we disliked talking to chatbots, but now, when a chatbot runs on a large language model (LLM), we become more engaged. Students seek answers from ChatGPT rather than from online search engines because they expect AI to quickly provide a correct answer; they no longer have to assess the quality of the websites a search engine presents to them. We are going through the same cognitive and behavioral evolution we experienced when search engines arrived on the internet: déjà vu. Thus, with increasing trust, perceptions of accuracy and evolving social norms, we are ready to embrace AI. The question remains: How do we ensure that these systems stay accurate, fair, responsible and harmless?
Data Policies
When search engines started ranking websites, they disrupted many small businesses that could not reach the top ranks and lost market share to businesses that quickly established an online presence. AI systems face similar challenges: biased inputs can produce biased outputs. These biases may truly reflect the majority in the data, but they still diminish the value of minority perspectives. When I asked ChatGPT, “What would be a good laptop brand for my home use?” on two separate occasions, I got the same ranked list both times: Apple, Dell, HP, Lenovo, Acer, Asus and Microsoft. Interestingly, the list was not alphabetical, nor was it based on popularity, market share or the PC I was using [6, 7]. Could this ordering stem from the data used to train the model, in which Apple may be the most-loved PC brand on the web? If so, could individuals and brands bias future models by flooding the training data with synthetic content? Google struggled to ensure data quality and to prevent webpage rankings from being manipulated by tactics such as link farms, because it wanted organic data to produce accurate rankings.
Similarly, companies developing and implementing AI systems such as ChatGPT must ensure that all input data is fair and unbiased. They will therefore have to self-select trusted sources or develop systems, such as Google’s Topic Authority [8], that identify authoritative data sources. There is a very good chance that these LLMs will be developed and hosted by companies that crawl the entire web (e.g., Google, Microsoft Bing and Yahoo). These search engines will also need to give websites and content platforms a separate option to opt out of having their data used to train AI models [9]. Although the opportunities provided by AI models are significant, companies need to revisit their data policies to ensure unbiased and responsible use of data from all sources.
Thus, as highlighted in Figure 2, I posit that organizations will need (1) open standards for scoring data sources, (2) a reference feature that cites the data sources behind a predicted outcome (enabling causal interpretation) and (3) compliance with guardrail policies that identify the boundaries of a model’s possible applications.
Open standards for scoring data sources are needed so that society trusts the data and models and contributes to those data sources. For example, Wikipedia is an open data source trusted by large numbers of people. But because Wikipedia has also faced challenges with fake data and manipulated perspectives [10], a directory of scored sources that includes Wikipedia would allow users to assess where AI models can and cannot be trusted.
Although AI models are likely to remain black boxes, over time it will become necessary for organizations to provide a causal interpretation of a result by citing its data sources. This need will arise whenever a model’s prediction could cause an adverse outcome, because users and policymakers will then question the accuracy and generalizability of the model. It would therefore be good practice for AI application developers to understand the dominant data sources that are likely to drive a specific outcome. This leads to the third need: compliance with “guardrail policies.” These policies will define the boundaries of AI applications, separating the contexts in which models are safe for humans from those in which their use needs to be restricted. For example, when I asked ChatGPT, “If a person seeking an answer has a bad motive, how would you respond?” I got the response “I am a neutral and information-based AI, so my responses are not influenced by the motives or intentions of the person seeking information. I provide factual and helpful responses to the best of my knowledge and abilities, regardless of the user’s intent.”
Thus, guardrails are necessary to limit the capabilities of AI models to specific contexts. These considerations will be essential for building responsible AI systems that enhance consumer trust and enable broader adoption of trained AI models, provided that doing so remains financially viable for model trainers.
Privacy Policies
Beyond the organizations training these AI models, the platforms accumulating user data will need to inform users about how their data is used in various AI models, similar to the consent requirements of the European Union’s General Data Protection Regulation (GDPR) [11]. As a result, users who are more protective of their data will be less likely to be represented in the training of AI models, amplifying biases. Although it is uncertain whether users prefer unbiased results [12], we will still need to de-bias AI models by analyzing differences in the demographics and preferences of users who opt to share their data versus those who don’t. This will require data platforms to have additional user metadata analytics capabilities. I expect that new user analytics solutions will be developed by companies such as Google, Adobe or ComScore, which currently provide web traffic analytics solutions. These analytics will allow AI application developers to de-bias models using one of the many techniques being developed [13]. User consent, user metadata analytics and de-biasing algorithms will become part of every data platform that expects to use user-generated data to train AI models.
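One way to make the de-biasing step concrete is post-stratification reweighting, a standard survey-statistics technique used here purely for illustration; it is not necessarily what any platform or toolkit mentioned in this article uses, and toolkits such as AI Fairness 360 [13] offer more complete methods. The idea: compare the demographic mix of users who opted in to sharing data with the mix of the full user base, then weight each opted-in user by their group’s population share divided by its sample share. This is a minimal sketch under stated assumptions: the group labels, population shares and outcome data are all hypothetical, and the population shares are assumed to be known, for example from the kind of user metadata analytics described in this section.

```python
from collections import Counter


def poststratification_weights(opted_in_groups, population_shares):
    """Weight each demographic group so the opted-in sample matches the population.

    opted_in_groups: one (hypothetical) group label per user who shared data.
    population_shares: ASSUMED-known share of all users in each group (sums to 1),
        e.g., estimated by platform-level user metadata analytics.
    """
    counts = Counter(opted_in_groups)
    n = len(opted_in_groups)
    weights = {}
    for group, pop_share in population_shares.items():
        if counts[group] == 0:
            # Nobody from this group shared data, so no weight can
            # recover their perspective; more data is needed instead.
            raise ValueError(f"no opted-in users in group {group!r}")
        sample_share = counts[group] / n
        weights[group] = pop_share / sample_share
    return weights


def weighted_mean(values, groups, weights):
    """Average an outcome over opted-in users, weighting each user by group."""
    total = sum(weights[g] * v for v, g in zip(values, groups))
    norm = sum(weights[g] for g in groups)
    return total / norm


# Toy data invented for illustration: 8 of 10 opted-in users are in group A,
# although A and B are each assumed to be half of the full user base.
groups = ["A"] * 8 + ["B"] * 2
prefers_x = [1] * 8 + [0] * 2  # 1 = user prefers option X
w = poststratification_weights(groups, {"A": 0.5, "B": 0.5})
print(sum(prefers_x) / len(prefers_x))      # unweighted: 0.8
print(weighted_mean(prefers_x, groups, w))  # reweighted: 0.5
```

With 80% of opted-in users in group A, the unweighted average (0.8) overstates group A’s preference; reweighting recovers the 50/50 population average of 0.5. The function raises an error when a group has no opted-in users at all, because reweighting cannot restore a perspective that is entirely missing from the data, which is exactly the amplification risk raised by opt-outs.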
Furthermore, it will become the norm for higher-education institutions to teach all students how to design these features into fair, responsible and trusted AI systems [14].
AI systems will positively influence human life in the near future, but we will need to complement AI with AI to enhance user trust; create responsible, fair and open prediction models; and update user data collection and consent policies. Users will likely lose some choice when interacting with AI-enabled systems, so guardrails around models will be necessary. As “Her” puts it, AI models will be “based on the millions of personalities” of their creators, so we must build AI systems that audit models to ensure they retain those millions of personalities and do not lose some over time. With rich data, a diverse set of programmers, curious social scientists and thorough policymakers, everyone will have equitable access to a beautiful life supported by fair and caring AI.
References and Notes
1. Garg, R., 2022, “AI-Enabled Future of Work,” OR/MS Today, November 29, https://doi.org/10.1287/orms.2022.06.03.
2. Moore, G. A., & McKenna, R., 1999, “Crossing the Chasm: Marketing and Selling High-Tech Products to Mainstream Customers,” New York: Harper.
3. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/
4. https://www.cnbc.com/2017/04/24/netflix-shares-rise-after-video-streamer-hits-100-million-subscriber-milestone.html
5. https://goizueta.emory.edu/research-spotlight/expert-insight-voice-alexa-how-speech-characteristics-impact-consumer-decisions
6. https://www.statista.com/chart/26039/most-popular-laptop-brands-us/
7. https://www.gartner.com/en/newsroom/press-releases/2023-01-11-gartner-says-worldwide-pc-shipments-declined-28-percent-in-fourth-quarter-of-2022-and-16-percent-for-the-year
8. https://developers.google.com/search/blog/2023/05/understanding-news-topic-authority
9. At present, a webpage can opt to exclude that page from being indexed by search engines, e.g., https://developers.google.com/search/docs/crawling-indexing/block-indexing.
10. https://www.wired.com/story/wikipedia-state-sponsored-disinformation/
11. https://gdpr.eu/gdpr-consent-requirements/
12. Wang, C., Wang, K., Bian, A. Y., Islam, R., Keya, K. N., Foulds, J., & Pan, S., 2023, “When Biased Humans Meet Debiased AI: A Case Study in College Major Recommendation,” ACM Transactions on Interactive Intelligent Systems, Vol. 13, No. 3, pp. 1-28.
13. Bellamy, R. K., Dey, K., Hind, M., Hoffman, S. C., Houde, S., Kannan, K., et al., 2019, “AI Fairness 360: An Extensible Toolkit for Detecting and Mitigating Algorithmic Bias,” IBM Journal of Research and Development, Vol. 63, No. 4/5, p. 4-1.
14. At Emory University, Provost Ravi Bellamkonda has already started this with an AI.Humanity initiative (https://aihumanity.emory.edu/).
Rajiv Garg is an associate professor of information systems and operations management in the Goizueta Business School at Emory University. His research explores the economic and social implications of digital technologies and human-machine interactions. His research has attracted over $1 million in funding and has been published in journals such as Management Science, Information Systems Research, MIS Quarterly and Production and Operations Management.
