October 8, 2024
Enhancing AI Reliability: The Power of Context in Large Language Models
https://doi.org/10.1287/LYTX.2024.04.09
A 2024 McKinsey Global Survey [1] reported that nearly 65% of organizations are using artificial intelligence (AI) in their business operations in some way. According to the survey, the number of organizations using AI in enterprise operations has doubled since 2023, driven largely by rapid advances in and proliferation of generative AI technology over the last two years and by easy access to general-purpose AI models. Generative AI is rapidly transforming business and organizational operations across many industries and sectors, including healthcare, financial services and supply chain.
The reliability and trustworthiness of generative AI systems have been extensively studied and reported. Although training or fine-tuning a large language model (LLM) often yields better results for domain-specific use cases, it also requires significant investments of capital, time and AI skills. As an alternative, organizations are increasingly opting for generally available pretrained LLMs, which AI model providers often make easily accessible via APIs (application programming interfaces). Although this shortens the time to experimentation and implementation, general-purpose models are often less accurate and trustworthy for domain-specific use cases. General-purpose LLMs are powerful: they generate humanlike outputs in response to prompts and can even take actions through intelligent agents. But they also have risks and weaknesses that are well known among their users. They can produce biased, harmful or simply nonsensical and factually incorrect information. Promising techniques such as context-aware grounding [2] have been found to improve accuracy and effectively mitigate these reliability concerns, which makes them especially relevant to the LLMs at the core of today’s AI applications.
The Challenge of AI Reliability
Business leaders and decision-makers face a real conundrum. On one hand, the potential of LLMs to drive innovation and efficiency is beyond question. On the other hand, we must deploy them with care to avoid the substantial risks that come with unreliable AI systems, from reputational harm to tricky legal and ethical headaches. LLMs are typically trained on large corpora of open web text, which makes them effective at generalized tasks such as content generation, but they often struggle with more domain-specific tasks such as question answering or decision-making in financial services. This is largely because niche or proprietary domain-specific information is unlikely to have been present in the model’s training data. Without such information in their world knowledge, LLMs are prone to hallucinations or confabulations: when presented with such a task, they may generate entirely fabricated, counterfactual responses.
Several studies have documented persistent inaccuracies in LLMs on specific tasks. For instance, a 2023 survey [3] reported that BART and PEGASUS models produced factual inconsistencies in 30% and 27% of generated summaries, respectively. More recently, a Stanford University study [4] found that LLMs produced incorrect or false information between 69% and 88% of the time in response to legal queries. Such persistent accuracy issues can erode trust in generative AI and lead to pullbacks in AI investment. A 2024 Generative AI Global Benchmark Study [5] found that only 63% of organizations have road maps for increasing AI spending, compared with 93% in 2023. Low success rates of AI projects stem from problems with the reliability, accuracy and explainability of AI output.
The good news is that several emerging solutions can mitigate this issue. Context-aware grounding holds much promise for making LLMs more reliable and ethically aligned, and it can also give them a better grasp of the specific business domain. Grounding LLMs in the real world with relevant domain-specific data helps avoid many potential pitfalls and allows organizations to use LLMs in innovative and efficient ways without necessarily putting brand reputation or business ethics at risk.
Context-Aware Grounding
Two of the most common techniques for context-aware grounding are retrieval-augmented generation (RAG) [6] and graph retrieval-augmented generation (GraphRAG) [7].
Retrieval-Augmented Generation (RAG). This method combines the strengths of information retrieval and text generation. A retrieval component fetches relevant data or documents, typically from reliable sources, and these are passed to the LLM as context for generating its response, helping to ground that response in factual data. The retrieval component usually relies on a similarity or semantic search engine that compares the user’s input against a vector database, which stores embedded representations of the organization’s domain-specific data. Figure 1 depicts a high-level RAG architecture.
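To make the retrieve-then-generate flow concrete, here is a minimal, dependency-free Python sketch under explicit assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, `InMemoryVectorStore` is a toy substitute for a vector database, the documents are invented, and the actual LLM call is left out. The sketch stops at the grounded prompt that would be sent to the model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy substitute for a vector database: keeps (doc_id, text, vector) rows in a list."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str, Counter]] = []

    def add(self, doc_id: str, text: str) -> None:
        self.rows.append((doc_id, text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[tuple[str, str, float]]:
        """Return the k rows most similar to the query, best first."""
        q = embed(query)
        scored = [(doc_id, text, cosine(q, vec)) for doc_id, text, vec in self.rows]
        scored.sort(key=lambda row: row[2], reverse=True)
        return scored[:k]


def build_grounded_prompt(question: str, hits: list[tuple[str, str, float]]) -> str:
    """Put retrieved passages ahead of the question and tell the model to stay within them."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text, _ in hits)
    return (
        "Answer using only the context below. If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


store = InMemoryVectorStore()
store.add("policy-7", "Wire transfers above 10,000 USD require a second approver.")
store.add("policy-9", "Expense reports must be filed within 30 days of purchase.")
store.add("faq-2", "The cafeteria opens at 8 am on weekdays.")

question = "Who must approve a wire transfer above 10,000 USD?"
hits = store.search(question, k=2)
prompt = build_grounded_prompt(question, hits)
print(prompt)
```

In a production system the bag-of-words vectors would be replaced by model embeddings and the list scan by an indexed vector search, but the shape of the flow (embed, search, assemble context, generate) stays the same.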
Graph Retrieval-Augmented Generation (GraphRAG). This relatively recent extension of the RAG technique incorporates knowledge graphs into the retrieval process. Rather than performing a similarity or semantic search on a vector database, GraphRAG queries a graph database to retrieve the entities and relationships, across related documents or texts, that are relevant to the user’s input. The retrieved portion of the knowledge graph then serves as grounded context for the LLM to generate more accurate answers. Because the LLM can access both high-level summaries and detailed connections within the data, grounding its responses in a structured knowledge base reduces the likelihood of hallucinations or confabulations. Figure 2 shows a high-level GraphRAG architecture.
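The following Python sketch illustrates only the core idea of retrieving connected facts rather than isolated passages. It is not the GraphRAG pipeline described in [7], which builds its graph with an LLM and adds community summaries. Here, a hand-written list of hypothetical (subject, relation, object) triples stands in for a graph database, entity matching is naive substring lookup, and `neighborhood` is an illustrative helper that collects the facts within a fixed number of hops of the entities named in the question.

```python
from collections import defaultdict, deque

# Hypothetical knowledge graph as (subject, relation, object) triples;
# a real system would query a graph database instead.
TRIPLES = [
    ("Acme Bank", "subsidiary_of", "Acme Holdings"),
    ("Acme Holdings", "regulated_by", "Federal Reserve"),
    ("Acme Bank", "offers", "Premier Savings"),
    ("Premier Savings", "interest_rate", "4.1% APY"),
    ("Federal Reserve", "located_in", "Washington, D.C."),
]


def build_adjacency(triples):
    """Index every triple under both endpoints so traversal can follow edges either way."""
    adj = defaultdict(list)
    for s, r, o in triples:
        adj[s].append((s, r, o))
        adj[o].append((s, r, o))
    return adj


def match_entities(question, adj):
    """Naive entity linking: any graph node whose name appears in the question."""
    q = question.lower()
    return [node for node in adj if node.lower() in q]


def neighborhood(seeds, adj, hops=2):
    """Breadth-first collection of triples whose endpoints lie within `hops` edges of a seed."""
    seen_nodes = set(seeds)
    facts = []
    frontier = deque((node, 0) for node in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand past the hop limit
        for triple in adj[node]:
            if triple not in facts:
                facts.append(triple)
            s, _, o = triple
            for nxt in (s, o):
                if nxt not in seen_nodes:
                    seen_nodes.add(nxt)
                    frontier.append((nxt, depth + 1))
    return facts


adj = build_adjacency(TRIPLES)
question = "Which regulator oversees the parent company of Acme Bank?"
seeds = match_entities(question, adj)
facts = neighborhood(seeds, adj, hops=2)
context = "\n".join(f"{s} --{r}--> {o}" for s, r, o in facts)
print(context)
```

The question never names the regulator or the parent company, yet the two-hop traversal surfaces the chain Acme Bank, Acme Holdings, Federal Reserve. A single-passage similarity search could miss that multi-hop relationship.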
Grounding LLMs in context using either of these methods is a multistep process involving several essential components.
- First, the relevant context must be represented in a form the AI can meaningfully use. This entails both developing methods to capture and convert relevant contextual information into a format that AI models can process and rigorously selecting which types of information to represent. Similarity search over these representations, using Euclidean distance or cosine similarity, is a common approach.
- Second, for an AI to be truly grounded in context, the system must also be able to integrate the relevant data into the LLM’s context so that it produces outputs that are not only correct but also contextually appropriate. It is also important to account for the LLM’s context window, which limits how much relevant information can fit meaningfully into the model’s input.
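The two steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the three-dimensional vectors are made-up embeddings, and token counts are approximated by whitespace-separated words rather than a real model tokenizer. It also shows a useful property: once vectors are normalized to unit length, squared Euclidean distance equals 2 minus twice the cosine similarity, so both metrics produce the same ranking.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    return sum(x * y for x, y in zip(a, b)) / (
        math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    )


def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query, chunks, metric="cosine"):
    """Step 1: order chunks by relevance to the query under the chosen metric."""
    q = normalize(query)
    if metric == "cosine":
        # Higher similarity is better, so sort on the negated score.
        return sorted(chunks, key=lambda c: -cosine_similarity(q, normalize(c["vec"])))
    if metric == "euclidean":
        # Smaller distance is better.
        return sorted(chunks, key=lambda c: euclidean_distance(q, normalize(c["vec"])))
    raise ValueError(f"unknown metric: {metric}")


def pack_context(ranked_chunks, max_tokens):
    """Step 2: greedily keep the most relevant chunks that fit the approximate token budget."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = len(chunk["text"].split())  # crude word count as a stand-in for a tokenizer
        if used + cost > max_tokens:
            continue  # too large for the remaining budget; a smaller, later chunk may still fit
        selected.append(chunk["id"])
        used += cost
    return selected, used


# Hypothetical embeddings and texts, for illustration only.
chunks = [
    {"id": "A", "vec": [0.9, 0.1, 0.0], "text": "Refunds are issued within five business days."},
    {"id": "B", "vec": [0.7, 0.6, 0.1], "text": "Refund requests need an order number and a reason for the return."},
    {"id": "C", "vec": [0.0, 0.2, 0.9], "text": "Our offices close on public holidays."},
]
query_vec = [1.0, 0.2, 0.0]

by_cosine = [c["id"] for c in rank(query_vec, chunks, "cosine")]
by_euclid = [c["id"] for c in rank(query_vec, chunks, "euclidean")]
selected, used = pack_context(rank(query_vec, chunks), max_tokens=20)
print(by_cosine, by_euclid, selected, used)
```

Greedy packing by relevance is only one possible policy. Real systems often also apply a minimum similarity score, so that a small but irrelevant chunk is not admitted just because it happens to fit.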
Context-aware grounding offers clear advantages. For one, it enhances reliability and explainability because responses can be traced directly back to the context the model was given. Cleaner, more meaningful domain-specific context means fewer inappropriate or simply wrong outputs. Ethically, context-aware AI is better aligned with what an organization wants and what its stakeholders expect. Trust is a major factor: the more reliable the AI system, the more trust it earns from stakeholders, including customers, employees, investors and industry regulators. Finally, context-aware AI systems are more versatile, adapt more readily to changing business landscapes and can be customized for multiple domains.
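One way to make this traceability operational is to label each context passage with an identifier, ask the model to cite those identifiers and then check the citations. The Python sketch below assumes a bracketed `[doc-id]` citation convention and uses hypothetical model answers. It flags answers that cite nothing or that cite sources the model was never given.

```python
import re

# Assumed citation convention: source identifiers in square brackets, e.g. [policy-7].
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def check_citations(answer, provided_ids):
    """Report which sources an answer cites and whether all of them were actually supplied."""
    cited = set(CITATION.findall(answer))
    unknown = sorted(cited - set(provided_ids))
    return {
        "cited": sorted(cited),
        "unknown": unknown,  # citations pointing at sources the model never saw
        "traceable": bool(cited) and not unknown,
    }


provided = ["policy-7", "policy-9"]

# Hypothetical model outputs.
answer = "A second approver is required for wires above 10,000 USD [policy-7]."
bad_answer = "Approval comes from the CFO [memo-3]."

report = check_citations(answer, provided)
bad_report = check_citations(bad_answer, provided)
print(report)
print(bad_report)
```

A passing check shows only that every cited source was in the supplied context, not that the source actually supports the claim. Verifying support would require an additional step, such as human review or a separate entailment check.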
Implementing Context-Aware Generative AI Systems
Although the technical details of context-aware grounding can be intricate, business leaders can and should take concrete steps to implement it in their organizations. Here are some fundamental considerations for evaluating the implementation of a domain-specific, contextually grounded generative AI system.
Assess your AI strategy. Evaluate both current and planned AI implementations. Identify where in the organization enhanced contextual understanding could yield the most significant benefits.
Invest in data quality. High-quality data sets are as vital as good algorithms for AI success. The richness and diversity of the data – and its contextual integrity – determine the likelihood that context-aware AI will effectively solve the business problems.
Collaborate with AI and domain experts. Implementing context-aware generative AI systems calls for collaboration with domain and AI experts, whether an organization’s in-house specialists or specialists brought in from outside. Lead with the ideal business outcomes, and work backward to achieve them with generative AI technology.
Develop ethical guidelines for AI. The ethical foundations of context-aware systems can begin with what should (and should not) be done with AI, a conversation every organization planning to implement AI should have.
Ensure trustworthiness through rigorous testing and monitoring. Establish thorough testing protocols to verify that AI systems perform as expected in various contexts. Follow up with continuous monitoring so that any problems can be addressed immediately. Transparency may be even more important than trust itself. Trustworthy AI is safe, reliable and aligned with human values. Most importantly, it is transparent, a big step toward an AI system that earns its users’ trust and achieves the desired business outcomes.
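A testing protocol can start as small as a fixed set of questions with known answers, re-run whenever prompts, models or source data change. The Python sketch below is a hypothetical harness. `answer_fn` stands in for the deployed system, correctness is judged by a simple required-phrase match, and the 0.9 pass-rate threshold is an arbitrary example value, not a recommendation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    must_contain: str  # phrase a correct, grounded answer is expected to include


def evaluate(answer_fn: Callable[[str], str], cases: list[EvalCase], threshold: float = 0.9) -> dict:
    """Run every case, compute the pass rate and collect failing questions for review."""
    if not cases:
        raise ValueError("at least one test case is required")
    failures = [
        case.question
        for case in cases
        if case.must_contain.lower() not in answer_fn(case.question).lower()
    ]
    pass_rate = 1 - len(failures) / len(cases)
    return {"pass_rate": pass_rate, "failures": failures, "alert": pass_rate < threshold}


def fake_system(question: str) -> str:
    """Hypothetical stand-in for the deployed system, returning canned responses."""
    canned = {
        "How many approvers does a large wire need?": "Two: the requester and a second approver.",
        "When are expense reports due?": "Within 45 days.",
    }
    return canned.get(question, "I don't know.")


cases = [
    EvalCase("How many approvers does a large wire need?", "second approver"),
    EvalCase("When are expense reports due?", "30 days"),
]
result = evaluate(fake_system, cases)
print(result)
```

For continuous monitoring, the same evaluation could run on a schedule, with the `alert` flag routed to whatever alerting the organization already uses. Phrase matching is deliberately crude; teams often move to human review or model-graded checks as the test set grows.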
Conclusion
As generative AI continues to evolve and integrate more deeply into business processes and operations, ensuring its reliability, safety and ethical alignment has become paramount. Context-aware grounding represents a significant step forward in addressing these challenges, offering a path to more trustworthy and effective AI systems. Importantly, this approach enhances the transparency and explainability of generative AI applications. By anchoring AI outputs in well-defined contextual frameworks, organizations can better understand and articulate how their AI systems arrive at specific decisions or generate a specific output. This increased transparency is crucial for stakeholders – from customers and employees to regulators and investors – who demand accountability in AI-driven processes. It builds trust, facilitates compliance with emerging AI regulations and enables more informed decision-making at all levels of the organization. Additionally, it allows organizations to implement highly domain-specific intelligent systems that are versatile and adaptive to the ever-evolving business landscape.
References
[1] QuantumBlack, AI by McKinsey, 2024, “The state of AI in early 2024: Gen AI adoption spikes and starts to generate value,” May 30, McKinsey, https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai.
[2] Talukdar, Wrick, and Anjanava Biswas, 2023, “Improving large language model (LLM) fidelity through context-aware grounding: A systematic approach to reliability and veracity,” World Journal of Advanced Engineering Technology and Sciences, Vol. 10, No. 2, pp. 283-296.
[3] Zhang, Yue, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, et al., 2023, “Siren’s song in the AI ocean: A survey on hallucination in large language models,” https://arxiv.org/abs/2309.01219.
[4] Dahl, Matthew, Varun Magesh, Mirac Suzgun, and Daniel E. Ho, 2024, “Large legal fictions: Profiling legal hallucinations in large language models,” https://arxiv.org/abs/2401.01301.
[5] Lucidworks, 2024, “2024 State of Generative AI in Global Business,” https://tinyurl.com/yckxr5jz.
[6] Lewis, Patrick, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, et al., 2020, “Retrieval-augmented generation for knowledge-intensive NLP tasks,” Advances in Neural Information Processing Systems, Vol. 33, pp. 9459-9474.
[7] Edge, Darren, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson, 2024, “From local to global: A graph RAG approach to query-focused summarization,” https://arxiv.org/abs/2404.16130.
Anjanava Biswas is an award-winning senior AI specialist solutions architect at Amazon Web Services (AWS), a public speaker and an author with more than 16 years of experience in enterprise architecture, cloud systems and transformation strategy. For the past seven years, he has been dedicated to artificial intelligence, machine learning and generative AI research, development and innovation projects, working closely with organizations in the healthcare, financial services, technology startup and public sector industries. Biswas holds a Bachelor of Technology degree in information technology and computer science and is a TOGAF certified enterprise architect. He also holds seven AWS Certifications. Biswas is an INFORMS member, an IEEE Senior Member and a fellow of the IET (UK), BCS (UK) and IETE (India).