RAG
Businesses looking to adopt generative AI face a significant obstacle in hallucinations, the confidently stated falsehoods these models sometimes produce. Because the models have no real intelligence and simply predict likely text, they occasionally generate inaccurate output. One example highlighted in The Wall Street Journal describes how Microsoft’s generative AI invented meeting attendees and misrepresented the topics of conference calls.
As discussed previously, addressing hallucinations poses a formidable challenge with current transformer-based model architectures. However, some generative AI vendors propose a potential solution called retrieval augmented generation (RAG), which aims to mitigate these errors.
Here’s how one vendor, Squirro, pitches it:
“The essence of our offering revolves around Retrieval Augmented Language Models (LLMs) or Retrieval Augmented Generation (RAG) embedded within our solution. Our generative AI stands out with its commitment to zero hallucinations. All generated content is traceable to its source, guaranteeing credibility.”
Here’s a similar pitch from SiftHub:
“SiftHub leverages RAG technology and fine-tuned large language models with industry-specific knowledge training to enable companies to produce tailored responses devoid of hallucinations. This ensures heightened transparency, minimized risk, and fosters unwavering trust in utilizing AI for diverse requirements.”
RAG was pioneered by data scientist Patrick Lewis, a researcher at Meta and University College London and lead author of the 2020 paper that introduced the concept. Applied to a model, RAG retrieves documents that may be relevant to a question, such as a Wikipedia page about the Super Bowl, and asks the model to generate its answer using that additional context.
David Wadden, a research scientist at AI2, the Allen Institute for AI, explains that generative AI models like ChatGPT or Llama normally answer from their internal knowledge alone; RAG lets them consult external documents, much like checking a book or a file, which tends to produce more accurate responses.
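To make that mechanism concrete, here is a minimal sketch of the flow in Python: retrieve the documents that best match a question, then fold them into the prompt the model actually sees. The toy corpus, the keyword-overlap scoring, and the build_prompt helper are illustrative assumptions, not any vendor’s actual pipeline.

```python
# A minimal, illustrative RAG flow: retrieve the documents that overlap with a
# question, then prepend them to the prompt the model sees. The corpus and the
# keyword-overlap scoring are toy placeholders, not a production retriever.

CORPUS = [
    "The Kansas City Chiefs won Super Bowl LVIII in February 2024.",
    "Argentina won the 2022 FIFA World Cup in Qatar.",
    "Wikipedia is a free online encyclopedia written by volunteers.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by how many of the query's words they contain."""
    terms = set(query.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Augment the question with retrieved context before it reaches the model."""
    context = "\n".join(retrieve(query))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

if __name__ == "__main__":
    # The finished prompt would then be sent to whichever LLM the system uses.
    print(build_prompt("Who won the Super Bowl last year?"))
```

A real system would replace the keyword overlap with a proper search index and pass the finished prompt to a model, but the overall shape of the pipeline is the same.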
RAG offers real benefits, letting users trace generated claims back to their sources to verify factual accuracy and check for copyright issues. But it doesn’t entirely eliminate hallucinations, and it has inherent limitations that vendors often gloss over.
According to Wadden, RAG excels in “knowledge-intensive” scenarios where users seek information, like who won the Super Bowl last year. In these cases, relevant documents are easily found through keyword searches.
However, “reasoning-intensive” tasks such as coding and math are harder, because it’s difficult to express the concepts needed to answer them in a search query. Models can also get sidetracked by irrelevant content in lengthy documents, or ignore the retrieved documents altogether.
Implementing RAG at scale is also costly because it demands substantial hardware: the retrieved documents must be held in memory for the model to reference, and extra compute is needed to process the additional context before a response is generated. For a technology already known for its heavy resource demands, that is a significant consideration.
RAG can certainly be improved, and ongoing work aims to help models make better use of the documents it retrieves. Some efforts train models to decide when to draw on retrieved documents, or to skip retrieval entirely when it isn’t needed. Others focus on indexing massive datasets more efficiently and on representing documents by more than just their keywords.
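One of those directions, representing documents by more than their keywords, usually means dense vector embeddings, so that a query can match a document that shares its meaning but not its wording. The sketch below assumes the open-source sentence-transformers library and the all-MiniLM-L6-v2 model purely for illustration; it is not drawn from any of the systems discussed above.

```python
# Dense retrieval sketch: embed the query and documents as vectors and rank by
# cosine similarity, so matches no longer depend on shared keywords.
# The model choice and documents are illustrative assumptions.

import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "To prove the statement for every n, assume it holds for n and derive it for n + 1.",
    "The Kansas City Chiefs won the most recent Super Bowl.",
]

def dense_retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents whose embeddings are closest to the query's."""
    doc_vecs = model.encode(documents, normalize_embeddings=True)
    query_vec = model.encode(query, normalize_embeddings=True)
    scores = doc_vecs @ query_vec          # cosine similarity, since vectors are unit length
    ranked = np.argsort(-scores)[:k]
    return [documents[i] for i in ranked]

# A query about proof by induction can surface the first document even though
# the word "induction" never appears in it.
print(dense_retrieve("examples of proof by induction"))
```

Because similarity is computed between embeddings rather than shared terms, the query about induction can still land on the right document, which is exactly the gap keyword search leaves open.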
Retrieval for abstract concepts, such as the proof technique needed to solve a math problem, is harder still. According to Wadden, developing document representations and search methods that can identify relevant documents for such abstract tasks remains an open research question.
So while RAG can mitigate a model’s hallucinations, it doesn’t resolve every challenge of AI. Be wary of vendors who make exaggerated claims about what it can do.