Can a technology called RAG keep AI models from making things up?


We’ve been living through the generative AI boom for almost a year and a half now, following the late 2022 launch of OpenAI’s ChatGPT. But despite the transformative effects on companies’ stock prices, generative AI tools powered by large language models (LLMs) still have significant drawbacks that have kept them from being as useful as many would like. Retrieval Augmented Generation, or RAG, aims to address some of those drawbacks.

Perhaps the most notable drawback of LLMs is their tendency toward confabulation (also called “hallucination”), a creative gap-filling technique AI language models use when they encounter holes in their knowledge that weren’t present in their training data. They generate plausible-sounding text that can veer toward accuracy when the training data is solid but may otherwise be completely made up.

Relying on confabulating AI models gets people and companies into trouble, as we have covered in the past. In 2023, we saw two cases of lawyers citing legal cases, invented by AI, that did not exist. We covered claims against OpenAI in which ChatGPT confabulated and accused innocent people of doing terrible things. In February, we wrote about Air Canada’s customer service chatbot inventing a refund policy, and in March, a New York City chatbot was caught confabulating city regulations.

So if generative AI is to be the technology that propels humanity into the future, someone needs to fix the confabulation problems along the way. That’s where RAG comes in. Proponents hope the technique will help turn generative AI technology into reliable assistants that can boost productivity without requiring a human to double-check or second-guess the answers.

“RAG is a way to improve LLM performance, essentially combining the LLM process with a web search or other document search process” to help LLMs stick to the facts, according to Noah Giansiracusa, associate professor of mathematics at Bentley University.
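In code, that combination boils down to a retrieve-then-generate loop: look up relevant documents first, then hand them to the model along with the question. The sketch below is a minimal, hypothetical illustration of that pattern; the document list, keyword-overlap scoring, and generate() stub are illustrative stand-ins rather than any particular product’s API (a real system would use vector search and an actual model call).

```python
# Minimal sketch of the RAG pattern: retrieve relevant text, then prompt the
# model to answer from it. Everything here is a toy stand-in for illustration.

DOCUMENTS = [
    "Doc 1: The company refund policy was updated in January 2024.",
    "Doc 2: Support tickets are answered within two business days.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(docs,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:k]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; a real system would query a model here."""
    return f"[model answer based on a prompt of {len(prompt)} characters]"

def rag_answer(question: str) -> str:
    # Stuff the retrieved sources into the prompt so the model answers from
    # them rather than from whatever its training data happens to contain.
    context = "\n".join(retrieve(question, DOCUMENTS))
    prompt = ("Answer using only the sources below. If they don't contain "
              "the answer, say you don't know.\n\n"
              f"Sources:\n{context}\n\nQuestion: {question}")
    return generate(prompt)

print(rag_answer("When was the refund policy updated?"))
```

The point of the pattern is that the model’s answer is anchored to the retrieved sources, which is what proponents hope will keep it closer to the facts.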

Let’s take a closer look at how it works and what its limitations are.

A framework to improve AI accuracy

Although RAG is now seen as a technique to help fix problems with generative AI, it actually predates ChatGPT. The term was coined in a 2020 academic paper by researchers at Facebook AI Research (FAIR, now Meta AI Research), University College London, and New York University.

As we’ve mentioned, LLMs struggle with facts. Google’s entry into the generative AI race, Bard, made an embarrassing error about the James Webb Space Telescope in its first public demonstration back in February 2023. The mistake wiped about $100 billion off the value of parent company Alphabet. LLMs produce the statistically most likely response based on their training data and understand nothing of what they produce, meaning they can present false information that appears accurate if you don’t have expert knowledge of a topic.

LLMs also lack up-to-date knowledge and the ability to identify gaps in their own knowledge. “When a human tries to answer a question, they can rely on their memory and come up with an answer on the fly, or they could do something like search Google or read Wikipedia and then try to piece together an answer from what they find there, still filtering that information through their internal knowledge of the matter,” Giansiracusa said.

But LLMs are not human, of course. Their training data can go stale quickly, particularly for more time-sensitive queries. Furthermore, the LLM often can’t distinguish the specific sources of its knowledge, since all its training data is blended together into a kind of soup.

In theory, RAG should make keeping AI models up to date much cheaper and easier. “The nice thing about RAG is that when new information is available, instead of having to retrain the model, all that is needed is to augment the model’s external knowledge base with the updated information,” Peterson said. “This reduces LLM development time and cost while improving model scalability.”
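As a rough illustration of that update path, the sketch below shows new information being indexed into an external knowledge base without touching the model’s weights. The in-memory store and keyword-overlap search are hypothetical stand-ins for a real vector database and embedding model, continuing the toy setup from the earlier sketch.

```python
# Hypothetical in-memory knowledge base: adding a document is an indexing
# step, not a retraining step, which is the cost savings the quote describes.

class KnowledgeBase:
    def __init__(self) -> None:
        self.docs: list[str] = []

    def add(self, text: str) -> None:
        # New information goes into the store; the LLM's weights are untouched.
        self.docs.append(text)

    def search(self, query: str, k: int = 3) -> list[str]:
        # Same naive keyword-overlap ranking as the earlier sketch.
        q_words = set(query.lower().split())
        return sorted(self.docs,
                      key=lambda d: len(q_words & set(d.lower().split())),
                      reverse=True)[:k]

kb = KnowledgeBase()
kb.add("Doc 3: The refund policy was revised again in May 2024.")  # hypothetical new document
print(kb.search("When was the refund policy revised?"))
```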
