Reducing AI Hallucinations With Retrieval Augmented Generation

This newly devised technique shows promise in increasing the knowledge of LLMs by enabling prompts to be augmented with proprietary data.

By Alex Leventer · Sep. 22, 2023 · Analysis

In the rapidly evolving world of AI, large language models have come a long way, boasting impressive knowledge of the world around us. Yet LLMs, as intelligent as they are, often struggle to recognize the boundaries of their own knowledge, a shortfall that leads them to “hallucinate” to fill in the gaps. A newly devised technique, known as Retrieval Augmented Generation (RAG), shows promise in efficiently increasing the knowledge of these LLMs and reducing the impact of hallucination by enabling prompts to be augmented with proprietary data.

Navigating the Knowledge Gap in LLMs

LLMs are computer models capable of comprehending and generating human-like text. They're the AI behind your digital assistant, autocorrect function, and even some of your emails. Their knowledge of the world is often immense, but it isn't perfect. Just like humans, LLMs can reach the limits of their knowledge, but instead of stopping, they tend to make educated guesses or “hallucinate” to complete the task. This can lead to results that contain inaccurate or misleading information. 

In an ideal world, the answer would be to provide the model with relevant proprietary information at the exact time it's needed, right when the query is made. However, determining what information is "relevant" isn’t always straightforward and requires an understanding of what the LLM has been asked to accomplish. This is where RAG comes into play.

The Power of Embedding Models and Vector Similarity Search

Embedding models in the world of AI act like translators. They transform text documents into a long list of numbers through a process known as "document encoding." This list represents the embedding model’s internal "understanding" of the document's meaning. This string of numbers is known as a vector: a numeric representation of the attributes of a piece of data. Each data point is represented as a vector with many numerical values, where each value corresponds to a specific feature or attribute of the data.
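
As a concrete sketch of document encoding (the sentence-transformers library and the all-MiniLM-L6-v2 model below are illustrative assumptions, not something this article prescribes):

    # A minimal document-encoding sketch. The library and model choice
    # are assumptions; any embedding model works the same way in principle.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

    document = "Refunds are processed within 5 business days."
    vector = model.encode(document)  # a numpy array of floats

    print(vector.shape)  # (384,) -- one value per dimension of semantic space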

While a string of numbers might seem meaningless to the average person, these numbers serve as coordinates in a high-dimensional space. In the same way that latitude and longitude can describe a location in a physical space, this string of numbers describes the original text’s location in semantic space, the space of all possible meanings.

Treating these numbers as coordinates enables us to measure the similarity in meaning between two documents. This measurement is taken as the distance between their respective points in the semantic space. A smaller distance would indicate a greater similarity in meaning, while a larger distance suggests a disparity in content. Consequently, information relevant to a query can be discovered by searching for documents "close to" the query in semantic space. This is the magic of vector similarity search. 
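
A minimal sketch of that measurement, assuming cosine similarity as the closeness metric (one common choice; dot product and Euclidean distance are also widely used):

    # Comparing meaning by comparing vectors: higher cosine similarity
    # means the two texts sit closer together in semantic space.
    import numpy as np
    from sentence_transformers import SentenceTransformer  # assumed library

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    v1 = model.encode("How do I reset my password?")
    v2 = model.encode("Steps to recover a forgotten password")
    v3 = model.encode("Quarterly revenue grew by 12 percent")

    print(cosine_similarity(v1, v2))  # high -- similar meaning
    print(cosine_similarity(v1, v3))  # low -- unrelated meaning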

The Idea Behind Retrieval Augmented Generation

RAG is a generative AI architecture that applies semantic similarity to automatically discover information relevant to a query.

In a RAG system, your documents are stored in a vector database (DB). Each document is indexed based on a semantic vector produced by an embedding model so that finding documents close to a given query vector can be done quickly. This essentially means that each document is assigned a numerical representation (the vector), which indicates its meaning.
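
A toy version of that indexing step might look like the following; the in-memory list is a deliberately simplified stand-in for a real vector database, and the embedding library is an assumption for illustration:

    # Indexing: each document is stored alongside the semantic vector an
    # embedding model produces for it. A real vector DB does this at scale;
    # a plain list is used here only to show the idea.
    from sentence_transformers import SentenceTransformer  # assumed library

    model = SentenceTransformer("all-MiniLM-L6-v2")

    documents = [
        "Our premium plan includes 24/7 phone support.",
        "Refunds are processed within 5 business days.",
        "The API rate limit is 1,000 requests per minute.",
    ]

    # The "vector DB": (document, embedding) pairs.
    index = [(doc, model.encode(doc)) for doc in documents]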

When a query comes in, the same embedding model is used to produce a semantic vector for the query. 

The system then retrieves similar documents from the DB using vector search, looking for documents whose vectors are close to the vector of the query.
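
Putting the two steps together, retrieval embeds the query with the same model and ranks the stored documents by similarity. The brute-force scan below is a sketch of what a vector database does far more efficiently with an index:

    # Retrieval: embed the query with the SAME embedding model used for
    # the documents, then return the documents whose vectors are closest.
    import numpy as np
    from sentence_transformers import SentenceTransformer  # assumed library

    model = SentenceTransformer("all-MiniLM-L6-v2")
    documents = [
        "Our premium plan includes 24/7 phone support.",
        "Refunds are processed within 5 business days.",
        "The API rate limit is 1,000 requests per minute.",
    ]
    index = [(doc, model.encode(doc)) for doc in documents]

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    query_vector = model.encode("How fast do I get my money back?")
    ranked = sorted(index, key=lambda pair: cosine(query_vector, pair[1]), reverse=True)
    print(ranked[0][0])  # -> "Refunds are processed within 5 business days."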

Once the relevant documents have been retrieved, the query, along with these documents, is used to generate a response from the model. This way, the model doesn't have to rely solely on its internal knowledge but can access whatever data you provide at the right time. The model is, therefore, better equipped to provide more accurate and contextually appropriate responses by incorporating proprietary data stored in a database that offers vector search as a feature. 
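
A sketch of that final step, assuming the openai Python client purely for illustration (the model name and prompt wording are placeholders, not part of the article's method):

    # Augmented generation: the retrieved documents are injected into the
    # prompt so the model grounds its answer in them instead of guessing.
    # The openai client and model name below are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    query = "How fast do I get my money back?"
    retrieved_docs = ["Refunds are processed within 5 business days."]

    context = "\n".join(retrieved_docs)
    prompt = (
        "Answer the question using only the context below. If the context "
        "does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)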

There are a handful of so-called “vector databases” available, including DataStax Astra DB, for which vector search is now generally available. The main advantage of a database that enables vector search is speed. Without a vector index, a search has to compare the query against every item in the database. In contrast, integrated vector search provides indexing and search algorithms that vastly speed up the process, making it possible to search massive amounts of data in a fraction of the time a full scan would take.
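
To illustrate the kind of indexing involved, here is a sketch using the faiss library's HNSW index (faiss is an assumption chosen for brevity; databases like Astra DB expose comparable indexing behind their own APIs):

    # Approximate nearest-neighbor search: an HNSW graph index lets a
    # query inspect only a small fraction of the stored vectors instead
    # of scanning all of them. faiss is assumed here for illustration.
    import numpy as np
    import faiss

    dim = 384  # must match the embedding model's output dimension
    stored = np.random.rand(100_000, dim).astype("float32")  # stand-in embeddings

    ann_index = faiss.IndexHNSWFlat(dim, 32)  # 32 = graph connectivity (M)
    ann_index.add(stored)

    query = np.random.rand(1, dim).astype("float32")
    distances, ids = ann_index.search(query, 5)  # top-5 neighbors, no full scan
    print(ids[0])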

Fine-tuning can be applied to the query encoder and result generator for optimized performance. Fine-tuning is a process where the model's parameters are slightly adjusted to better adapt to the specific task at hand.

RAG Versus Fine-Tuning

Fine-tuning offers many benefits for optimizing LLMs, but it also has some limitations. For one, it doesn't allow for dynamic integration of new or proprietary data. The model's knowledge remains static post-training, leading it to hallucinate when asked about data outside its training set. RAG, on the other hand, dynamically retrieves and incorporates up-to-date, proprietary data from an external database, mitigating the hallucination issue and providing more contextually accurate responses. RAG gives you query-time control over exactly what information is provided to the model, allowing prompts to be tailored to specific users at the exact time a query is made.

RAG is also more computationally efficient and flexible than fine-tuning. Fine-tuning requires the entire model to be retrained for each dataset update, a time-consuming and resource-intensive task. Conversely, RAG only requires updating the document vectors, enabling easier and more efficient information management. RAG's modular approach also allows for the fine-tuning of the retrieval mechanism separately, permitting adaptation to different tasks or domains without altering the base language model. 

RAG enhances the power and accuracy of large language models, making it a compelling alternative to fine-tuning. In practice, enterprises tend to use RAG more often than fine-tuning.

Changing the Role of LLMs With RAG

Integrating RAG into LLMs not only improves the accuracy of their responses, it also maximizes their potential. The process enables LLMs to focus on what they excel at: intelligently generating content from a prompt. The model is no longer the sole source of information, because RAG provides it with relevant proprietary knowledge when required, and the corpus of knowledge accessible to the model can be expanded and updated without expensive model training jobs.

In essence, RAG acts as a bridge, connecting the LLM to a reservoir of knowledge that goes beyond its internal capabilities. As a result, it drastically reduces the LLM's tendency to “hallucinate” and provides a more accurate and efficient model for users.

Published at DZone with permission of Alex Leventer. See the original article here.

Opinions expressed by DZone contributors are their own.
