Retrieval Augmented Generation and Split

Blog > Code

June 5, 2024

Retrieval Augmented Generation (RAG) is a technique used in Generative AI to be able to enhance a generative AI model with text retrieved from an additional source of information. This is typically used in situations where there is a domain specific information resource that is needed to augment the generative AI’s model, such as a company knowledge base or integrating data from proprietary APIs or data sets in order to enhance the accuracy and value provided by the AI.

The key components of this are the retrieval system and the generative model. They need to work together in order to successfully generate the augmented result from the query entered by the user.

How RAG Works

At a very high level, you can think of RAG working as a three step process. The first step is when the user inputs their query to the Generative AI system using RAG. After that, the retrieval system then needs to process this information to fetch the relevant information from the knowledge base or data set that it is supposed to use as ‘the truth’ for use in the Generative model. Then, after that, the generative model takes the fetched information and builds it into an answer.

OpenAI offers the ability to tell the GPT model to use custom functions that allow it to retrieve data on the fly. These would be used for enriching requests when the additional data to integrate with the GPT comes from private API data, as that is not necessarily static over long periods of time.

What is currently popular for storing and retrieving text similarity is to use vector databases for this purpose. Chunks of textual information are vectorized and then stored in a vector database. The vectorization process for documents used in a RAG pipeline is often referred to as creating ‘embeddings’ for them. This is the style of RAG we will be demonstrating in this article

Integrating information from the knowledge base or other data set has benefits for generative models. It allows improved accuracy and contextual relevance that would not be possible for a generative model to build with its training data alone. This is a quite common use case in the enterprise when you need to reduce or eliminate AI hallucinations and base answers on an existing, limited corpus of factual information.

An Example

Here’s an example of how a RAG might work. In this example we have an application that allows users to ask questions about great books in history. In this specific example we’ll source from Mark Twain’s ‘The Adventures of Tom Sawyer’

Python:

Feature Delivery & Control

Feature Measurement & Learning

Enterprise Readiness

Related Links

By Need

By Industry

Developer Resources

Content Hub

Success

Related Links

Retrieval Augmented Generation and Split

Contents

How RAG Works

An Example

Feature Measurement and Rollouts

Creating the Feature Flags in Split

Creating the Flag Rules

Putting the Flags in Code

Putting it all together

Where to go from here

Switch It On With Split

Get Split Certified

Want to Dive Deeper?

Release New Features Faster

Want to Dive Deeper?

Create Impact With Everything You Build