RAG (LlamaIndex)
A guide on integrating Retrieval-Augmented Generation (RAG) into your realtime Voice Bot.
This example currently only works on your local machine. It doesn’t work with Outspeed Cloud.
In this guide, we will extend the basic Voice Bot by integrating Retrieval-Augmented Generation (RAG). This enhancement allows your bot to provide more informed and contextually accurate responses by retrieving relevant information from a structured knowledge base in real-time.
Integrating RAG into Your Realtime API Voice Bot
We will modify the existing Voice Bot to include RAG functionality, leveraging the llama_index library for indexing and querying documents. This involves setting up the data index, creating a search tool, and integrating it into the bot’s AI service pipeline.
- The full code is available here.
Understanding LlamaIndex
To enable RAG, we need to index our data sources. We use LlamaIndex (llama_index) for this purpose.
Load and Parse Documents
First, load your documents from the data directory and parse them into nodes.
Creating the Vector Store Index
Next, create a vector store index from the parsed nodes. This index enables efficient similarity-based queries.
Defining the Search Tool
Define a custom search tool that uses the query engine to retrieve relevant information based on user queries.
Integrating RAG into the VoiceBot
Incorporate the RAGTool into the OpenAILLM AI service pipeline within the VoiceBot setup.
Setting Up the Streaming Endpoint
Set up the streaming endpoint to handle incoming audio and text streams, and configure the AI service pipeline.
Understanding the Integration
RAGTool Class
The RAGTool class extends sp.Tool and is responsible for interacting with the query engine to perform searches based on user inputs.
- Constructor (__init__ method): Initializes the tool by loading documents, parsing them into nodes, and creating a vector store index for efficient querying.
- run method: This asynchronous method takes a Query object, uses the query engine to retrieve relevant information, and returns a RAGResult object containing the search outcome.
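The query path described above can be sketched as follows. Query and RAGResult are modeled here as plain dataclasses, and the class is a stand-in: the real RAGTool subclasses sp.Tool from outspeed, whose exact base-class signature is not shown in this guide.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Query:
    query: str


@dataclass
class RAGResult:
    result: str


class RAGToolSketch:
    """Core query logic; the real RAGTool extends sp.Tool."""

    def __init__(self, query_engine):
        # In the guide, __init__ also loads documents, parses them into
        # nodes, and builds the vector store index backing this engine.
        self.query_engine = query_engine

    async def run(self, query: Query) -> RAGResult:
        # Delegate to the (synchronous) LlamaIndex query engine
        response = self.query_engine.query(query.query)
        return RAGResult(result=str(response))


# Usage with a stub engine, standing in for index.as_query_engine():
class EchoEngine:
    def query(self, text):
        return f"context for: {text}"


tool = RAGToolSketch(EchoEngine())
result = asyncio.run(tool.run(Query(query="pricing")))
print(result.result)  # → context for: pricing
```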
VoiceBot Setup
In the setup method of the VoiceBot class, we initialize various AI services:
- DeepgramSTT: For speech-to-text conversion.
- OpenAILLM: The language model with the RAGTool integrated.
- TokenAggregator: For aggregating tokens from the LLM output.
- CartesiaTTS: For text-to-speech conversion.
- SileroVAD: For voice activity detection.
Streaming Endpoint (run Method)
The run method sets up the AI service pipeline by:
- Converting audio input to text using Deepgram.
- Detecting voice activity.
- Merging text input with transcribed audio.
- Processing the input through the LLM with RAG capabilities.
- Aggregating tokens and converting the response to speech.
- Setting up interrupt streams for a more responsive experience.
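The chaining pattern above can be illustrated with toy stand-ins. These stubs only model the data flow through the pipeline; the real stages are outspeed’s DeepgramSTT, OpenAILLM (with RAGTool), TokenAggregator, and CartesiaTTS.

```python
import asyncio


# Toy stand-ins that model the pipeline's data flow, not the real services.
async def stt(audio_chunks):
    async for chunk in audio_chunks:
        yield f"transcript({chunk})"


async def llm_with_rag(texts):
    async for text in texts:
        # The real OpenAILLM can invoke RAGTool here to fetch context
        yield f"answer({text})"


async def tts(tokens):
    async for token in tokens:
        yield f"audio({token})"


async def main():
    async def mic():
        for chunk in ["hello"]:
            yield chunk

    # Chain the stages just as the run method chains AI services
    pipeline = tts(llm_with_rag(stt(mic())))
    return [out async for out in pipeline]


outputs = asyncio.run(main())
print(outputs)  # → ['audio(answer(transcript(hello)))']
```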
Support
For any assistance or questions, feel free to join our Discord community. We’re excited to see what you build!