This example currently only works on your local machine. It doesn’t work with Outspeed Cloud.

In this guide, we will extend the basic Voice Bot by integrating Retrieval-Augmented Generation (RAG). This enhancement allows your bot to provide more informed and contextually accurate responses by retrieving relevant information from a structured knowledge base in real time.

Integrating RAG into Your Realtime API Voice Bot

We will modify the existing Voice Bot to include RAG functionality, leveraging the llama_index library for indexing and querying documents. This involves setting up the data index, creating a search tool, and integrating it into the bot’s AI service pipeline.

  • The full code is available here.

Understanding Llama Index

To enable RAG, we need to index our data sources. We use Llama Index (llama_index) for this purpose.

Load and Parse Documents

First, load your documents from the data directory and parse them into nodes.

import os

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SimpleNodeParser

# Resolve the data directory relative to this file
PARENT_DIR = os.path.dirname(os.path.abspath(__file__))

# Load every document in ./data and split it into 512-token chunks
documents = SimpleDirectoryReader(f"{PARENT_DIR}/data/").load_data()
node_parser = SimpleNodeParser.from_defaults(chunk_size=512)
nodes = node_parser.get_nodes_from_documents(documents=documents)
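
As an optional sanity check (not part of the bot itself), you can confirm that parsing produced chunks:

# Optional: verify the documents were split into chunks
print(f"Parsed {len(nodes)} nodes from {len(documents)} documents")
print(nodes[0].get_content()[:200])  # preview the first chunk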

Creating the Vector Store Index

Next, create a vector store index from the parsed nodes. This index enables efficient similarity-based queries.

from llama_index.core import VectorStoreIndex

# Build an in-memory vector index over the parsed nodes
vector_index = VectorStoreIndex(nodes)

# Retrieve the 2 most similar chunks for each query
query_engine = vector_index.as_query_engine(similarity_top_k=2)
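
You can test the index directly before wiring it into the bot; the question below is only a placeholder for something your data can answer:

# Ad-hoc test of the query engine; replace the question to match your data
response = query_engine.query("What topics does the knowledge base cover?")
print(response)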

Defining the Search Tool

Define a custom search tool that uses the query engine to retrieve relevant information based on user queries.

import logging
import os

import outspeed as sp
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SimpleNodeParser
from pydantic import BaseModel

PARENT_DIR = os.path.dirname(os.path.abspath(__file__))

class Query(BaseModel):
    query_for_neural_search: str

class RAGResult(BaseModel):
    result: str

class RAGTool(sp.Tool):
    name = "rag"
    description = "Search the knowledge base for information"
    parameters_type = Query
    response_type = RAGResult

    def __init__(self):
        super().__init__()
        # Build the index once at startup so queries are fast at runtime
        documents = SimpleDirectoryReader(f"{PARENT_DIR}/data/").load_data()
        node_parser = SimpleNodeParser.from_defaults(chunk_size=512)
        nodes = node_parser.get_nodes_from_documents(documents=documents)
        vector_index = VectorStoreIndex(nodes)
        self.query_engine = vector_index.as_query_engine(similarity_top_k=2)

    async def run(self, query: Query) -> RAGResult:
        logging.info(f"Searching for: {query.query_for_neural_search}")
        response = self.query_engine.query(query.query_for_neural_search)
        logging.info(f"RAG Response: {response}")
        return RAGResult(result=str(response))

Integrating RAG into the VoiceBot

Incorporate the RAGTool into the OpenAILLM AI service pipeline within the VoiceBot setup.

@sp.App()
class VoiceBot:
    async def setup(self) -> None:
        # Initialize the AI services
        self.deepgram_node = sp.DeepgramSTT()  # speech-to-text
        self.llm_node = sp.OpenAILLM(
            tool_choice="required",  # force the model to call a tool on each turn
            tools=[
                RAGTool(),
            ],
        )
        self.token_aggregator_node = sp.TokenAggregator()  # aggregates streamed LLM tokens
        self.tts_node = sp.CartesiaTTS(
            voice_id="95856005-0332-41b0-935f-352e296aa0df",
        )
        self.vad_node = sp.SileroVAD(min_volume=0)  # voice activity detection

Setting Up the Streaming Endpoint

Set up the streaming endpoint to handle incoming audio and text streams, and configure the AI service pipeline.

# Note: this method lives on the VoiceBot class and requires `import json` at the top of the file
@sp.streaming_endpoint()
async def run(self, audio_input_queue: sp.AudioStream, text_input_queue: sp.TextStream):
    # Set up the AI service pipeline
    deepgram_stream: sp.TextStream = self.deepgram_node.run(audio_input_queue)

    # Run voice activity detection on a copy of the audio stream
    vad_stream: sp.VADStream = self.vad_node.run(audio_input_queue.clone())

    # Extract the message content from incoming JSON text frames
    text_input_queue = sp.map(text_input_queue, lambda x: json.loads(x).get("content"))

    # Feed both transcribed speech and typed text to the LLM
    llm_input_queue: sp.TextStream = sp.merge(
        [deepgram_stream, text_input_queue],
    )

    llm_token_stream: sp.TextStream
    chat_history_stream: sp.TextStream
    llm_token_stream, chat_history_stream = self.llm_node.run(llm_input_queue)

    token_aggregator_stream: sp.TextStream = self.token_aggregator_node.run(llm_token_stream)
    tts_stream: sp.AudioStream = self.tts_node.run(token_aggregator_stream)

    # Interrupt the LLM, aggregator, and TTS whenever the user starts speaking
    self.llm_node.set_interrupt_stream(vad_stream)
    self.token_aggregator_node.set_interrupt_stream(vad_stream.clone())
    self.tts_node.set_interrupt_stream(vad_stream.clone())

    return tts_stream, chat_history_stream

Understanding the Integration

RAGTool Class

The RAGTool class extends sp.Tool and is responsible for interacting with the query engine to perform searches based on user inputs.

  • Constructor (__init__ Method): Initializes the tool by loading documents, parsing them into nodes, and creating a vector store index for efficient querying.
  • run Method: This asynchronous method takes a Query object, uses the query engine to retrieve relevant information, and returns a RAGResult object containing the search outcome.
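
If you want to exercise RAGTool on its own, a minimal sketch follows. In the running bot the LLM invokes the tool for you, so this direct call is purely illustrative, and the question string is a placeholder for something your data can answer:

import asyncio

async def main():
    tool = RAGTool()  # builds the index from ./data on construction
    result = await tool.run(Query(query_for_neural_search="What topics does the knowledge base cover?"))
    print(result.result)

asyncio.run(main())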

VoiceBot Setup

In the setup method of the VoiceBot class, we initialize various AI services:

  1. DeepgramSTT: For speech-to-text conversion.
  2. OpenAILLM: The language model with the RAGTool integrated.
  3. TokenAggregator: For aggregating tokens from the LLM output.
  4. CartesiaTTS: For text-to-speech conversion.
  5. SileroVAD: For voice activity detection.
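
To run the bot locally, other Outspeed examples end the file with a standard entrypoint. The start() call below is an assumption based on those examples; check the full code linked above for the exact invocation:

if __name__ == "__main__":
    # Assumed entrypoint, per other Outspeed examples; verify against the full code
    VoiceBot().start()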

Streaming Endpoint (run Method)

The run method sets up the AI service pipeline by:

  1. Converting audio input to text using Deepgram.
  2. Detecting voice activity.
  3. Merging text input with transcribed audio.
  4. Processing the input through the LLM with RAG capabilities.
  5. Aggregating tokens and converting the response to speech.
  6. Setting up interrupt streams for a more responsive experience.
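
Putting it together, data flows through the pipeline roughly like this:

audio in ──► DeepgramSTT ──┐
                           ├──► OpenAILLM (+ RAGTool) ──► TokenAggregator ──► CartesiaTTS ──► audio out
text in  ──────────────────┘
audio in ──► SileroVAD ──► interrupt signals ──► (LLM, TokenAggregator, TTS)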

Support

For any assistance or questions, feel free to join our Discord community. We’re excited to see what you build!