Get Started
A realtime voice bot that uses OpenAI Realtime API in Python.
Make sure you have gone over QuickStart before trying this example.
In this example, we will build a realtime voice bot that uses OpenAI Realtime API. We accomplish this by using microphone audio to interact with the bot.
Setup Application with Realtime API in 4 steps
This app will process input from your microphone, send it to an LLM, and convert the response back to voice.
- The full code is available here.
Install Dependencies
Create VoiceBot Application (using Realtime API)
Go over to QuickStart to understand the structure of an Outspeed application
Create a file voice_bot_rt.py
and add the following code:
Setup API Keys and Run
You will need the following environment variable:
OPENAI_API_KEY
- You can get this by visiting the OpenAI Realtime API documentation and navigating to the API keys section to generate a new key.
Once you have your key, create a .env
file in the same directory as voice_bot_with_rt.py
and add the following:
Finally, run the following command to start the server:
The console will output the URL you can use to connect to the server (default is http://localhost:8080).
Run Demo
You can use our playground to interact with the voice bot.
- Navigate to playground and select “Voice Bot”.
- Paste the link your received from the previous step into the URL field.
- Select Audio device. Leave Video device blank. Click Run to begin.
The playground is built using our our React SDK. You can use it to build your own frontends, or integrate with an existing one!
Understanding the Process
Review the code in voice_bot_with_rt.py
.
Setup
The VoiceBot
class initializes with the setup
method, which is automatically called when the application starts. This method is responsible for setting up the necessary services and loading models. Here’s a breakdown of the services initialized:
- OpenAIRealtime: Processes audio and text inputs to generate insightful commentary using OpenAI’s Realtime API, guided by a detailed system prompt. It operates with a response temperature of 0.9 and maintains a chat history for context.
Streaming Endpoint
The run
method in the VoiceBot
class is marked as a streaming endpoint, handling real-time audio and text streams.
The method outputs two streams: the audio stream of the commentary and the chat history text stream.
Next Step
Now, we’ll setup a search tool for the LLM in our application.
Support
For any assistance or questions, feel free to join our Discord community. We’re excited to see what you build!
Was this page helpful?