Quick Start - Simple Voice Bot
What is Outspeed?
Outspeed is a platform for building real-time voice and video AI applications. The Outspeed SDK provides a simple, intuitive interface that abstracts away the complexities of handling multimedia inputs, letting developers focus on creating powerful multimodal AI experiences.
Key features of the Outspeed SDK include:
- Easy-to-use API inspired by PyTorch
- Built-in support for processing voice and video inputs
- Seamless integration with various AI models and services
- Real-time processing capabilities for responsive applications
Now, let’s get started!
Demo
In this quick start guide, we’ll demonstrate how to build a voice bot using Outspeed. This bot can engage in real-time conversations and respond based on the LLM prompt.
Below, you’ll find a brief video demonstration showcasing the capabilities of the voice bot: (PS: Adapt was our previous name 😅)
Prerequisites
Make sure to have the following installed:
- Python: version 3.9 or higher, but less than 3.13
- pip: latest version (comes with Python)
- Git: for cloning the repository
Creating a Voice Bot in 4 steps!
This app will process input from your microphone or chatbox, send it to an LLM, and convert the response back to voice.
This application only supports English as the source language. For other languages, switch out the models/configs used in this example.
Install Dependencies
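The exact install command isn't reproduced here; assuming the SDK is published on PyPI under the name outspeed, installation in a fresh virtual environment would look like:

```
pip install outspeed
```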
Create VoiceBot application
To create a simple voice bot, let's set up a file called voice_bot.py.
Next, we’ll create an application class with a streaming endpoint. This endpoint will:
- Accept sp.AudioStream and sp.TextStream as inputs
- Respond with an sp.AudioStream
To create a real-time application in Outspeed:
- Define a Python class
- Annotate it with @sp.App()
- Implement these three key methods:
| Method | Purpose |
|---|---|
| setup() | Initialize AI services and resources |
| run() | Define the main processing pipeline (use @sp.streaming_endpoint()) |
| teardown() | Clean up resources when the application stops |
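Putting these pieces together, the overall shape of the class is sketched below. The decorators and method names come from the table above; the import alias, async signatures, and entry-point call are assumptions, so treat this as a sketch rather than the definitive voice_bot.py.

```python
import outspeed as sp  # assumed import alias for the Outspeed SDK


@sp.App()
class VoiceBot:
    async def setup(self) -> None:
        # Initialize AI services here (filled in below)
        ...

    @sp.streaming_endpoint()
    async def run(
        self, audio_input: sp.AudioStream, text_input: sp.TextStream
    ) -> sp.AudioStream:
        # Wire up the processing pipeline here (filled in below)
        ...

    async def teardown(self) -> None:
        # Release resources here (filled in below)
        ...


if __name__ == "__main__":
    # Assumed entry point; check the full voice_bot.py for the exact startup call
    VoiceBot().start()
```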
Now let’s break down each method:
- Setup method:
The setup method initializes all the necessary AI services:
- DeepgramSTT for speech-to-text conversion
- GroqLLM for language model processing
- TokenAggregator for aggregating tokens
- CartesiaTTS for text-to-speech conversion
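A minimal sketch of setup(), assuming the four services are exposed as classes on the sp module, read their API keys from the environment variables set later in this guide, and accept constructor parameters along these lines (the exact parameters may differ):

```python
    async def setup(self) -> None:
        # Speech-to-text: transcribes incoming microphone audio
        self.deepgram_node = sp.DeepgramSTT()

        # LLM: generates the bot's reply from the transcript / chat input
        self.llm_node = sp.GroqLLM(
            system_prompt="You are a helpful voice assistant. Keep responses short and conversational.",
        )

        # Aggregates streamed LLM tokens into phrases for smoother TTS synthesis
        self.token_aggregator_node = sp.TokenAggregator()

        # Text-to-speech: converts the response text back into audio
        self.tts_node = sp.CartesiaTTS()
```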
- Run method:
The run method sets up the AI service pipeline:
- Converts audio input to text using Deepgram
- Processes any text input
- Merges speech-to-text and direct text inputs
- Runs the merged input through the LLM
- Aggregates LLM output tokens for improved TTS synthesis
- Converts the aggregated text to speech
- Returns the final audio stream
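A sketch of run() following those steps; the node .run() calls and the sp.merge helper are assumptions about the SDK's stream API, so the real pipeline may be wired slightly differently:

```python
    @sp.streaming_endpoint()
    async def run(
        self, audio_input: sp.AudioStream, text_input: sp.TextStream
    ) -> sp.AudioStream:
        # 1. Convert incoming audio to text with Deepgram
        transcript_stream: sp.TextStream = self.deepgram_node.run(audio_input)

        # 2. Merge transcribed speech with any text typed into the chatbox
        #    (sp.merge is an assumed combinator; the SDK's helper may be named differently)
        merged_text_stream: sp.TextStream = sp.merge([transcript_stream, text_input])

        # 3. Generate the response with the LLM
        llm_token_stream: sp.TextStream = self.llm_node.run(merged_text_stream)

        # 4. Aggregate streamed tokens into phrases so the TTS output sounds natural
        aggregated_text_stream: sp.TextStream = self.token_aggregator_node.run(llm_token_stream)

        # 5. Convert the aggregated text to speech and return the audio stream
        output_audio_stream: sp.AudioStream = self.tts_node.run(aggregated_text_stream)
        return output_audio_stream
```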
- Teardown method:
The teardown method ensures proper cleanup of resources:
- Closes all initialized AI services (Deepgram, LLM, TokenAggregator, TTS)
- This method is called when the app stops or shuts down unexpectedly
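And a sketch of teardown(), assuming each service exposes an async close() method:

```python
    async def teardown(self) -> None:
        # Close each service so connections and background tasks shut down cleanly
        await self.deepgram_node.close()
        await self.llm_node.close()
        await self.token_aggregator_node.close()
        await self.tts_node.close()
```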
To view the full voice_bot.py code, navigate to the following link:
View the complete voice_bot.py code
Set Up API Keys and Run
To run this example locally, you'll need API keys set as environment variables for the following services:
- Deepgram - For transcription. Sign up and navigate to https://console.deepgram.com/ to get the API key.
- Groq - For LLM. Sign up and navigate to https://console.groq.com/keys to get the API key.
- Cartesia - For text-to-speech. Sign up and navigate to https://play.cartesia.ai/keys to get the API key.
All of these providers have a free tier. Once you have your keys, create a .env file in the same directory as voice_bot.py and add the following:
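(The variable names below follow the usual convention for these providers; confirm the exact names against the full voice_bot.py code.)

```
DEEPGRAM_API_KEY=<your-deepgram-api-key>
GROQ_API_KEY=<your-groq-api-key>
CARTESIA_API_KEY=<your-cartesia-api-key>
```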
Finally, run the following command to start the server:
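(Assuming the entry-point block shown in the skeleton above, this is simply running the script:)

```
python voice_bot.py
```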
The console will output the URL you can use to connect to the server (default is http://localhost:8080).
Try it Out
You can use our playground to interact with the voice bot.
- Navigate to playground and select “Voice Bot”
- Paste the link you received from the previous step into the URL field.
- Select Audio device. Leave Video device blank. Click Run to begin.
The playground is built using our React SDK. You can use it to build your own frontends, or integrate it with an existing one!
Support
For any assistance or questions, feel free to join our Discord community. We’re excited to see what you build!