Quick Start - Simple Voice Bot
What is Outspeed?
Outspeed is a platform for building real-time voice and video AI applications. The Outspeed SDK provides a simple, intuitive interface that abstracts away the complexities of handling multimedia inputs, letting developers focus on creating powerful multimodal AI experiences.
Key features of the Outspeed SDK include:
- Easy-to-use API inspired by PyTorch
- Built-in support for processing voice and video inputs
- Seamless integration with various AI models and services
- Real-time processing capabilities for responsive applications
Now, let’s get started!
Demo
In this quick start guide, we’ll demonstrate how to build a voice bot using Outspeed. This bot can engage in real-time conversations and respond based on the LLM prompt.
Below, you’ll find a brief video demonstration showcasing the capabilities of the voice bot: (PS: Adapt was our previous name 😅)
Prerequisites
Make sure to have the following installed:
- Python: version 3.9 or higher, but less than 3.13
- pip: latest version (comes with Python)
- Git: for cloning the repository
Creating a Voice Bot in 4 steps!
This app will process input from your microphone or chatbox, send it to an LLM, and convert the response back to voice.
This application only supports English as the source language. For other languages, switch out the models/configs used in this example.
Install Dependencies
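The exact install command isn't reproduced here; assuming the SDK is published on PyPI under the name outspeed, installation in a fresh virtual environment would look like:

```
pip install outspeed
```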
Create VoiceBot application
To create a simple voice bot, let's set up a file called voice_bot.py.
Next, we’ll create an application class with a streaming endpoint. This endpoint will:
- Accept sp.AudioStream and sp.TextStream as inputs
- Respond with an sp.AudioStream
To create a real-time application in Outspeed:
- Define a Python class
- Annotate it with @sp.App()
- Implement these three key methods:
| Method | Purpose |
|---|---|
| setup() | Initialize AI services and resources |
| run() | Define the main processing pipeline (use @sp.streaming_endpoint()) |
| teardown() | Clean up resources when the application stops |
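Putting these pieces together, the overall shape of the class is sketched below. The decorators and method names come from the table above; the import alias, async signatures, and entry-point call are assumptions, so treat this as a sketch rather than the definitive voice_bot.py.

```python
import outspeed as sp  # assumed import alias for the Outspeed SDK


@sp.App()
class VoiceBot:
    async def setup(self) -> None:
        # Initialize AI services here (filled in below)
        ...

    @sp.streaming_endpoint()
    async def run(
        self, audio_input: sp.AudioStream, text_input: sp.TextStream
    ) -> sp.AudioStream:
        # Wire up the processing pipeline here (filled in below)
        ...

    async def teardown(self) -> None:
        # Release resources here (filled in below)
        ...


if __name__ == "__main__":
    # Assumed entry point; check the full voice_bot.py for the exact startup call
    VoiceBot().start()
```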
Now let’s break down each method:
- Setup method:
The setup method initializes all the necessary AI services:
- DeepgramSTT for speech-to-text conversion
- GroqLLM for language model processing
- TokenAggregator for aggregating tokens
- CartesiaTTS for text-to-speech conversion
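A minimal sketch of setup(), assuming the four services are exposed as classes on the sp module, read their API keys from the environment variables set later in this guide, and accept constructor parameters along these lines (the exact parameters may differ):

```python
    async def setup(self) -> None:
        # Speech-to-text: transcribes incoming microphone audio
        self.deepgram_node = sp.DeepgramSTT()

        # LLM: generates the bot's reply from the transcript / chat input
        self.llm_node = sp.GroqLLM(
            system_prompt="You are a helpful voice assistant. Keep responses short and conversational.",
        )

        # Aggregates streamed LLM tokens into phrases for smoother TTS synthesis
        self.token_aggregator_node = sp.TokenAggregator()

        # Text-to-speech: converts the response text back into audio
        self.tts_node = sp.CartesiaTTS()
```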
- Run method:
The run method sets up the AI service pipeline:
- Converts audio input to text using Deepgram
- Processes any text input
- Merges speech-to-text and direct text inputs
- Runs the merged input through the LLM
- Aggregates LLM output tokens for improved TTS synthesis
- Converts the aggregated text to speech
- Returns the final audio stream
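A sketch of run() following those steps; the node .run() calls and the sp.merge helper are assumptions about the SDK's stream API, so the real pipeline may be wired slightly differently:

```python
    @sp.streaming_endpoint()
    async def run(
        self, audio_input: sp.AudioStream, text_input: sp.TextStream
    ) -> sp.AudioStream:
        # 1. Convert incoming audio to text with Deepgram
        transcript_stream: sp.TextStream = self.deepgram_node.run(audio_input)

        # 2. Merge transcribed speech with any text typed into the chatbox
        #    (sp.merge is an assumed combinator; the SDK's helper may be named differently)
        merged_text_stream: sp.TextStream = sp.merge([transcript_stream, text_input])

        # 3. Generate the response with the LLM
        llm_token_stream: sp.TextStream = self.llm_node.run(merged_text_stream)

        # 4. Aggregate streamed tokens into phrases so the TTS output sounds natural
        aggregated_text_stream: sp.TextStream = self.token_aggregator_node.run(llm_token_stream)

        # 5. Convert the aggregated text to speech and return the audio stream
        output_audio_stream: sp.AudioStream = self.tts_node.run(aggregated_text_stream)
        return output_audio_stream
```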
- Teardown method:
The teardown method ensures proper cleanup of resources:
- Closes all initialized AI services (Deepgram, LLM, TokenAggregator, TTS)
- This method is called when the app stops or shuts down unexpectedly
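And a sketch of teardown(), assuming each service exposes an async close() method:

```python
    async def teardown(self) -> None:
        # Close each service so connections and background tasks shut down cleanly
        await self.deepgram_node.close()
        await self.llm_node.close()
        await self.token_aggregator_node.close()
        await self.tts_node.close()
```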
To view the full voice_bot.py code, navigate to the following link:
View the complete voice_bot.py code
Set Up API Keys and Run
To run this example locally, you'll need API keys set as environment variables for the following services:
- Deepgram - For transcription. Sign up and navigate to https://console.deepgram.com/ to get the API key.
- Groq - For LLM. Sign up and navigate to https://console.groq.com/keys to get the API key.
- Cartesia - For text-to-speech. Sign up and navigate to https://play.cartesia.ai/keys to get the API key.
All of these providers have a free tier. Once you have your keys, create a .env file in the same directory as voice_bot.py and add the following:
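(The variable names below follow the usual convention for these providers; confirm the exact names against the full voice_bot.py code.)

```
DEEPGRAM_API_KEY=<your-deepgram-api-key>
GROQ_API_KEY=<your-groq-api-key>
CARTESIA_API_KEY=<your-cartesia-api-key>
```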
Finally, run the following command to start the server:
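(Assuming the entry-point block shown in the skeleton above, this is simply running the script:)

```
python voice_bot.py
```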
The console will output the URL you can use to connect to the server (default is http://localhost:8080).
Try it Out
You can use our playground to interact with the voice bot.
- Navigate to playground and select “Voice Bot”
- Paste the link you received from the previous step into the URL field.
- Select Audio device. Leave Video device blank. Click Run to begin.
The playground is built using our React SDK. You can use it to build your own frontends, or integrate it with an existing one!
Support
For any assistance or questions, feel free to join our Discord community. We’re excited to see what you build!