Cartesia Integration
A guide on how to integrate and use Cartesia TTS in your realtime Voice Bot using Outspeed.
Ensure you have completed the Quick Start - Simple Voice Bot guide before proceeding with this guide.
Integrating Cartesia TTS into Your Voice Bot
In this guide, we will enhance your basic Voice Bot by integrating Cartesia TTS (Text-to-Speech) to convert text responses into natural-sounding speech. This integration provides a more engaging and interactive user experience.
- The full code is available here.
Prerequisites
Before integrating Cartesia TTS, ensure you have the following:
- API Keys:
  - Cartesia: For text-to-speech. Sign up and navigate to Cartesia API Keys to obtain your API key.
  - Deepgram and Groq: Ensure you also have API keys for both, as outlined in the Quick Start guide.
Setup Environment Variables
Create a .env file in the same directory as voice_bot.py and add your API keys to it, including CARTESIA_API_KEY for Cartesia.
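The contents of the .env file are not reproduced on this page. As a minimal sketch, the check below reports which keys are still missing before the bot starts. CARTESIA_API_KEY is the variable named in this guide; DEEPGRAM_API_KEY and GROQ_API_KEY are assumed names for the other two services, and missing_api_keys is a hypothetical helper, not part of Outspeed.

```python
import os
from typing import Iterable, List, Mapping

# CARTESIA_API_KEY is named in this guide; the other two names are assumptions.
REQUIRED_KEYS = ("CARTESIA_API_KEY", "DEEPGRAM_API_KEY", "GROQ_API_KEY")


def missing_api_keys(
    env: Mapping[str, str] = os.environ,
    required: Iterable[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the names of required keys that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```

Running a check like this at startup turns a missing key into a clear message instead of a failed connection later on.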
Understanding the CartesiaTTS Plugin
The CartesiaTTS plugin provides text-to-speech synthesis through the Cartesia API. Its initialization parameters and key features are described below.
Initialization
The plugin is initialized with several parameters:
- api_key: Your Cartesia API key (can be set via the environment variable CARTESIA_API_KEY)
- voice_id: The ID of the voice to use for synthesis
- model: The TTS model to use (default is “sonic-english”)
- output_encoding: The audio encoding format (default is “pcm_s16le”)
- output_sample_rate: The sample rate of the output audio (default is 16000 Hz)
- stream: Whether to stream the audio output (default is True)
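To make the parameters and defaults above concrete, here is a minimal sketch that mirrors them in a plain dataclass. CartesiaTTSSettings is a hypothetical container written for illustration and is not the plugin class itself. Treating voice_id as required and raising an error when no key is found are both assumptions.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class CartesiaTTSSettings:
    """Hypothetical container mirroring the documented parameters and defaults."""

    voice_id: str  # assumed to be required; the guide lists no default
    api_key: Optional[str] = None  # falls back to CARTESIA_API_KEY
    model: str = "sonic-english"
    output_encoding: str = "pcm_s16le"
    output_sample_rate: int = 16000  # Hz
    stream: bool = True

    def __post_init__(self) -> None:
        # Resolve the key from the environment when it is not passed explicitly.
        if self.api_key is None:
            self.api_key = os.environ.get("CARTESIA_API_KEY")
        if not self.api_key:
            raise ValueError("Set CARTESIA_API_KEY or pass api_key explicitly.")


# Only the voice and the key need to be supplied; the rest use the defaults.
settings = CartesiaTTSSettings(voice_id="your-voice-id", api_key="your-api-key")
```

In practice only voice_id usually needs choosing, since the documented defaults already describe 16 kHz signed 16-bit PCM with streaming enabled.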
Key Features
- WebSocket Communication: The plugin establishes a WebSocket connection with the Cartesia API for real-time text-to-speech synthesis.
- Streaming Support: It supports streaming audio output, allowing for low-latency speech synthesis.
- Interrupt Handling: The plugin can handle interruptions (e.g., when the user starts speaking) by cancelling ongoing TTS generation and clearing queues.
- Asynchronous Operation: The plugin operates asynchronously, efficiently managing text input and audio output streams.
- Tracing and Logging: It includes tracing and logging functionality for monitoring performance and debugging.
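The interrupt handling and asynchronous operation described above can be sketched with plain asyncio. This is not the plugin's actual implementation: InterruptibleSpeaker is a hypothetical stand-in, and a short sleep takes the place of the WebSocket round trip. It only illustrates the pattern of cancelling the in-flight task and draining both queues.

```python
import asyncio
from typing import List, Optional


class InterruptibleSpeaker:
    """Hypothetical stand-in for a streaming TTS node; it calls no real API."""

    def __init__(self) -> None:
        self.text_queue: asyncio.Queue = asyncio.Queue()
        self.audio_queue: asyncio.Queue = asyncio.Queue()
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        self._task = asyncio.create_task(self._synthesize())

    async def _synthesize(self) -> None:
        # Pretend each text chunk yields one audio chunk after a short delay.
        while True:
            text = await self.text_queue.get()
            await asyncio.sleep(0.01)
            await self.audio_queue.put(f"audio<{text}>".encode())

    async def interrupt(self) -> None:
        # Cancel in-flight generation, then drop anything still queued.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
        for queue in (self.text_queue, self.audio_queue):
            while not queue.empty():
                queue.get_nowait()
        self.start()  # ready for the next response


async def demo() -> List[bytes]:
    speaker = InterruptibleSpeaker()
    speaker.start()
    for chunk in ("Hello", "there", "friend"):
        await speaker.text_queue.put(chunk)
    first = await speaker.audio_queue.get()  # one chunk reaches the user
    await speaker.interrupt()  # the user starts speaking
    await speaker.text_queue.put("Sorry, go ahead")
    second = await speaker.audio_queue.get()
    return [first, second]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

After the interrupt, the pending chunks "there" and "friend" are never synthesized, which is the behavior a user expects when they cut the bot off mid-sentence.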
Usage
To use the CartesiaTTS plugin in your Voice Bot:
- Initialize the plugin in your setup method.
- In your run method, connect the text input to the TTS node.
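The original snippets for these two steps are not reproduced on this page. The sketch below shows the shape of the wiring using stand-ins only: FakeTTSNode, VoiceBotSketch, and llm_text are hypothetical names, and the real plugin's constructor and run signature may differ.

```python
import asyncio
from typing import AsyncIterator, List


class FakeTTSNode:
    """Hypothetical stand-in for the TTS plugin; it produces placeholder bytes."""

    def __init__(self, voice_id: str) -> None:
        self.voice_id = voice_id

    async def run(self, text_stream: AsyncIterator[str]) -> AsyncIterator[bytes]:
        # Consume text as it arrives and emit one audio chunk per text chunk.
        async for text in text_stream:
            yield f"[{self.voice_id}] {text}".encode()


class VoiceBotSketch:
    def setup(self) -> None:
        # Step 1: create the TTS node once, before any audio flows.
        self.tts_node = FakeTTSNode(voice_id="example-voice")

    def run(self, text_stream: AsyncIterator[str]) -> AsyncIterator[bytes]:
        # Step 2: connect the text stream to the TTS node and return its audio.
        return self.tts_node.run(text_stream)


async def llm_text() -> AsyncIterator[str]:
    # Stand-in for the text produced upstream by the language model.
    for sentence in ("Hi!", "How can I help?"):
        yield sentence


async def main() -> List[bytes]:
    bot = VoiceBotSketch()
    bot.setup()
    return [chunk async for chunk in bot.run(llm_text())]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Keeping node construction in setup and stream wiring in run means the node is created once, while each session only connects streams.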
This integration allows your Voice Bot to convert text responses into natural-sounding speech, enhancing the interactive experience for users.
Updating voice_bot.py to Use Cartesia TTS
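Ensure your voice_bot.py includes the Cartesia TTS integration described above: the plugin is initialized in the setup method, and the text input is connected to the TTS node in the run method.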