Overview
Generate audio from text using our TTS endpoints. We support both single-voice and multi-voice dialogue generation.
- Single Voice TTS: Convert text to speech with one voice
- Dialogue Generation: Mix narrator and character voices in the same audio file
Single Voice TTS
POST https://api.outspeed.com/v1/tts/
Request Body
- model: TTS model to use. Use outspeed-tts-v2 (outspeed-tts-v1 is deprecated).
- voice: the voice identifier. Find all available voices and their models at TTS Playground.
- text: the text to synthesize
- stream: set to true to stream audio chunks; false returns the full audio in a single response
Response
- Content-Type: audio/pcm
- Headers:
  - X-Sample-Rate: Sample rate (default: 24000)
  - X-Channels: Number of audio channels (default: 1)
  - X-Bit-Depth: Bit depth (default: 16)
- Body: Raw PCM audio bytes (little-endian int16), 24kHz, mono. No WAV header is included. Wrap with a WAV header or convert with a tool like ffmpeg.
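Because the body carries no WAV header, most players will not open it directly. The sketch below shows one way to wrap the bytes using Python's standard wave module. The function name pcm_to_wav is hypothetical, and the fallback values are the header defaults listed above; this is an illustration, not an official client.

```python
import io
import wave
from typing import Mapping, Optional


def pcm_to_wav(pcm: bytes, headers: Optional[Mapping[str, str]] = None) -> bytes:
    """Wrap raw little-endian PCM bytes in a WAV container.

    `headers` can be the HTTP response headers (any object with .get).
    Missing headers fall back to the documented defaults.
    """
    headers = headers or {}
    sample_rate = int(headers.get("X-Sample-Rate", 24000))
    channels = int(headers.get("X-Channels", 1))
    bit_depth = int(headers.get("X-Bit-Depth", 16))

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bit_depth // 8)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The returned bytes can be written to a .wav file as-is. Note that a plain dict lookup is case-sensitive, so pass the header names exactly as shown if you build the mapping yourself.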
Authenticate with Authorization: Bearer <YOUR_OUTSPEED_API_KEY>.
Examples (non-streaming)
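A minimal sketch of a non-streaming request using only the Python standard library. It assumes the request body is sent as JSON, which this page does not state explicitly. The function names and the voice ID placeholder are hypothetical.

```python
import json
import urllib.request

TTS_URL = "https://api.outspeed.com/v1/tts/"


def build_tts_request(api_key: str, text: str, voice: str, stream: bool = False):
    """Build the POST request. Assumes a JSON-encoded body."""
    payload = {
        "model": "outspeed-tts-v2",
        "voice": voice,  # a voice identifier from the TTS Playground
        "text": text,
        "stream": stream,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, text: str, voice: str):
    """Send the request and return (pcm_bytes, response_headers)."""
    req = build_tts_request(api_key, text, voice, stream=False)
    with urllib.request.urlopen(req) as resp:
        return resp.read(), resp.headers
```

Calling synthesize("<YOUR_OUTSPEED_API_KEY>", "Hello there", "<VOICE_ID>") would return the raw PCM bytes together with the response headers that describe the audio format.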
Examples (streaming)
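A minimal sketch of a streaming request, again assuming a JSON body and assuming that the streamed chunks are raw PCM bytes delivered in the response body (the wire format of the chunks is not specified here). The helper names are hypothetical.

```python
import json
import urllib.request
from typing import BinaryIO


def copy_chunks(source: BinaryIO, sink: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy `source` to `sink` chunk by chunk; return the total bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means the stream has ended
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def stream_tts_to_file(api_key: str, text: str, voice: str, out_path: str) -> int:
    """Request streamed audio and write the chunks to `out_path` as they arrive."""
    payload = {"model": "outspeed-tts-v2", "voice": voice, "text": text, "stream": True}
    req = urllib.request.Request(
        "https://api.outspeed.com/v1/tts/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        return copy_chunks(resp, out)
```

In a real application the sink could be an audio output device rather than a file, so playback can start before synthesis finishes.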
Dialogue Generation
Generate audio with multiple voices (narrator + character) using the outspeed-tts-v2 model.
Try it visually at the Dialogue Playground
Endpoint
POST https://api.outspeed.com/v1/tts/dialogue
Request Body
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
model | string | Yes | Must be outspeed-tts-v2 |
text | string | Yes | Dialogue text with delimiters for narrator parts |
speaker_voice | string | Yes | Voice ID for character dialogue |
narrator_voice | string | No | Voice ID for narrator parts (omit to skip narrator) |
narrator_delimiter | string | No | Opening delimiter for narrator text: *, (, [, or { (default: *) |
Delimiter Usage
- Text in delimiters = narrator voice
- Text outside delimiters = character voice
- * → *narrator text*
- ( → (narrator text)
- [ → [narrator text]
- { → {narrator text}
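To make the delimiter rules concrete, the sketch below splits a dialogue string into narrator and character segments on the client side. It only illustrates the rules above and is not the service's actual parser; the function name split_dialogue and its handling of whitespace and unmatched delimiters are assumptions.

```python
import re

# Opening delimiter -> matching closing delimiter, per the list above.
CLOSERS = {"*": "*", "(": ")", "[": "]", "{": "}"}


def split_dialogue(text: str, delimiter: str = "*"):
    """Return a list of (role, text) pairs, where role is 'narrator' or 'character'."""
    closer = CLOSERS[delimiter]
    pattern = re.compile(
        re.escape(delimiter) + r"(.*?)" + re.escape(closer), re.DOTALL
    )
    segments = []
    pos = 0
    for match in pattern.finditer(text):
        # Text before the delimited span belongs to the character voice.
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("character", before))
        # Text inside the delimiters belongs to the narrator voice.
        inner = match.group(1).strip()
        if inner:
            segments.append(("narrator", inner))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("character", tail))
    return segments
```

For example, split_dialogue("*She smiled.* Hello there!") yields a narrator segment "She smiled." followed by a character segment "Hello there!".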
Examples
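A minimal sketch of a dialogue request built from the parameters in the table above. It assumes a JSON request body, which this page does not state explicitly. The function name and the voice ID placeholders are hypothetical.

```python
import json
import urllib.request

DIALOGUE_URL = "https://api.outspeed.com/v1/tts/dialogue"
ALLOWED_DELIMITERS = ("*", "(", "[", "{")


def build_dialogue_request(
    api_key: str,
    text: str,
    speaker_voice: str,
    narrator_voice=None,
    narrator_delimiter: str = "*",
):
    """Build the POST request for dialogue generation. Assumes a JSON body."""
    if narrator_delimiter not in ALLOWED_DELIMITERS:
        raise ValueError(f"unsupported delimiter: {narrator_delimiter!r}")
    payload = {
        "model": "outspeed-tts-v2",
        "text": text,
        "speaker_voice": speaker_voice,
        "narrator_delimiter": narrator_delimiter,
    }
    # narrator_voice is optional; leave it out to skip narrator parts.
    if narrator_voice is not None:
        payload["narrator_voice"] = narrator_voice
    return urllib.request.Request(
        DIALOGUE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends the request; the text argument would look like "*She smiled.* Hello there!" when using the default delimiter.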
Voice Selection
Both voices require voice IDs (not names). Get voice IDs from:
- TTS Playground - Browse and copy existing voice IDs
- Voice Upload - Upload custom voices or create voice clones
Response
Same as single voice TTS:
- Content-Type: audio/wav
- Headers: X-Sample-Rate, X-Channels, X-Bit-Depth
- Body: Raw PCM audio bytes
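The three headers are enough to work out how long a response plays. The sketch below does the arithmetic, assuming the body is headerless PCM as described; the function name is hypothetical and the default arguments are the documented header defaults.

```python
def pcm_duration_seconds(
    num_bytes: int,
    sample_rate: int = 24000,
    channels: int = 1,
    bit_depth: int = 16,
) -> float:
    """Duration of a raw PCM buffer, given the X-Sample-Rate,
    X-Channels, and X-Bit-Depth values."""
    bytes_per_frame = channels * (bit_depth // 8)  # one sample per channel
    return num_bytes / (bytes_per_frame * sample_rate)
```

With the defaults, one second of audio is 24000 frames × 2 bytes × 1 channel = 48000 bytes.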