Overview
Generate audio from text using our TTS endpoints. We support both single-voice and multi-voice dialogue generation.
- Single Voice TTS: Convert text to speech with one voice
- Dialogue Generation: Mix narrator and character voices in the same audio file
Single Voice TTS
POST https://api.outspeed.com/v1/tts/
Request Body
- model: TTS model to use. Use outspeed-tts-v2 (outspeed-tts-v1 is deprecated)
- voice: the voice identifier. Find all available voices and their models at TTS Playground
- text: the text to synthesize
- stream: set to true to stream audio chunks; false returns the full audio in a single response
Response
- Content-Type: audio/pcm
- Headers:
  - X-Sample-Rate: Sample rate (default: 24000)
  - X-Channels: Number of audio channels (default: 1)
  - X-Bit-Depth: Bit depth (default: 16)
- Body: Raw PCM audio bytes (little-endian int16), 24kHz, mono. No WAV header is included. Wrap with a WAV header or convert with a tool like ffmpeg.
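Because the body is headerless PCM, most players will not open it directly. The following is a minimal sketch of wrapping it in a WAV container with Python's standard wave module. The function name pcm_to_wav is illustrative, and the defaults mirror the header defaults listed above; in practice you would read the actual values from X-Sample-Rate, X-Channels, and X-Bit-Depth.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, bit_depth: int = 16) -> bytes:
    """Wrap raw little-endian PCM bytes in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bit_depth // 8)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Writing the returned bytes to a file ending in .wav should give a playable file, provided the parameters match the response headers.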
Authenticate with Authorization: Bearer <YOUR_OUTSPEED_API_KEY>.
Examples (non-streaming)
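A minimal non-streaming sketch using only the Python standard library. It assumes the endpoint accepts a JSON body containing the fields listed under Request Body; the helper names build_tts_request and synthesize are illustrative, and the voice argument must be a real voice identifier from the TTS Playground.

```python
import json
import urllib.request

TTS_URL = "https://api.outspeed.com/v1/tts/"


def build_tts_request(api_key: str, text: str, voice: str,
                      stream: bool = False) -> urllib.request.Request:
    """Build the POST request for single-voice TTS."""
    # Assumption: the request body is JSON with the documented field names.
    payload = {
        "model": "outspeed-tts-v2",
        "voice": voice,
        "text": text,
        "stream": stream,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, text: str, voice: str) -> bytes:
    """Send a non-streaming request and return the raw PCM body."""
    with urllib.request.urlopen(build_tts_request(api_key, text, voice)) as resp:
        return resp.read()
```

Calling synthesize(api_key, "Hello there", voice_id) would return the raw PCM bytes described in the Response section.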
Examples (streaming)
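A minimal streaming sketch, again using only the standard library and assuming a JSON request body. With stream set to true, the response is read in fixed-size chunks as it arrives instead of being buffered whole. The names iter_chunks and stream_tts and the 4096-byte chunk size are illustrative choices, not part of the API.

```python
import json
import urllib.request
from typing import BinaryIO, Iterator

TTS_URL = "https://api.outspeed.com/v1/tts/"


def iter_chunks(resp: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield chunks from a file-like response until it is exhausted."""
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:
            return
        yield chunk


def stream_tts(api_key: str, text: str, voice: str) -> Iterator[bytes]:
    """Yield raw PCM chunks from a streaming TTS request."""
    # Assumption: the request body is JSON with the documented field names.
    payload = {"model": "outspeed-tts-v2", "voice": voice,
               "text": text, "stream": True}
    req = urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        yield from iter_chunks(resp)
```

Each yielded chunk can be handed to an audio player or appended to a file. A chunk is not guaranteed to end on a sample boundary, so a consumer that decodes int16 samples should carry any leftover byte over to the next chunk.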
Dialogue Generation
Generate audio with multiple voices (narrator + character) using the outspeed-tts-v2 model.
Try it visually at the Dialogue Playground
Endpoint
POST https://api.outspeed.com/v1/tts/dialogue
Request Body
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Must be outspeed-tts-v2 |
| text | string | Yes | Dialogue text with delimiters for narrator parts |
| speaker_voice | string | Yes | Voice ID for character dialogue |
| narrator_voice | string | No | Voice ID for narrator parts (omit to skip narrator) |
| narrator_delimiter | string | No | Delimiter: *, (, [, or { (default: *) |
Delimiter Usage
- Text in delimiters = narrator voice
- Text outside delimiters = character voice
- * → *narrator text*
- ( → (narrator text)
- [ → [narrator text]
- { → {narrator text}
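To preview how a piece of text would be divided between the two voices, the delimiter rule can be sketched locally. This is an illustration of the rule as described here, not the service's actual parser; split_dialogue and the DELIMITERS mapping are hypothetical names, and nested or unbalanced delimiters are not handled.

```python
import re

# Opening delimiter -> closing delimiter, per the list above.
DELIMITERS = {"*": "*", "(": ")", "[": "]", "{": "}"}


def split_dialogue(text: str, narrator_delimiter: str = "*") -> list[tuple[str, str]]:
    """Return (role, text) pairs, where role is 'narrator' or 'character'."""
    open_d = re.escape(narrator_delimiter)
    close_d = re.escape(DELIMITERS[narrator_delimiter])
    pattern = re.compile(f"{open_d}(.*?){close_d}", re.DOTALL)

    segments = []
    pos = 0
    for match in pattern.finditer(text):
        # Text before the delimited span belongs to the character voice.
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("character", before))
        inner = match.group(1).strip()
        if inner:
            segments.append(("narrator", inner))
        pos = match.end()
    # Anything after the last delimited span is character speech.
    tail = text[pos:].strip()
    if tail:
        segments.append(("character", tail))
    return segments
```

For example, split_dialogue("*She smiled.* Hello there!") yields a narrator segment followed by a character segment.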
Examples
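A minimal sketch of a dialogue request using the Python standard library. It assumes a JSON body with the parameters from the table above and the same Bearer authentication as the single-voice endpoint; build_dialogue_request is an illustrative helper name, and both voice arguments must be real voice IDs.

```python
import json
import urllib.request
from typing import Optional

DIALOGUE_URL = "https://api.outspeed.com/v1/tts/dialogue"


def build_dialogue_request(api_key: str, text: str, speaker_voice: str,
                           narrator_voice: Optional[str] = None,
                           narrator_delimiter: str = "*") -> urllib.request.Request:
    """Build the POST request for dialogue generation."""
    # Assumption: the request body is JSON with the documented parameter names.
    payload = {
        "model": "outspeed-tts-v2",
        "text": text,
        "speaker_voice": speaker_voice,
        "narrator_delimiter": narrator_delimiter,
    }
    # narrator_voice is optional; leaving it out skips the narrator parts.
    if narrator_voice is not None:
        payload["narrator_voice"] = narrator_voice
    return urllib.request.Request(
        DIALOGUE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen and reading the response body would return the generated audio bytes.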
Voice Selection
Both voices require voice IDs (not names). Get voice IDs from:
- TTS Playground - Browse and copy existing voice IDs
- Voice Upload - Upload custom voices or create voice clones
Response
Same as single voice TTS:
- Content-Type: audio/wav
- Headers: X-Sample-Rate, X-Channels, X-Bit-Depth
- Body: Raw PCM audio bytes