Overview

Generate audio from text using our TTS endpoints. We support both single-voice and multi-voice dialogue generation.
  • Single Voice TTS: Convert text to speech with one voice
  • Dialogue Generation: Mix narrator and character voices in the same audio file

Single Voice TTS

POST https://api.outspeed.com/v1/tts/

Request Body

{
  "model": "outspeed-tts-v2",
  "voice": "clark",
  "text": "Hello, world!",
  "stream": false
}
  • model: TTS model to use. Use outspeed-tts-v2 (outspeed-tts-v1 is deprecated)
  • voice: the voice identifier. Find all available voices and their models at the TTS Playground
  • text: the text to synthesize
  • stream: set to true to stream audio chunks as they are generated; false returns the full audio in a single response

Response

  • Content-Type: audio/pcm
  • Headers:
    • X-Sample-Rate: Sample rate (default: 24000)
    • X-Channels: Number of audio channels (default: 1)
    • X-Bit-Depth: Bit depth (default: 16)
  • Body: Raw PCM audio bytes (little-endian int16), 24kHz, mono. No WAV header is included. Wrap with a WAV header or convert with a tool like ffmpeg.
Authenticate with Authorization: Bearer <YOUR_OUTSPEED_API_KEY>.
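
Because the body is headerless PCM, most players will not open it directly. The following is a minimal Python sketch of the "wrap with a WAV header" step using only the standard library. The function names pcm_to_wav and audio_params_from_headers are hypothetical helpers, not part of any Outspeed SDK, and the fallback values are the defaults listed above.

```python
import io
import wave


def audio_params_from_headers(headers) -> dict:
    """Read the X-* audio headers, falling back to the documented defaults."""
    return {
        "sample_rate": int(headers.get("X-Sample-Rate", 24000)),
        "channels": int(headers.get("X-Channels", 1)),
        "bit_depth": int(headers.get("X-Bit-Depth", 16)),
    }


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, bit_depth: int = 16) -> bytes:
    """Wrap raw little-endian PCM bytes in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bit_depth // 8)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

To convert a saved response, read tts.pcm in binary mode, pass the bytes to pcm_to_wav, and write the result to a .wav file.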

Examples (non-streaming)

curl \
  -X POST \
  'https://api.outspeed.com/v1/tts/' \
  -H 'Authorization: Bearer YOUR_OUTSPEED_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"model":"outspeed-tts-v2","voice":"clark","text":"Hello, world!","stream":false}' \
  --output tts.pcm
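
The same non-streaming request can be sketched in Python with the standard library. This mirrors the curl call above and assumes nothing beyond the endpoint, headers, and body fields already documented; build_tts_request and synthesize are hypothetical helper names.

```python
import json
import urllib.request

TTS_URL = "https://api.outspeed.com/v1/tts/"


def build_tts_request(api_key: str, text: str, voice: str = "clark",
                      model: str = "outspeed-tts-v2",
                      stream: bool = False) -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = {"model": model, "voice": voice, "text": text, "stream": stream}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, text: str, out_path: str = "tts.pcm") -> None:
    """Send the request and save the raw PCM body to out_path."""
    req = build_tts_request(api_key, text)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Calling synthesize("YOUR_OUTSPEED_API_KEY", "Hello, world!") should leave the same raw PCM file that the curl example writes.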

Examples (streaming)

curl \
  -X POST \
  'https://api.outspeed.com/v1/tts/' \
  -H 'Authorization: Bearer YOUR_OUTSPEED_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"model":"outspeed-tts-v2","voice":"clark","text":"Hello, world!","stream":true}' \
  --no-buffer \
  --output tts.pcm
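
When consuming a stream programmatically, network chunks are not guaranteed to end on a sample boundary, so a 16-bit sample can be split across two reads. The sketch below shows one way to re-align chunks before handing them to an audio device. It assumes the default 16-bit mono format (2 bytes per sample); align_to_samples is a hypothetical helper, not an Outspeed API.

```python
from typing import Iterable, Iterator


def align_to_samples(chunks: Iterable[bytes],
                     sample_bytes: int = 2) -> Iterator[bytes]:
    """Re-chunk a byte stream so every yielded piece holds whole samples."""
    pending = b""
    for chunk in chunks:
        data = pending + chunk
        usable = len(data) - (len(data) % sample_bytes)
        if usable:
            yield data[:usable]
        # Carry any trailing partial sample over to the next chunk.
        pending = data[usable:]
    # A leftover partial sample at end of stream is dropped.
```

With urllib, the input could be produced by iter(lambda: resp.read(4096), b"") on an open response; any other source of byte chunks works the same way.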

Dialogue Generation

Generate audio with multiple voices (narrator + character) using the outspeed-tts-v2 model.
Try it visually at the Dialogue Playground.

Endpoint

POST https://api.outspeed.com/v1/tts/dialogue

Request Body

{
  "model": "outspeed-tts-v2",
  "text": "*The old library stood silent.* I pushed open the heavy door. *Inside, dust particles danced in my flashlight beam.*",
  "speaker_voice": "9c5c73f4-1cb7-46cf-91d8-24c80b6288f0",
  "narrator_voice": "a42c84b2-0e6b-4c9f-a8e7-3f5d1c2e8a9b",
  "narrator_delimiter": "*"
}

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Must be outspeed-tts-v2 |
| text | string | Yes | Dialogue text with delimiters for narrator parts |
| speaker_voice | string | Yes | Voice ID for character dialogue |
| narrator_voice | string | No | Voice ID for narrator parts (omit to skip narrator) |
| narrator_delimiter | string | No | Delimiter: *, (, [, or { (default: *) |
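
As an illustration of these parameters, here is a small Python sketch that assembles a request body and rejects unsupported delimiters before anything is sent. build_dialogue_payload is a hypothetical helper. Always sending narrator_delimiter is an assumption; the server may treat it differently when narrator_voice is omitted.

```python
from typing import Optional

ALLOWED_DELIMITERS = {"*", "(", "[", "{"}


def build_dialogue_payload(text: str, speaker_voice: str,
                           narrator_voice: Optional[str] = None,
                           narrator_delimiter: str = "*") -> dict:
    """Assemble the JSON body for POST /v1/tts/dialogue."""
    if narrator_delimiter not in ALLOWED_DELIMITERS:
        raise ValueError(f"unsupported delimiter: {narrator_delimiter!r}")
    payload = {
        "model": "outspeed-tts-v2",  # the only model listed for dialogue
        "text": text,
        "speaker_voice": speaker_voice,
        "narrator_delimiter": narrator_delimiter,
    }
    # narrator_voice is optional; leave it out entirely when not provided.
    if narrator_voice is not None:
        payload["narrator_voice"] = narrator_voice
    return payload
```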

Delimiter Usage

  • Text in delimiters = narrator voice
  • Text outside delimiters = character voice
Supported delimiters:
  • *narrator text*
  • (narrator text)
  • [narrator text]
  • {narrator text}
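
To make the delimiter rule concrete, the following Python sketch splits a dialogue string into narrator and character segments on the client side. It only illustrates the rule described above and is not the service's actual parser. It assumes bracket-style delimiters close with their matching character and that an unclosed delimiter leaves the remaining text with the character voice; split_dialogue is a hypothetical name.

```python
CLOSERS = {"*": "*", "(": ")", "[": "]", "{": "}"}


def split_dialogue(text: str, delimiter: str = "*") -> list:
    """Return (role, text) pairs, where role is 'narrator' or 'character'."""
    closer = CLOSERS[delimiter]
    segments = []
    pos = 0
    while pos < len(text):
        start = text.find(delimiter, pos)
        end = text.find(closer, start + 1) if start != -1 else -1
        if start == -1 or end == -1:
            # No complete narrator span remains: the rest is character text.
            rest = text[pos:].strip()
            if rest:
                segments.append(("character", rest))
            break
        before = text[pos:start].strip()
        if before:
            segments.append(("character", before))
        inside = text[start + 1:end].strip()
        if inside:
            segments.append(("narrator", inside))
        pos = end + 1
    return segments
```

Applied to the request body shown earlier, this yields a narrator segment, then the character line "I pushed open the heavy door.", then a second narrator segment.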

Examples

curl -X POST https://api.outspeed.com/v1/tts/dialogue \
  -H "Authorization: Bearer YOUR_OUTSPEED_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "outspeed-tts-v2",
    "text": "*The old library stood silent.* I pushed open the heavy door. *Inside, dust particles danced in my flashlight beam.*",
    "speaker_voice": "<voice-id-1>",
    "narrator_voice": "<voice-id-2>",
    "narrator_delimiter": "*"
  }' \
  --output dialogue.wav

Voice Selection

Both voices require voice IDs (not names). Get voice IDs from:

Response

Same as single voice TTS:
  • Content-Type: audio/wav
  • Headers: X-Sample-Rate, X-Channels, X-Bit-Depth
  • Body: Raw PCM audio bytes