Text-to-Speech (HTTP)

Overview

Generate audio from text using our TTS endpoints. We support both single-voice and multi-voice dialogue generation.

Single Voice TTS: Convert text to speech with one voice
Dialogue Generation: Mix narrator and character voices in the same audio file

Single Voice TTS

POST https://api.outspeed.com/v1/tts/

Request Body

{
  "model": "outspeed-tts-v2",
  "voice": "clark",
  "text": "Hello, world!",
  "stream": false
}

model: TTS model to use. Use outspeed-tts-v2 (outspeed-tts-v1 is deprecated)
voice: the voice identifier. Find all available voices and their models at TTS Playground
text: the text to synthesize
stream: set to true to receive audio chunks as they are generated; false returns the complete audio in a single response

Response

Content-Type: audio/pcm
Headers:
- X-Sample-Rate: Sample rate (default: 24000)
- X-Channels: Number of audio channels (default: 1)
- X-Bit-Depth: Bit depth (default: 16)
Body: Raw PCM audio bytes (little-endian int16), 24 kHz, mono. No WAV header is included, so add one yourself or convert the file with a tool such as ffmpeg.
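
Because the body is headerless PCM, most players will not open it directly. The sketch below shows one way to wrap the bytes in a WAV container using Python's standard `wave` module. The function name `pcm_to_wav_bytes` is hypothetical, and the default values are taken from the header defaults listed above.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000,
                     channels: int = 1, bit_depth: int = 16) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV (RIFF) container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(bit_depth // 8)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

In practice you would read the values from the X-Sample-Rate, X-Channels, and X-Bit-Depth response headers rather than rely on the defaults, then write the returned bytes to a `.wav` file.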

Authenticate every request with the header Authorization: Bearer <YOUR_OUTSPEED_API_KEY>.

Examples (non-streaming)

curl \
  -X POST \
  'https://api.outspeed.com/v1/tts/' \
  -H 'Authorization: Bearer YOUR_OUTSPEED_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"model":"outspeed-tts-v2","voice":"clark","text":"Hello, world!","stream":false}' \
  --output tts.pcm
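
For readers calling the endpoint from code rather than the shell, here is a rough Python equivalent of the curl command above, using only the standard library. The helper names `build_tts_request` and `synthesize_to_file` are hypothetical; the URL, headers, and JSON fields are the ones documented on this page.

```python
import json
import urllib.request

TTS_URL = "https://api.outspeed.com/v1/tts/"


def build_tts_request(api_key: str, text: str, voice: str = "clark",
                      model: str = "outspeed-tts-v2",
                      stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST request for single-voice TTS."""
    payload = {"model": model, "voice": voice, "text": text, "stream": stream}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, text: str, path: str = "tts.pcm") -> None:
    """Send the request and save the raw PCM response body to `path`."""
    request = build_tts_request(api_key, text)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network call is made.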

Examples (streaming)

curl \
  -X POST \
  'https://api.outspeed.com/v1/tts/' \
  -H 'Authorization: Bearer YOUR_OUTSPEED_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"model":"outspeed-tts-v2","voice":"clark","text":"Hello, world!","stream":true}' \
  --no-buffer \
  --output tts.pcm
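
When consuming a streamed response in code, a network read can end in the middle of a 16-bit sample. The sketch below is one way to re-chunk the stream so that every piece handed to a player or encoder holds whole samples. `iter_pcm_chunks` is a hypothetical helper, and the 2-byte sample width assumes the default 16-bit depth.

```python
import io
from typing import BinaryIO, Iterator


def iter_pcm_chunks(stream: BinaryIO, chunk_size: int = 4096,
                    sample_width: int = 2) -> Iterator[bytes]:
    """Yield PCM chunks that always contain whole samples.

    Any trailing partial sample is carried over and prepended to the
    next chunk read from the stream.
    """
    leftover = b""
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        data = leftover + data
        usable = len(data) - (len(data) % sample_width)
        leftover = data[usable:]
        if usable:
            yield data[:usable]
    # A dangling partial sample at the end of the stream is dropped.
```

Any object with a `read(n)` method works as `stream`, including the response returned by `urllib.request.urlopen` for a request sent with "stream": true.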

Dialogue Generation

Generate audio with multiple voices (narrator + character) using the outspeed-tts-v2 model.

Try it visually at the Dialogue Playground.

Endpoint

POST https://api.outspeed.com/v1/tts/dialogue

Request Body

{
  "model": "outspeed-tts-v2",
  "text": "*The old library stood silent.* I pushed open the heavy door. *Inside, dust particles danced in my flashlight beam.*",
  "speaker_voice": "9c5c73f4-1cb7-46cf-91d8-24c80b6288f0",
  "narrator_voice": "a42c84b2-0e6b-4c9f-a8e7-3f5d1c2e8a9b",
  "narrator_delimiter": "*"
}

Parameters

Parameter	Type	Required	Description
`model`	string	Yes	Must be `outspeed-tts-v2`
`text`	string	Yes	Dialogue text with delimiters for narrator parts
`speaker_voice`	string	Yes	Voice ID for character dialogue
`narrator_voice`	string	No	Voice ID for narrator parts (omit to skip narrator)
`narrator_delimiter`	string	No	Delimiter that marks narrator text: `*`, `(`, `[`, or `{` (default: `*`)
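
As an illustration of the required and optional parameters, the following sketch builds a dialogue request body and leaves out optional fields that were not supplied. `build_dialogue_payload` and `ALLOWED_DELIMITERS` are hypothetical names. The client-side delimiter check is an assumption added for convenience, not documented server behavior.

```python
from typing import Optional

# Opening delimiters listed in this document.
ALLOWED_DELIMITERS = ("*", "(", "[", "{")


def build_dialogue_payload(text: str, speaker_voice: str,
                           narrator_voice: Optional[str] = None,
                           narrator_delimiter: Optional[str] = None) -> dict:
    """Build the JSON body for POST /v1/tts/dialogue."""
    payload = {
        "model": "outspeed-tts-v2",  # the only model accepted here
        "text": text,
        "speaker_voice": speaker_voice,
    }
    if narrator_voice is not None:
        payload["narrator_voice"] = narrator_voice
    if narrator_delimiter is not None:
        if narrator_delimiter not in ALLOWED_DELIMITERS:
            raise ValueError(f"unsupported delimiter: {narrator_delimiter!r}")
        payload["narrator_delimiter"] = narrator_delimiter
    return payload
```

Omitting `narrator_delimiter` lets the server apply its default instead of hard-coding one on the client.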

Delimiter Usage

Text in delimiters = narrator voice
Text outside delimiters = character voice

Supported delimiters:

* → *narrator text*
( → (narrator text)
[ → [narrator text]
{ → {narrator text}
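
To preview which voice each part of a text would be assigned to, the delimiter rule can be mimicked locally. This is only a sketch of the rule as described above. How the service itself treats nested or unmatched delimiters is not documented here, and `split_dialogue` is a hypothetical helper that treats unmatched text as character dialogue.

```python
from typing import List, Tuple

# Closing character for each supported opening delimiter.
CLOSING = {"*": "*", "(": ")", "[": "]", "{": "}"}


def split_dialogue(text: str, delimiter: str = "*") -> List[Tuple[str, str]]:
    """Split text into ("narrator" or "character", segment) pairs."""
    closer = CLOSING[delimiter]
    segments: List[Tuple[str, str]] = []
    position = 0
    while position < len(text):
        start = text.find(delimiter, position)
        end = text.find(closer, start + 1) if start != -1 else -1
        if start == -1 or end == -1:
            # No further complete narrator span: the rest is character text.
            rest = text[position:].strip()
            if rest:
                segments.append(("character", rest))
            break
        before = text[position:start].strip()
        if before:
            segments.append(("character", before))
        inside = text[start + 1:end].strip()
        if inside:
            segments.append(("narrator", inside))
        position = end + 1
    return segments
```

Run on the sample request text, this yields a narrator segment, a character segment ("I pushed open the heavy door."), and a second narrator segment.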

Examples

curl -X POST https://api.outspeed.com/v1/tts/dialogue \
  -H "Authorization: Bearer YOUR_OUTSPEED_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "outspeed-tts-v2",
    "text": "*The old library stood silent.* I pushed open the heavy door. *Inside, dust particles danced in my flashlight beam.*",
    "speaker_voice": "<voice-id-1>",
    "narrator_voice": "<voice-id-2>",
    "narrator_delimiter": "*"
  }' \
  --output dialogue.wav

Voice Selection

Both voices require voice IDs (not names). Get voice IDs from:

TTS Playground - Browse and copy existing voice IDs
Voice Upload - Upload custom voices or create voice clones

Response

Same as single voice TTS:

Content-Type: audio/wav
Headers: X-Sample-Rate, X-Channels, X-Bit-Depth
Body: Raw PCM audio bytes
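
The three format headers are enough to work out how long a clip is from its byte count: duration = bytes / (sample rate × channels × bytes per sample). A minimal sketch follows, assuming the defaults listed for single-voice TTS (24000 Hz, 1 channel, 16 bits) apply when a header is missing. `pcm_duration_seconds` is a hypothetical helper.

```python
from typing import Mapping


def pcm_duration_seconds(num_bytes: int, headers: Mapping[str, str]) -> float:
    """Estimate clip length from the PCM byte count and format headers."""
    sample_rate = int(headers.get("X-Sample-Rate", 24000))
    channels = int(headers.get("X-Channels", 1))
    bit_depth = int(headers.get("X-Bit-Depth", 16))
    bytes_per_second = sample_rate * channels * (bit_depth // 8)
    return num_bytes / bytes_per_second
```

With the defaults, one second of audio is 24000 × 1 × 2 = 48000 bytes. A plain `dict` lookup is case-sensitive, so pass the header mapping from your HTTP client where possible (for example `response.headers` from `urllib`, which matches header names case-insensitively).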
