Overview

The input_language field in your session configuration is a hint about the primary language users are expected to speak. It optimizes speech recognition but does not restrict input: the model understands multiple languages in the same session, regardless of the hint, and users can speak in any supported language.
The response language depends on your chosen voice, not on input_language. See available voices for language support.

Supported Languages

Currently supported input language hints:
  • en - English (default)
  • zh - Chinese (Mandarin)
  • hi - Hindi
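
If the hint comes from a user's locale rather than a hard-coded value, one way to map it onto the codes above is sketched here. The helper name hintFromLocale and the fallback to en for unlisted or malformed locales are assumptions made for illustration; they are not part of the SDK, and the source does not say how the API treats an unsupported hint.

```javascript
// Hypothetical helper (not part of the SDK): derive an input_language hint
// from a BCP 47 locale tag such as "zh-CN" or "hi-IN".
const SUPPORTED_HINTS = ["en", "zh", "hi"]; // codes listed above
const DEFAULT_HINT = "en"; // documented default

function hintFromLocale(localeTag) {
  try {
    // Intl.Locale parses the tag and exposes its primary language subtag.
    const { language } = new Intl.Locale(localeTag);
    return SUPPORTED_HINTS.includes(language) ? language : DEFAULT_HINT;
  } catch {
    // Malformed or missing tags fall back to the default (an application choice).
    return DEFAULT_HINT;
  }
}

const sessionConfig = {
  // rest of config...
  input_language: hintFromLocale("zh-CN"),
};
```

Note that zh is documented as Mandarin, so an application serving other Chinese varieties may want a stricter mapping than the primary subtag alone.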

Configuration

Add input_language to your session configuration as a hint for the primary expected language:
const sessionConfig = {
  model: "outspeed-v1",
  instructions: "You are a helpful assistant.",
  voice: "sophie",
  input_language: "zh", // Hint: Primary language is Chinese, but users can still switch to English mid-conversation
  turn_detection: {
    type: "semantic_vad",
  },
  first_message: "Hello! How can I help you today?",
};

Language Examples

English (Default)

const sessionConfig = {
  // rest of config...
  input_language: "en", // or omit this field entirely
};

// User speaks: "What's the weather like?"
// Agent responds: "The weather is sunny and 72°F." (English voice responds in English)

Chinese (Mandarin) with Multilingual Support

const sessionConfig = {
  // rest of config...
  input_language: "zh", // Hint: Primary language is Chinese, but users can still switch to English mid-conversation
};

// User speaks: "今天天气怎么样?" (How's the weather today?)
// Agent responds: "The weather is sunny and 72°F." (English voice responds in English)

// User can also mix languages:
// User speaks: "今天 how are you?"
// Agent responds: "Hello! I'm doing well, thank you for asking." (English voice responds in English)

Hindi with Multilingual Support

const sessionConfig = {
  // rest of config...
  voice: "apoorva", // Hindi voice
  input_language: "hi", // Hint for Hindi, but users can mix languages
};

// User speaks: "आज मौसम कैसा है?" (How's the weather today?)
// Agent responds: "मौसम धूप है और 72°F है।" (Hindi voice responds in Hindi)

// User can also mix languages:
// User speaks: "हेलो, सब बढ़िया? Everything is fine?"
// Agent responds: "हैलो! हाँ, सब कुछ ठीक है। मैं आपकी कैसे मदद कर सकती हूँ?" (Hindi voice responds in Hindi)

Language Detection

The system automatically detects which languages are spoken in each user input and includes this information in the transcription event:
{
  "event_id": "bf03fb6b-1160-411d-b332-f2a97b743d2c",
  "type": "conversation.item.input_audio_transcription.completed",
  "server_sent": true,
  "item_id": "item_5c49066c55c549d29132a467d01e8e3b",
  "transcript": "हेलो, सब बढ़िया? Everything is fine?",
  "languages": ["hi", "en"],
  "content_index": 0
}
The languages array lists every language detected in the user’s input, so you can see the linguistic composition of each utterance. Listen for this event to track the languages spoken:
conversation.on("conversation.item.input_audio_transcription.completed", (event) => {
  console.log("Languages spoken:", event.languages);
});
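
Beyond logging, the languages array can be aggregated over a session. The following is a minimal sketch of such a tally; createLanguageTally is a hypothetical helper, not part of the SDK, and treating a missing languages array as empty is an assumption.

```javascript
// Hypothetical helper (not part of the SDK): count how often each language
// appears across transcription events.
function createLanguageTally() {
  const counts = new Map();
  return {
    // Pass each "conversation.item.input_audio_transcription.completed" event here.
    record(event) {
      // Assumption: treat a missing languages array as empty.
      for (const code of event.languages ?? []) {
        counts.set(code, (counts.get(code) ?? 0) + 1);
      }
    },
    // Most frequently detected language so far, or null if none recorded.
    dominant() {
      let best = null;
      for (const [code, n] of counts) {
        if (best === null || n > counts.get(best)) best = code;
      }
      return best;
    },
    // Plain-object snapshot of the counts.
    counts: () => Object.fromEntries(counts),
  };
}

const tally = createLanguageTally();
tally.record({ languages: ["hi", "en"] });
tally.record({ languages: ["hi"] });
console.log(tally.dominant());
console.log(tally.counts());
```

In a live session, tally.record could be passed as the handler in the conversation.on call shown above.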

Important Notes

  • Hint, not restriction: input_language is a hint to optimize recognition, not a limitation
  • Multilingual support: Users can speak multiple languages in a single session or even in a single utterance
  • Voice-dependent output: The AI agent responds in the language of the chosen voice (see available voices)
  • Default behavior: input_language defaults to en (English) when omitted
  • Language detection: Each transcription includes detected languages in the languages array
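
Because the hint is not a restriction, a session may receive utterances in languages other than input_language. As a rough sketch, the detected languages can be compared against the configured hint to see when that happens. The function languagesOutsideHint is hypothetical and not part of the SDK; the sample event reuses fields from the transcription event shown earlier.

```javascript
// Hypothetical helper (not part of the SDK): list the detected languages in a
// transcription event that differ from the configured input_language hint.
function languagesOutsideHint(event, hint = "en") {
  // Assumption: treat a missing languages array as empty.
  return (event.languages ?? []).filter((code) => code !== hint);
}

const sampleEvent = {
  type: "conversation.item.input_audio_transcription.completed",
  transcript: "हेलो, सब बढ़िया? Everything is fine?",
  languages: ["hi", "en"],
};

console.log(languagesOutsideHint(sampleEvent, "hi")); // [ 'en' ]
console.log(languagesOutsideHint(sampleEvent, "zh")); // [ 'hi', 'en' ]
```

An application might use a non-empty result to decide whether a different hint or voice would suit the user better.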