Gemini Multimodal Live is Google’s model for realtime multimodal interactions. It supports:

  • Realtime video understanding
  • Live speech conversation
  • Multimodal streaming
  • Voice customization

This model will soon be available in Voice DevTools.

For more information, visit Google’s Gemini documentation.