Live Sports Commentator
A realtime commentator that provides live commentary on sports events.
Make sure you have gone over QuickStart before trying this example.
In this example, we will build a realtime commentator that provides commentary on a live poker tournament. We accomplish this by using screenshare to capture the live game on YouTube and microphone audio to interact with the commentator.
Creating a Sports Commentator in 4 steps
This app will process audio from your microphone and video from screenshare, send both to an LLM, and convert the response back to voice.
- The full GitHub repository is available here.
This application will initially support only English as the source language.
Start by cloning the repository
git clone https://github.com/outspeed-ai/outspeed.git
cd outspeed/examples/sports_commentator/
Install Dependencies
Run Backend
Ensure you’re in the same directory as poker_commentator.py!
You will need the following environment variables:
DEEPGRAM_API_KEY
- You can get this by going to https://console.deepgram.com/ and clicking on the API key tab.
GROQ_API_KEY
- You can get this by going to https://console.groq.com/keys and clicking on the API key tab.
CARTESIA_API_KEY
- You can get this by going to https://play.cartesia.ai/keys and clicking on the API key tab.
All of these providers have a free tier. Once you have your keys, run the following command:
export DEEPGRAM_API_KEY=<your_deepgram_api_key>
export GROQ_API_KEY=<your_groq_api_key>
export CARTESIA_API_KEY=<your_cartesia_api_key>
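Before starting the server, it can help to confirm that all three keys are actually visible to Python in the current shell. The following is a minimal sketch of such a check; the helper name `missing_keys` is hypothetical and is not part of Outspeed.

```python
import os
from typing import List, Mapping

# The three variables listed above.
REQUIRED_KEYS = ("DEEPGRAM_API_KEY", "GROQ_API_KEY", "CARTESIA_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in env.

    Hypothetical helper for illustration only.
    """
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```

If any name is reported as missing, re-run the matching export command in the same terminal session that will start the server.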
Finally, run the following command to start the server:
python poker_commentator.py
The console will output the URL you can use to connect to the server (default is http://localhost:8080).
Run Demo
We have already created a simple frontend using our React SDK. Open the following page in your browser:
https://playground.outspeed.com/webrtc
- Paste the link you received from the previous step into the URL field.
- Select Audio device. Leave Video device blank. Click Run to begin.
Understanding the Process
Review the code in poker_commentator.py.
Setup
The PokerCommentator class initializes with the setup method, which is automatically called when the application starts. This method is responsible for setting up the necessary services and loading models. Here’s a breakdown of the services initialized:
- DeepgramSTT: Converts spoken audio to text using Deepgram’s speech-to-text API, configured with a sample rate of 8000 Hz.
- KeyFrameDetector: Analyzes video streams to detect significant moments, using a sensitivity threshold of 0.2 and a maximum time interval of 15 seconds between key frames.
- GeminiVision: Processes audio and video inputs to generate insightful poker commentary, guided by a detailed system prompt. It operates with a response temperature of 0.9 and maintains a chat history for context.
- TokenAggregator: Aggregates tokens from GeminiVision to form coherent responses.
- ElevenLabsTTS: Converts text responses back into spoken audio using Eleven Labs’ text-to-speech API, optimized for low latency and using a specific voice model.
- AudioConverter: Converts audio formats to ensure compatibility across different services.
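To make the two KeyFrameDetector parameters more concrete, here is a minimal sketch of how a detector with a sensitivity threshold and a maximum interval could behave. It is an assumption for illustration, not Outspeed's implementation: frames are modeled as flat lists of pixel intensities in [0, 1], and the names `mean_abs_diff` and `key_frames` are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Frame = List[float]  # hypothetical frame model: pixel intensities in [0, 1]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def key_frames(
    frames: Iterable[Tuple[float, Frame]],
    threshold: float = 0.2,      # sensitivity threshold from the text
    max_interval: float = 15.0,  # maximum seconds between key frames
) -> Iterator[Tuple[float, Frame]]:
    """Yield (timestamp, frame) pairs that qualify as key frames.

    A frame is emitted when it is the first frame, when it differs from the
    last key frame by more than threshold, or when max_interval seconds have
    passed since the last key frame.
    """
    last_ts: Optional[float] = None
    last_frame: Optional[Frame] = None
    for ts, frame in frames:
        changed = last_frame is None or mean_abs_diff(frame, last_frame) > threshold
        stale = last_ts is not None and ts - last_ts >= max_interval
        if changed or stale:
            last_ts, last_frame = ts, frame
            yield ts, frame


if __name__ == "__main__":
    stream = [
        (0.0, [0.0, 0.0]),
        (1.0, [0.05, 0.05]),  # small change, skipped
        (2.0, [0.5, 0.5]),    # large change, emitted
        (10.0, [0.5, 0.5]),   # no change and not stale, skipped
        (17.0, [0.5, 0.5]),   # 15 s since last key frame, emitted
    ]
    print([ts for ts, _ in key_frames(stream)])
```

Under this model, a lower threshold forwards more frames to the vision model, while the maximum interval guarantees that commentary keeps flowing even when the table looks static.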
Streaming Endpoint
The run method in the PokerCommentator class is marked as a streaming endpoint, handling real-time audio and video streams. Here’s how it processes these streams:
- Audio Processing: The audio input stream is first converted to text using the DeepgramSTT service.
- Video Processing: Simultaneously, the video input stream is analyzed by the KeyFrameDetector to identify key moments.
- Commentary Generation: The text from Deepgram and the video analysis from KeyFrameDetector are then processed by GeminiVision to generate commentary.
- Token Aggregation: The commentary tokens generated are refined by the TokenAggregator for coherence.
- Text-to-Speech: The coherent text is then converted into spoken audio by ElevenLabsTTS.
- Audio Conversion: Finally, the audio stream is formatted by the AudioConverter for output.
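The last stages of this flow (commentary tokens, aggregation, text-to-speech) can be pictured as chained asynchronous generators. The sketch below uses only the Python standard library with stand-in stages; the function names `fake_llm_tokens`, `aggregate_sentences` and `fake_tts` are hypothetical and do not reflect the Outspeed API. It assumes the aggregator buffers tokens until a sentence ends, which is one plausible way to hand coherent text to a speech engine.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_llm_tokens() -> AsyncIterator[str]:
    """Stand-in for a streaming LLM that yields commentary token by token."""
    for token in ["Big ", "raise ", "on ", "the ", "river. ", "Will ", "he ", "call?"]:
        await asyncio.sleep(0)  # give control back to the event loop, like real I/O
        yield token


async def aggregate_sentences(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Buffer tokens and yield one complete sentence at a time."""
    buffer = ""
    async for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():  # flush any trailing partial sentence
        yield buffer.strip()


async def fake_tts(sentences: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in for text-to-speech; tags each sentence instead of producing audio."""
    async for sentence in sentences:
        yield f"<audio:{sentence}>"


async def collect(stream: AsyncIterator[str]) -> List[str]:
    """Drain a stream into a list so the result can be inspected."""
    return [item async for item in stream]


def run_demo() -> List[str]:
    """Wire the stages together in the order described above and run them."""
    return asyncio.run(collect(fake_tts(aggregate_sentences(fake_llm_tokens()))))


if __name__ == "__main__":
    for chunk in run_demo():
        print(chunk)
```

Aggregating into sentences before speech synthesis trades a little latency for more natural prosody, since the speech stage receives whole phrases rather than isolated tokens.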
The method outputs three streams: the audio stream of the commentary, the chat history text stream, and the cloned video input stream.
Support
For any assistance or questions, feel free to join our Discord community. We’re excited to see what you build!