Today, we’re launching Gemini 3.1 Flash Live via the Gemini Live API in Google AI Studio. Gemini 3.1 Flash Live enables developers to build real-time voice and vision agents that not only process the world around them, but also respond at the speed of conversation.
This is a step change in latency and reliability, with more natural-sounding dialogue, delivering the quality needed for the next generation of voice-first AI.
Experience lower latency, greater reliability and higher quality
For real-time interactions, every millisecond of latency erodes the natural conversational flow that users expect. The new model better understands tone, emphasis and intent, giving agents key improvements:
- Higher task completion rates in noisy, real-world environments: We’ve significantly improved the model’s ability to trigger external tools and deliver information during live conversations. By better discerning relevant speech from environmental sounds like traffic or television, the model filters out background noise and stays reliable and responsive to instructions.
- Better instruction-following: Adherence to complex system instructions has improved significantly. Your agent will stay within its operational guardrails, even when conversations take unexpected turns.
- More natural, lower-latency dialogue: The latest model reduces latency and is even more effective at recognizing acoustic nuances like pitch and pace than 2.5 Flash Native Audio, making real-time conversations feel noticeably more fluid and natural.
- Multilingual capabilities: The model supports more than 90 languages for real-time, multimodal conversations.
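The noise robustness described above happens inside the model itself. Purely to illustrate the underlying idea of separating speech from background noise, here is a minimal client-side energy gate that drops near-silent audio frames before they are streamed; all names and thresholds are hypothetical examples, not part of the API.

```python
# Illustrative sketch only: a naive energy-based gate for 16-bit PCM audio.
# The Live model performs far more robust, learned filtering server-side;
# the threshold below is an arbitrary example value.
import math
import struct

def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian PCM frame."""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    # A real voice-activity detector would adapt to the noise floor;
    # a fixed threshold is enough to show the gating concept.
    return frame_rms(frame) >= threshold

# A loud frame passes the gate; a near-silent one is dropped.
loud = struct.pack("<4h", 4000, -4000, 4000, -4000)
quiet = struct.pack("<4h", 10, -10, 10, -10)
```

In practice you would run a check like this per audio chunk and only stream chunks that pass, trading a little recall for bandwidth and noise reduction.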
See the Gemini Live API in action
Developers are already building voice agents with Gemini Flash Live models that converse with a natural flow and pace and take actions reliably. Here are a few examples of real-world apps that use the model to power their conversational interactions:
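For developers getting started, the sketch below shows roughly what opening a Live session looks like with the google-genai Python SDK. The model identifier is assumed from this announcement and the config keys follow the SDK's documented dict shape; verify both against the Google AI Studio documentation before use.

```python
# Hedged sketch of a Gemini Live API voice session using the google-genai
# Python SDK. The model name is an assumption based on this announcement.
MODEL = "gemini-3.1-flash-live"  # assumed; check AI Studio's model list

def build_config(system_instruction: str) -> dict:
    # Request spoken output; the Live API streams audio back incrementally.
    return {
        "response_modalities": ["AUDIO"],
        "system_instruction": system_instruction,
    }

async def run_session(prompt: str) -> None:
    # Imported here so the helpers above stay importable without the SDK.
    from google import genai

    client = genai.Client()  # reads the API key from the environment
    config = build_config("You are a concise voice agent. Stay on task.")
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": prompt}]},
            turn_complete=True,
        )
        async for message in session.receive():
            if message.data:  # raw audio bytes from the model
                pass  # feed these chunks to your audio player
```

You would drive this with `asyncio.run(run_session("What's on my calendar?"))`; a production agent would also stream microphone audio into the session rather than sending a single text turn.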

