Gemini 3.1 Flash Live native-audio TTS can produce runaway/no-audio output and 1011 when explicit temperature: 0 is set #1578

@y-lobau

Thanks for the library. I'm reporting this here because the issue is triggered by a @google/genai Live API config value that the client accepts and forwards. The underlying failure may be a Gemini Live backend/product issue, but it would be helpful to know whether the JS SDK should validate, reject, clamp, or document this configuration for native-audio Live TTS.

Environment details

  • Programming language: TypeScript / JavaScript
  • OS: macOS 26.3.1, arm64
  • Language runtime version: Node.js v22.22.2
  • Package version: @google/genai ^1.47.0

Model / API path

  • Model: models/gemini-3.1-flash-live-preview
  • API: Gemini Live API via ai.live.connect
  • Output modality: audio only
  • Use case: native-audio TTS, text input sent to Live session
  • Voice tested: puck; also compared with AI Studio-style Zephyr
  • Language: Belarusian

Steps to reproduce

  1. Create a Live API session with models/gemini-3.1-flash-live-preview.
  2. Configure audio response modality and a normal speechConfig.
  3. Add explicit temperature: 0 to the Live config.
  4. Send Belarusian text to be spoken, either through sendRealtimeInput or AI Studio-style sendClientContent.
  5. Collect raw audio chunks before playback and observe session close metadata.
  6. Repeat a few times. The failure is intermittent but reproducible.

Minimal shape:

import { GoogleGenAI, Modality } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const session = await ai.live.connect({
  model: "models/gemini-3.1-flash-live-preview",
  config: {
    responseModalities: [Modality.AUDIO],
    temperature: 0,
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: {
          voiceName: "puck",
        },
      },
    },
    systemInstruction: {
      parts: [
        {
          text: "You are a robotic Text-To-Speech engine. Read the input text aloud exactly as provided. Language: Belarusian.",
        },
      ],
    },
  },
  callbacks: {
    onmessage(message) {
      // collect message.serverContent.modelTurn.parts[].inlineData.data
      // into raw PCM/WAV before playback
    },
    onerror(error) {
      console.error(error);
    },
    onclose(event) {
      console.log("closed", event.code, event.reason, event.wasClean);
    },
  },
});

session.sendClientContent({
  turns: [
    "Вядома, я ўсталю таймер на пяць хвілін і нагадаю вам, калі час скончыцца. Каб праверыць сінтэз маўлення, я прачытаю яшчэ некалькі беларускіх сказаў без зменаў і без адказаў на пытанні. Шчыра кажучы, гэта павінна гучаць натуральна: вучымся, груша, вугаль, ґанак, каўнер, рака, дзеці. Праігрываю гурт Naviband са spotify, але не перафармулёўваю гэты сказ і не замяняю словы сінонімамі. Калі ў тэксце ёсць пытанне, напрыклад: ці ўсё добра?, я проста агучваю пытанне як напісана.",
  ],
});
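
The `onmessage` placeholder above could be filled in roughly as follows. This is a minimal sketch, not the exact code used for the measurements in this report: `PcmCollector` and `LiveMessageLike` are hypothetical local names, the types only mirror the fields mentioned in the comment, and the sketch assumes `inlineData.data` arrives as a base64 string and that the code runs on Node.js (for `Buffer`).

```typescript
// Local stand-in types covering only the fields read below;
// these are NOT the SDK's exported types.
interface InlineDataLike {
  data?: string; // assumed: base64-encoded audio bytes
  mimeType?: string;
}
interface PartLike {
  inlineData?: InlineDataLike;
}
interface LiveMessageLike {
  serverContent?: { modelTurn?: { parts?: PartLike[] } };
}

// Hypothetical helper: accumulates decoded audio bytes across messages
// so the raw stream can be inspected before any playback.
class PcmCollector {
  private chunks: Buffer[] = [];
  totalBytes = 0;

  add(message: LiveMessageLike): void {
    const parts = message.serverContent?.modelTurn?.parts ?? [];
    for (const part of parts) {
      const b64 = part.inlineData?.data;
      if (!b64) continue; // skip parts without audio data
      const bytes = Buffer.from(b64, "base64");
      this.chunks.push(bytes);
      this.totalBytes += bytes.length;
    }
  }

  // Concatenated raw bytes, e.g. for writing to disk.
  toBuffer(): Buffer {
    return Buffer.concat(this.chunks);
  }
}
```

In the session above, `onmessage(message) { collector.add(message); }` would feed it, and `collector.totalBytes` can be logged in `onclose` next to the close code.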

Expected behavior

The Live session should return bounded audio corresponding to the input text, then complete/close cleanly. If temperature: 0 is unsupported or unsafe for this model/output mode, the SDK or API should reject it or document the supported range.

Actual behavior

With explicit temperature: 0, the Live native-audio TTS output is intermittently malformed in the raw audio stream, before any playback:

  • Audio output becomes oversized/runaway, far longer than the input text requires.
  • In some runs, the session closes with 1011 Resource exhausted.
  • In some manual-baseline runs, no usable audio is returned.
  • The problem is visible in raw PCM/WAV capture before any local playback, so this does not appear to be a speaker/playback issue.
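
The three failure modes listed above can be told apart mechanically from the captured byte count and the close code. The sketch below is one way to do that; `classifyAttempt`, `tally`, and the `maxExpectedBytes` threshold are hypothetical, and the threshold has to be chosen per input text.

```typescript
type Outcome = "ok" | "no-audio" | "oversized-or-error";

interface Attempt {
  pcmBytes: number; // total raw audio bytes captured for the attempt
  closeCode?: number; // WebSocket close code, if the session closed
}

// Classify one attempt. maxExpectedBytes is a caller-chosen upper bound
// for "reasonable" audio length for the given input text (an assumption).
function classifyAttempt(a: Attempt, maxExpectedBytes: number): Outcome {
  if (a.closeCode === 1011 || a.pcmBytes > maxExpectedBytes) {
    return "oversized-or-error";
  }
  if (a.pcmBytes === 0) return "no-audio";
  return "ok";
}

// Count outcomes over a batch of attempts.
function tally(attempts: Attempt[], maxExpectedBytes: number): Record<Outcome, number> {
  const counts: Record<Outcome, number> = {
    ok: 0,
    "no-audio": 0,
    "oversized-or-error": 0,
  };
  for (const a of attempts) {
    counts[classifyAttempt(a, maxExpectedBytes)] += 1;
  }
  return counts;
}
```

A 1011 close is counted as a failure even if the byte count stays under the threshold, since the session did not complete cleanly.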

Evidence

Using the same Belarusian prompt/text:

  • Product-style Live path with explicit temperature: 0:

    • 1/3 oversized failures.
    • Failing sample: 8,911,204 raw PCM bytes, about 185.65s at 24 kHz.
    • Close code: 1011 Resource exhausted.
  • Manual baseline with explicit temperature: 0:

    • 5 attempts: 3 OK, 1 no-audio, 1 oversized/error.
    • Oversized sample: 10,930,564 raw PCM bytes, about 227.7s at 24 kHz.
    • Close code: 1011 Resource exhausted.
  • Literal AI Studio-style control without explicit temperature:

    • 5/5 clean.
  • Temperature isolation using literal AI Studio-style session shape:

    • Without explicit temperature: 3/3 clean.
    • With temperature: 0: 1/3 oversized.
    • Failing sample: 12,616,804 raw PCM bytes, about 262.85s at 24 kHz.
    • Close code: 1011 Resource exhausted.
  • After removing explicit temperature from our production TTS path:

    • 20/20 product-path attempts clean.
    • 0 oversized/no-audio/1011 failures.
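
For reference, the durations quoted above follow from the byte counts under the assumption of 24 kHz, 16-bit, mono PCM (48,000 bytes per second). That sample format is an assumption here; the report itself only states the 24 kHz rate. `pcmDurationSeconds` is a hypothetical helper name.

```typescript
// Assumed output format: 24 kHz, 16-bit (2 bytes per sample), mono.
const SAMPLE_RATE_HZ = 24_000;
const BYTES_PER_SAMPLE = 2;
const CHANNELS = 1;

// Convert a raw PCM byte count into seconds of audio.
function pcmDurationSeconds(byteLength: number): number {
  return byteLength / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS);
}

// The three oversized samples reported above.
for (const bytes of [8_911_204, 10_930_564, 12_616_804]) {
  console.log(bytes, pcmDurationSeconds(bytes).toFixed(2) + "s");
}
// prints:
// 8911204 185.65s
// 10930564 227.72s
// 12616804 262.85s
```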

We have only reproduced and validated this with Belarusian input; we have not tested whether the same temperature: 0 behavior occurs with English or other languages.

Why I'm filing here

This may ultimately be a Gemini Live product/backend bug, but from the JS client side, it is unclear whether temperature: 0 is valid for gemini-3.1-flash-live-preview native-audio output.

Could you clarify whether:

  1. temperature: 0 is supported for Gemini Live native-audio TTS;
  2. the JS SDK should validate or reject unsupported temperature ranges for this mode;
  3. the docs should mention a safe temperature range or recommend omitting temperature for native-audio TTS?
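
Until this is clarified, the client-side workaround that matched the 20/20 clean result is to not send a temperature at all. A minimal sketch, assuming the config is assembled as a plain object before being passed to `ai.live.connect`; `withoutTemperature` is a hypothetical helper, not part of @google/genai:

```typescript
// Return a shallow copy of a Live config with any explicit `temperature`
// removed, so the backend default applies. The input object is not mutated.
function withoutTemperature<T extends { temperature?: number }>(
  config: T,
): Omit<T, "temperature"> {
  const { temperature: _dropped, ...rest } = config;
  return rest;
}

// Illustrative config shape; "AUDIO" stands in for Modality.AUDIO here
// so the sketch does not depend on the SDK import.
const ttsConfig = withoutTemperature({
  responseModalities: ["AUDIO"],
  temperature: 0,
  speechConfig: {
    voiceConfig: { prebuiltVoiceConfig: { voiceName: "puck" } },
  },
});

console.log("temperature" in ttsConfig); // false
```

This only removes the field. Whether any non-zero temperature is safe for this mode is untested in this report.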

Metadata

Labels

  • api:gemini-api
  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
