GenerateContentStream: setting HTTPOptions.Timeout cancels the in-flight stream as soon as headers arrive — response is silently truncated after the first chunk(s) #816

@crabfishxy

Description

Environment details

  • Programming language: Go
  • OS: macOS 15 (darwin/arm64) — also reproduced on Linux (amd64)
  • Language runtime version: go1.24
  • Package version: google.golang.org/genai v1.36.0 (the relevant code is unchanged on main as of this report)

Summary

When HTTPOptions.Timeout is set (any positive value — even one far larger than the actual response time), every streaming call (Models.GenerateContentStream) is aborted the moment the response headers arrive. Only the SSE chunks that happen to be buffered in the transport at that instant are delivered (typically just the first chunk); all later chunks are lost. Worse, the iterator then terminates without yielding an error, so callers receive a truncated response that looks like a complete, successful one.

Unary calls are unaffected. Calls without HTTPOptions.Timeout are unaffected.

Root cause

sendStreamRequest in api_client.go:

if timeout != nil && *timeout > 0*time.Second && isTimeoutBeforeDeadline(ctx, *timeout) {
    requestContext, cancel = context.WithTimeout(ctx, *timeout)
    defer cancel()                                   // <-- bug
}
req = req.WithContext(requestContext)

resp, err := doRequest(ac, req)
...
// resp.Body will be closed by the iterator
return deserializeStreamResponse(resp, output)

  1. http.Client.Do returns as soon as response headers are received; for a streaming call the body is read incrementally afterwards.
  2. deserializeStreamResponse performs no reads — it only wraps resp.Body in a bufio.Scanner. The actual reads happen later, inside iterateResponseStream, when the caller ranges over the returned iter.Seq2.
  3. So sendStreamRequest returns (and defer cancel() fires) before a single body byte has been consumed. Cancelling the request context aborts the in-flight HTTP request (RST_STREAM on HTTP/2). Bytes already buffered by the transport remain readable; everything else is lost, and the next Read returns context canceled. The configured timeout duration is irrelevant: it is the explicit cancel() call that kills the request, not timer expiry.

The defer cancel() pattern is correct in the unary path (sendRequest), because there the body is fully read via deserializeUnaryResponse before the function returns. The streaming path appears to have inherited the defer cancel() without accounting for the body being consumed after return.

A second, compounding problem in iterateResponseStream:

if rs.r.Err() != nil {
    ...
    log.Printf("Error %v", rs.r.Err())
}

The scanner's read error (context canceled here, but also any genuine mid-stream network error or timeout) is only logged via the stdlib log package and never yielded to the caller, so the truncation is indistinguishable from a normal, complete end of stream. This makes the bug very hard to attribute in production.

It also seems unlikely that existing tests can catch this: with a local/fast test server the whole response is buffered before the cancellation takes effect, so all chunks remain readable and the test passes. Real network latency between chunks is required to observe the truncation.

Steps to reproduce

  1. Set HTTPOptions.Timeout to any positive value on the client config.
  2. Call Models.GenerateContentStream against any endpoint whose SSE chunks arrive over time (a real model generating a long answer, or the fake server below).
  3. Observe that only the first chunk(s) are yielded and the iterator ends without an error.

Self-contained reproduction (fake SSE server, no API key needed):

package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"

	"google.golang.org/genai"
)

func main() {
	// Fake SSE endpoint: streams 5 chunks, one every 200ms.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		f := w.(http.Flusher)
		for i := 0; i < 5; i++ {
			fmt.Fprintf(w, "data: {\"candidates\": [{\"content\": {\"role\": \"model\",\"parts\": [{\"text\": \"chunk %d \"}]}}]}\n\n", i)
			f.Flush()
			time.Sleep(200 * time.Millisecond)
		}
	}))
	defer srv.Close()

	timeout := 30 * time.Second // generous value; makes no difference
	client, err := genai.NewClient(context.Background(), &genai.ClientConfig{
		Backend: genai.BackendGeminiAPI,
		APIKey:  "test-key",
		HTTPOptions: genai.HTTPOptions{
			BaseURL: srv.URL,
			Timeout: &timeout, // <-- comment this out and all 5 chunks arrive
		},
	})
	if err != nil {
		panic(err)
	}

	n := 0
	for resp, err := range client.Models.GenerateContentStream(context.Background(), "gemini-2.0-flash", genai.Text("hi"), nil) {
		if err != nil {
			fmt.Println("iterator error:", err)
			break
		}
		n++
		fmt.Printf("received: %q\n", resp.Candidates[0].Content.Parts[0].Text)
	}
	fmt.Println("total chunks received:", n)
}

Output with Timeout set:

received: "chunk 0 "
2026/06/10 02:07:13 Error context canceled
total chunks received: 1

Output without Timeout (delete that one line):

received: "chunk 0 "
received: "chunk 1 "
received: "chunk 2 "
received: "chunk 3 "
received: "chunk 4 "
total chunks received: 5

Note the context canceled line goes to the stdlib logger only — the iterator itself reports success.

Expected behavior

  • HTTPOptions.Timeout bounds the streaming request without cancelling it prematurely: the deadline should cover the stream's lifetime, and the cancel should be released when the iterator finishes (e.g. pass cancel into responseStream and invoke it in iterateResponseStream's cleanup alongside rs.rc.Close()).
  • If the body read does fail mid-stream (cancellation, timeout, network error), iterateResponseStream should yield rs.r.Err() to the caller instead of only logging it, so consumers can distinguish a truncated stream from a complete one.

Actual behavior

Every streaming request with HTTPOptions.Timeout set is cancelled at header-arrival time; the response is truncated to whatever was already buffered, and the iterator ends as if the stream completed successfully.

Metadata

Labels

  • api:gemini-api (Issues related to Gemini API)
  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
