Thanks for stopping by to let us know something could be better!
Environment details
- Programming language: Go
- OS: macOS 15 (darwin/arm64) — also reproduced on Linux (amd64)
- Language runtime version: go1.24
- Package version: google.golang.org/genai v1.36.0 (the relevant code is unchanged on main as of this report)
Summary
When HTTPOptions.Timeout is set (any positive value — even one far larger than the actual response time), every streaming call (Models.GenerateContentStream) is aborted the moment the response headers arrive. Only the SSE chunks that happen to be buffered in the transport at that instant are delivered (typically just the first chunk); all later chunks are lost. Worse, the iterator then terminates without yielding an error, so callers receive a truncated response that looks like a complete, successful one.
Unary calls are unaffected. Calls without HTTPOptions.Timeout are unaffected.
Root cause
sendStreamRequest in api_client.go:
    if timeout != nil && *timeout > 0*time.Second && isTimeoutBeforeDeadline(ctx, *timeout) {
        requestContext, cancel = context.WithTimeout(ctx, *timeout)
        defer cancel() // <-- bug
    }
    req = req.WithContext(requestContext)
    resp, err := doRequest(ac, req)
    ...
    // resp.Body will be closed by the iterator
    return deserializeStreamResponse(resp, output)
- http.Client.Do returns as soon as response headers are received; for a streaming call the body is read incrementally afterwards.
- deserializeStreamResponse performs no reads: it only wraps resp.Body in a bufio.Scanner. The actual reads happen later, inside iterateResponseStream, when the caller ranges over the returned iter.Seq2.
- So sendStreamRequest returns (and defer cancel() fires) before a single body byte has been consumed. Cancelling the request context aborts the in-flight HTTP request (RST_STREAM on h2). Bytes already buffered by the transport remain readable; everything else is gone, and the next Read returns context canceled. The configured timeout duration is irrelevant: it is the explicit cancel() call that kills the request, not timer expiry.
This is correct in the unary path (sendRequest), because there the body is fully read via deserializeUnaryResponse before the function returns. The streaming path appears to have inherited the defer cancel() without accounting for the body being consumed after return.
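For illustration, the same failure mode can be sketched with plain net/http, independent of genai. This is only a minimal sketch under stated assumptions: openStreamBuggy and readSlowStream are hypothetical names invented here, and the two-part test server stands in for a real SSE endpoint.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// openStreamBuggy (hypothetical name) mirrors the shape described above:
// the deferred cancel runs when the function returns, which is before the
// caller has read anything from resp.Body.
func openStreamBuggy(ctx context.Context, url string, timeout time.Duration) (*http.Response, error) {
	reqCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel() // fires on return, while the body is still unread

	req, err := http.NewRequestWithContext(reqCtx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	return http.DefaultClient.Do(req)
}

// readSlowStream (hypothetical helper) starts a server that sends one line,
// pauses, then sends a second line. It returns whatever body bytes arrived
// together with the read error.
func readSlowStream() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "first\n")
		w.(http.Flusher).Flush()
		select {
		case <-time.After(200 * time.Millisecond):
			fmt.Fprint(w, "second\n")
		case <-r.Context().Done(): // client went away; stop early
		}
	}))
	defer srv.Close()

	resp, err := openStreamBuggy(context.Background(), srv.URL, 30*time.Second)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := readSlowStream()
	fmt.Printf("body=%q err=%v\n", body, err)
}
```

With the 30-second timeout nowhere near expiring, the read is still expected to fail and the second line never arrives, which matches the behaviour reported above.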
A second, compounding problem in iterateResponseStream:
    if rs.r.Err() != nil {
        ...
        log.Printf("Error %v", rs.r.Err())
    }
The scanner's read error (context canceled here, but also any genuine mid-stream network error or timeout) is only logged via the stdlib log package and never yielded to the caller, so the truncation is indistinguishable from a normal, complete end of stream. This makes the bug very hard to attribute in production.
It also seems unlikely that existing tests can catch this: with a local/fast test server the whole response is buffered before the cancellation takes effect, so all chunks remain readable and the test passes. Real network latency between chunks is required to observe the truncation.
Steps to reproduce
- Set HTTPOptions.Timeout to any positive value on the client config.
- Call Models.GenerateContentStream against any endpoint whose SSE chunks arrive over time (a real model generating a long answer, or the fake server below).
- Observe that only the first chunk(s) are yielded and the iterator ends without an error.
Self-contained reproduction (fake SSE server, no API key needed):
    package main

    import (
        "context"
        "fmt"
        "net/http"
        "net/http/httptest"
        "time"

        "google.golang.org/genai"
    )

    func main() {
        // Fake SSE endpoint: streams 5 chunks, one every 200ms.
        srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Header().Set("Content-Type", "text/event-stream")
            f := w.(http.Flusher)
            for i := 0; i < 5; i++ {
                fmt.Fprintf(w, "data: {\"candidates\": [{\"content\": {\"role\": \"model\",\"parts\": [{\"text\": \"chunk %d \"}]}}]}\n\n", i)
                f.Flush()
                time.Sleep(200 * time.Millisecond)
            }
        }))
        defer srv.Close()

        timeout := 30 * time.Second // generous value; makes no difference
        client, err := genai.NewClient(context.Background(), &genai.ClientConfig{
            Backend: genai.BackendGeminiAPI,
            APIKey:  "test-key",
            HTTPOptions: genai.HTTPOptions{
                BaseURL: srv.URL,
                Timeout: &timeout, // <-- comment this out and all 5 chunks arrive
            },
        })
        if err != nil {
            panic(err)
        }

        n := 0
        for resp, err := range client.Models.GenerateContentStream(context.Background(), "gemini-2.0-flash", genai.Text("hi"), nil) {
            if err != nil {
                fmt.Println("iterator error:", err)
                break
            }
            n++
            fmt.Printf("received: %q\n", resp.Candidates[0].Content.Parts[0].Text)
        }
        fmt.Println("total chunks received:", n)
    }
Output with Timeout set:
    received: "chunk 0 "
    2026/06/10 02:07:13 Error context canceled
    total chunks received: 1
Output without Timeout (delete that one line):
    received: "chunk 0 "
    received: "chunk 1 "
    received: "chunk 2 "
    received: "chunk 3 "
    received: "chunk 4 "
    total chunks received: 5
Note the context canceled line goes to the stdlib logger only — the iterator itself reports success.
Expected behavior
- HTTPOptions.Timeout bounds the streaming request without cancelling it prematurely: the deadline should cover the stream's lifetime, and the cancel should be released when the iterator finishes (e.g. pass cancel into responseStream and invoke it in iterateResponseStream's cleanup alongside rs.rc.Close()).
- If the body read does fail mid-stream (cancellation, timeout, network error), iterateResponseStream should yield rs.r.Err() to the caller instead of only logging it, so consumers can distinguish a truncated stream from a complete one.
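One possible shape for the first point, sketched with plain net/http rather than the genai internals: instead of passing cancel into responseStream as suggested above, this variant ties cancel to the body's Close, so the deadline lives as long as the stream does. The names cancelOnClose, openStreamFixed, and readWholeStream are hypothetical, and the sketch assumes the consumer always closes the body when it is done.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// cancelOnClose (hypothetical) releases the request context only when the
// response body is closed, so the timeout spans the whole stream.
type cancelOnClose struct {
	io.ReadCloser
	cancel context.CancelFunc
}

func (b *cancelOnClose) Close() error {
	err := b.ReadCloser.Close()
	b.cancel()
	return err
}

// openStreamFixed cancels immediately on failure, and otherwise hands the
// cancel func to the body wrapper instead of deferring it.
func openStreamFixed(ctx context.Context, url string, timeout time.Duration) (*http.Response, error) {
	reqCtx, cancel := context.WithTimeout(ctx, timeout)
	req, err := http.NewRequestWithContext(reqCtx, http.MethodGet, url, nil)
	if err != nil {
		cancel()
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		cancel()
		return nil, err
	}
	resp.Body = &cancelOnClose{ReadCloser: resp.Body, cancel: cancel}
	return resp, nil
}

// readWholeStream (hypothetical helper) reads a two-part response whose
// second part arrives after a pause.
func readWholeStream() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "first\n")
		w.(http.Flusher).Flush()
		time.Sleep(100 * time.Millisecond)
		fmt.Fprint(w, "second\n")
	}))
	defer srv.Close()

	resp, err := openStreamFixed(context.Background(), srv.URL, 30*time.Second)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close() // also releases the timeout context

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := readWholeStream()
	fmt.Printf("body=%q err=%v\n", body, err)
}
```

In this sketch the timeout still aborts a stream that runs past the deadline, but returning from the open function no longer cancels the request.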
Actual behavior
Every streaming request with HTTPOptions.Timeout set is cancelled at header-arrival time; the response is truncated to whatever was already buffered, and the iterator ends as if the stream completed successfully.