Bug report criteria
What happened?
In server/proxy/grpcproxy, when a watchBroadcast's underlying etcd watch connection is broken (the goroutine exits after for wr := range wch completes), the watchBroadcast remains in watchBroadcasts.bcasts with no indication that it is no longer receiving events.
This causes two problems:
- add() can place new watchers into a dead broadcast: Since there is no stopped flag, add() happily accepts new watchers into a broadcast whose goroutine has already exited. These watchers will never receive any subsequent events.
- coalesce() can migrate watchers into a dead broadcast: The coalesce logic only checks nextrev and responses, not whether the target broadcast is still alive. Watchers migrated to a dead broadcast will also stop receiving events.
Additionally, there is no mechanism to reassign existing watchers (orphans) from a dead broadcast to a healthy one.
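To make the failure mode concrete, here is a toy model of it. This is not the grpcproxy code: the broadcast type, its fields, and newBroadcast are hypothetical stand-ins, and the done channel exists only so the example can wait for the goroutine to exit. It only illustrates that once the forwarding loop ends, add() still accepts watchers that can never be served.

```go
package main

import "sync"

// broadcast is a simplified, hypothetical stand-in for the proxy's watchBroadcast.
type broadcast struct {
	mu        sync.Mutex
	receivers []chan string
	done      chan struct{} // closed when the forwarding goroutine exits; add() never consults it
}

// newBroadcast starts a goroutine that forwards events from src to every
// receiver until src is closed, mirroring the `for wr := range wch` loop.
func newBroadcast(src <-chan string) *broadcast {
	b := &broadcast{done: make(chan struct{})}
	go func() {
		defer close(b.done)
		for ev := range src {
			b.mu.Lock()
			for _, r := range b.receivers {
				r <- ev
			}
			b.mu.Unlock()
		}
		// The loop has ended, but nothing on b tells add() about it.
	}()
	return b
}

// add registers a new watcher. It has no way to know whether the forwarding
// goroutine is still running, so it always succeeds.
func (b *broadcast) add() chan string {
	b.mu.Lock()
	defer b.mu.Unlock()
	r := make(chan string, 1)
	b.receivers = append(b.receivers, r)
	return r
}
```

In this model, closing src plays the role of the broken backend watch: a watcher added afterwards is stored in receivers but its channel stays empty forever.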
What did you expect to happen?
- A disconnected watchBroadcast should be marked as stopped and refuse new watchers.
- coalesce() should not migrate watchers to a stopped broadcast.
- When a broadcast disconnects, its existing watchers should be automatically reassigned to a healthy broadcast or a newly created one.
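One possible shape for the expected behavior is sketched below. It is an assumption-laden illustration, not a patch against grpcproxy: the watcher, broadcast, and broadcasts types and the assign and onDisconnect helpers are all hypothetical, and the sketch ignores key ranges, nextrev, and responses, which the real coalesce logic would still have to check.

```go
package main

import (
	"errors"
	"sync"
)

var errStopped = errors.New("broadcast stopped")

// watcher and broadcast are hypothetical, simplified stand-ins.
type watcher struct{ id int }

type broadcast struct {
	mu       sync.Mutex
	stopped  bool
	watchers map[*watcher]struct{}
}

func newBroadcast() *broadcast {
	return &broadcast{watchers: make(map[*watcher]struct{})}
}

// add refuses new watchers once the broadcast has been stopped.
func (b *broadcast) add(w *watcher) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.stopped {
		return errStopped
	}
	b.watchers[w] = struct{}{}
	return nil
}

// stop marks the broadcast as dead and returns its watchers as orphans.
func (b *broadcast) stop() []*watcher {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.stopped = true
	orphans := make([]*watcher, 0, len(b.watchers))
	for w := range b.watchers {
		orphans = append(orphans, w)
	}
	b.watchers = make(map[*watcher]struct{})
	return orphans
}

// size reports how many watchers the broadcast currently holds.
func (b *broadcast) size() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.watchers)
}

// broadcasts is a hypothetical stand-in for the set of live broadcasts.
type broadcasts struct {
	mu     sync.Mutex
	bcasts []*broadcast
}

// assign places w into the first broadcast that accepts it, creating a new
// broadcast when none does.
func (bs *broadcasts) assign(w *watcher) *broadcast {
	bs.mu.Lock()
	defer bs.mu.Unlock()
	for _, b := range bs.bcasts {
		if b.add(w) == nil {
			return b
		}
	}
	nb := newBroadcast()
	_ = nb.add(w) // a freshly created broadcast is never stopped
	bs.bcasts = append(bs.bcasts, nb)
	return nb
}

// onDisconnect is what the forwarding goroutine would call after its receive
// loop ends: stop the broadcast, drop it from the set, reassign its orphans.
func (bs *broadcasts) onDisconnect(dead *broadcast) {
	orphans := dead.stop()
	bs.mu.Lock()
	kept := bs.bcasts[:0]
	for _, b := range bs.bcasts {
		if b != dead {
			kept = append(kept, b)
		}
	}
	bs.bcasts = kept
	bs.mu.Unlock()
	for _, w := range orphans {
		bs.assign(w)
	}
}
```

Setting stopped under the same mutex that add() takes is the point of the sketch: a watcher is either registered before stop() and returned as an orphan, or rejected after it, so none is silently stranded.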
How can we reproduce it (as minimally and precisely as possible)?
- Set up an etcd gRPC proxy with multiple client watchers coalesced on the same key range.
- Cause the backend etcd watch connection to break (e.g., network partition, etcd server restart).
- After the break, create a new watcher on the same key range through the proxy.
- Observe that the new watcher (and any existing watchers on the dead broadcast) never receives subsequent events.
Anything else we need to know?
No response
Etcd version (please run commands below)
$ etcd --version
v3.5.4
$ etcdctl version
v3.5.4
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
# paste output here
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
Relevant log output