Skip to content

xdsclient: do not process updates from closed server channels#8389

Merged
easwars merged 1 commit intogrpc:masterfrom
easwars:xdsclient_crash
Jul 29, 2025
Merged

xdsclient: do not process updates from closed server channels#8389
easwars merged 1 commit intogrpc:masterfrom
easwars:xdsclient_crash

Conversation

@easwars
Copy link
Contributor

@easwars easwars commented Jun 9, 2025

The xDS client is currently making an incorrect assumption that can lead to panics.

When an xDS client authority receives a response from a server, it checks to see if the response is from the currently active channel. If not, it assumes that it is from a higher priority server (because when a response is received from a server, all lower priority servers are closed), and therefore expects that a channel to that server exists. This need not be true because the responses are not handled inline, but are handled in the serializer. Here is a possible sequence:

  • Authority contains servers A, B, C (in priority order).
  • Let's say A and B are DOWN, and C is UP. Therefore the active channel is now C.
  • At time T0, the authority receives a message from server B. This is not handled inline, but is queued in the serializer for handling.
  • At time T1, the authority receives a message from server C. This is not handled inline, but is queued in the serializer for handling.
  • At time T2, the serializer handles the message from server B and as part of this handling, releases the reference to server C as it has now seen a response from a higher priority server. Also, the active channel now becomes B.
  • At time T3, the serializer handles the message from server C, and implicitly (wrongly) assumes that since it is not from the currently active channel B, it must be from a higher priority server and expects a channel to be present for that server.

This fix ensures what when the authority receives a response from a server that is not currently the active channel, it must ensure that it is from a higher priority server before proceeding with handling that response. If it is not from a higher priority server, it should drop the response on the floor and move on.

RELEASE NOTES:

  • xdsclient: Fixed a rare panic caused by processing a response from a closed server.

@easwars easwars requested a review from arjan-bal June 9, 2025 18:16
@easwars
Copy link
Contributor Author

easwars commented Jun 9, 2025

Fixes internal issue: https://b.corp.google.com/issues/408227990

@easwars easwars added this to the 1.74 Release milestone Jun 9, 2025
@easwars
Copy link
Contributor Author

easwars commented Jun 9, 2025

@purnesh42H : FYI since the migration PR got rolled back. If this goes in before that eventual roll-forward, we need to ensure that we don't lose track of this fix. Thanks.

@codecov
Copy link

codecov bot commented Jun 9, 2025

Codecov Report

❌ Patch coverage is 27.77778% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.39%. Comparing base (7238ab1) to head (7883084).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
xds/internal/clients/xdsclient/authority.go 27.77% 10 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8389      +/-   ##
==========================================
- Coverage   82.48%   82.39%   -0.10%     
==========================================
  Files         413      413              
  Lines       40511    40518       +7     
==========================================
- Hits        33417    33384      -33     
- Misses       5737     5773      +36     
- Partials     1357     1361       +4     
Files with missing lines Coverage Δ
xds/internal/clients/xdsclient/authority.go 76.93% <27.77%> (-2.50%) ⬇️

... and 28 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@arjan-bal arjan-bal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@easwars easwars merged commit 8b61e8f into grpc:master Jul 29, 2025
15 checks passed
@easwars easwars deleted the xdsclient_crash branch July 29, 2025 23:26
dimpavloff pushed a commit to dimpavloff/grpc-go that referenced this pull request Aug 22, 2025
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 26, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants