-
Notifications
You must be signed in to change notification settings - Fork 531
Description
The LinearCache will clear the watches under the name
of changed resources in notifyAll
calls (L147):
go-control-plane/pkg/cache/v3/linear.go
Lines 140 to 148 in 996a28b
func (cache *LinearCache) notifyAll(modified map[string]struct{}) { | |
// de-duplicate watches that need to be responded | |
notifyList := make(map[chan Response][]string) | |
for name := range modified { | |
for watch := range cache.watches[name] { | |
notifyList[watch] = append(notifyList[watch], name) | |
} | |
delete(cache.watches, name) | |
} |
However, just cleaning the the watches under the name
is not enough. It needs to clean all the watches to the same chan Response
. Because the Sotw v3 server creates the chan Response
with only 1 buffer (L369):
go-control-plane/pkg/server/sotw/v3/server.go
Lines 362 to 371 in 7e211bd
// cancel existing watches to (re-)request a newer version | |
switch { | |
case req.TypeUrl == resource.EndpointType: | |
if values.endpointNonce == "" || values.endpointNonce == nonce { | |
if values.endpointCancel != nil { | |
values.endpointCancel() | |
} | |
values.endpoints = make(chan cache.Response, 1) | |
values.endpointCancel = s.cache.CreateWatch(req, streamState, values.endpoints) | |
} |
Consider the following sequence:
- The sotw server receives a
DiscoveryRequest
with 2 resource names, and calls thecache.CreateWatch
to the LinearCache. - The LinearCache registers the
chan Response
provided by the sotw server with 2 watch entries corresponding to the requested resources. - The LinearCache's
UpdateResource
is called with the first resource name. - The LinearCache's
UpdateResource
is called with the second resource name and thechan Response
is blocked. - The sotw server receives another
DiscoveryRequest
but the LinearCache is still locked therefore they are deadlocked.
The LinearCache could just maintain another chan Response -> resource name
map for fast cleaning in notifyAll
call. But I think the root cause is that the sotw server uses a single goroutine to handle both streams of bi-di grpc.
If it handled them in separated goroutines, there would be no such deadlock, and there might be even no need to recreate a new chan Response
on each DiscoveryRequest
and unregister watches on each notifyAll
call in LinearCache
.