-
Notifications
You must be signed in to change notification settings - Fork 433
Evolve AntreaProxy with framework and feature updates #7371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8302c67 to
a83213c
Compare
third_party/proxya83213c to
1ba593f
Compare
88eceb7 to
b9dc7dd
Compare
antoninbas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some small initial comments, I will review #7380 more in details first
| | Feature Name | Component | Default | Stage | Alpha Release | Beta Release | GA Release | Extra Requirements | Notes | | ||
| | ----------------------------- | ------------------ | ------- | ----- | ------------- | ------------ | ---------- | ------------------ | --------------------------------------------- | | ||
| | `AntreaProxy` | Agent | `true` | GA | v0.8 | v0.11 | v1.14 | Yes | Must be enabled for Windows. | | ||
| | `EndpointSlice` | Agent | `true` | GA | v0.13.0 | v1.11 | v1.14 | Yes | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that one thing that is worth pointing out is that even though we are keeping this FeatureGate around, setting it to False will have no effect any more (unlike the AntreaProxy FeatureGate for example). So I think the current warning is not sufficient for EndpointSlice. If the FeatureGate is explicitly set to False in the config, we should log an error stating that the configuration will be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding this in NewProxyServer:
if !features.DefaultFeatureGate.Enabled(features.EndpointSlice) {
klog.ErrorS(nil, "EndpointSlice cannot be disabled, ignoring feature gateway EndpointSlice setting to false")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add "Cannot be disabled" in the Notes (last column of this table)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we can use LockToDefault, we don't need this note anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still mention it, as otherwise there is no way for a user to tell simply by looking at the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added Locked to true (cannot be disabled). to column Notes.
75a7bea to
32a4415
Compare
6e17422 to
069839c
Compare
This commit brings AntreaProxy: - Add `serviceProxyHealthy` field to Service health check response in AntreaProxy. - Add healthz server serving on port 10256 to AntreaProxy, which is a replacement of kube-proxy health server serving on 10256. - Add support of feature gate PreferSameTrafficDistribution in AntreaProxy. Refer to this https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/3015-prefer-same-node. - Remove Endpoints API support in AntreaProxy. - Align the code `third_party/proxy` with K8s 1.34.2. Signed-off-by: Hongliang Liu <[email protected]>
434ec6b to
19e8fa9
Compare
pkg/agent/nodemanager/nodemanager.go
Outdated
| // The upstream NodeManager starts its own informerFactory, which is undesirable for Antrea as we rely on a single | ||
| // shared informerFactory for all controllers. To avoid the overhead and duplication of running an additional | ||
| // informerFactory, Antrea provides a lightweight replacement NodeManager implementation instead. | ||
| type NodeManager struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we have a couple of examples of standalone filtered informers:
antrea/pkg/agent/client/endpoint_resolver.go
Lines 93 to 102 in c0f59a7
| serviceInformer := corev1informers.NewFilteredServiceInformer( | |
| kubeClient, | |
| namespace, | |
| informerDefaultResync, | |
| cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc}, | |
| func(listOptions *metav1.ListOptions) { | |
| listOptions.FieldSelector = fields.OneTermEqualSelector("metadata.name", serviceName).String() | |
| }, | |
| ) | |
| endpointSliceInformer := discoveryv1informers.NewFilteredEndpointSliceInformer( |
If you only want to watch one Node, I think it's acceptable to use a dedicated filtered informer, instead of using the existing shared informer factory. So if this was the only issue, you could use the upstream NodeManager code. However, there seems to be a key difference, which is that the upstream code panics / exits on NodeIP change. I am not sure this is the behavior we want for Antrea, or at least it requires more discussion. That behavior doesn't seem to be customizable in the upstream code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we have a couple of examples of standalone filtered informers:
Thanks for pointing out the existing standalone filtered informers. My bad, I didn't found that we have existing standalone filter informers. That's my only concern. If we have the existing usages, I'm happy to use that.
Additionally, after discussing with @tnqn yesterday, an alternative NodeManager might be a good choice.
- The upstream NodeManager starts its own informerFactory, which is undesirable for Antrea as we should try to rely on a single shared informerFactory for all controllers.
- At least in Antrea, only
ProxyHealthServerconsumesNodeManager. We may extend it in the future or replace it with upstream NodeManager one day.
I am not sure this is the behavior we want for Antrea, or at least it requires more discussion. That behavior doesn't seem to be customizable in the upstream code.
I saw that the corresponding functions are commented in upstream NodeManager for now. At least it should be safe for Antrea to introduce the upstream NodeManager.
// We exit whenever there is a change in NodeIPs detected initially, and NodeIPs received
// on node watch event.
if !reflect.DeepEqual(n.nodeIPs, nodeIPs) {
klog.InfoS("NodeIPs changed for the node",
"node", klog.KObj(node), "newNodeIPs", nodeIPs, "oldNodeIPs", n.nodeIPs)
// FIXME: exit
// klog.Flush()
// n.exitFunc(1)
}In sum:
- Upstream NodeManager: less code to maintain, but has its own informerFactory and non-configurable NodeIP-change semantics.
- Antrea NodeManager: consistent with shared informerFactory and fully controllable, but requires local maintenance.
cc @antoninbas @tnqn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoninbas A small clarification: the upstream Kubernetes NodeManager creates its own dedicated SharedInformerFactory, as shown in L75–83 of the upstream code https://github.com/kubernetes/kubernetes/blob/ec1bf8a4f3a5f054065225dc8275c66b93310d17/pkg/proxy/node.go#L75.
It then builds a filtered Node informer from that factory, rather than using a shared factory.
Additionally, I checked all locations where SharedInformerFactory objects are used in Antrea. Currently, we only create informer factories per top-level component, such as:
./cmd/flow-aggregator/flow-aggregator.go: informerFactory := informers.NewSharedInformerFactoryWithOptions(k8sClient, informerDefaultResync, informers.WithTransform(k8s.NewTrimmer(k8s.TrimPod, k8s.TrimNode)))
./cmd/antrea-controller/controller.go: informerFactory := informers.NewSharedInformerFactoryWithOptions(client, informerDefaultResync, informers.WithTransform(k8s.NewTrimmer(k8s.TrimPod)))
./cmd/antrea-controller/controller.go: crdInformerFactory := crdinformers.NewSharedInformerFactoryWithOptions(crdClient, informerDefaultResync, crdinformers.WithTransform(k8s.NewTrimmer()))
./cmd/antrea-agent/agent.go: informerFactory := informers.NewSharedInformerFactoryWithOptions(k8sClient, informerDefaultResync, informers.WithTransform(k8s.NewTrimmer(k8s.TrimNode)))
./cmd/antrea-agent/agent.go: crdInformerFactory := crdinformers.NewSharedInformerFactoryWithOptions(crdClient, informerDefaultResync, crdinformers.WithTransform(k8s.NewTrimmer()))
./cmd/antrea-agent/agent.go: mcInformerFactoryWithNamespaceOption = mcinformers.NewSharedInformerFactoryWithOptions(mcClient,
./cmd/antrea-agent/agent.go: mcInformerFactory = mcinformers.NewSharedInformerFactoryWithOptions(mcClient, informerDefaultResync, mcinformers.WithTransform(k8s.NewTrimmer()))
| Modifies: | ||
| - Remove `defer close(bfr.run)` from `BoundedFrequencyRunner.Loop()`. AntreaProxy may still call `BoundedFrequencyRunner.Run()` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the upstream code doesn't make sense to me... The reader goroutine should never be responsible for closing the channel...
Do you know why they don't have the same panic problem upstream? How is the architecture different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dedicated reason is that they pass wait.NeverStop that will never exit to Loop: https://github.com/kubernetes/kubernetes/blob/9998041e0ffe0dd3f2abab3b9f95505c4402bf14/pkg/proxy/nftables/proxier.go#L760.
95a0efd to
3055a78
Compare
|
I pushed another commit using the K8s upstream NodeManager. |
luolanzone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two nits.
btw, I noticed there are race condition failures in the unit tests, please check and fix it, and there are also following error and warning, can you double check?
I1202 02:59:53.842500 31953 shared_informer.go:648] "Warning: the specified resyncPeriod is invalid because this shared informer doesn't support resyncing" desired="1m0s"
I1202 02:59:53.842567 31953 shared_informer.go:648] "Warning: the specified resyncPeriod is invalid because this shared informer doesn't support resyncing" desired="1m0s"
E1202 02:59:53.944341 31953 proxier.go:1645] "Healthz server failed" err="failed to start proxy healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use"
3055a78 to
2cd15c3
Compare
2cd15c3 to
0a3aa29
Compare
antoninbas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a few nits, otherwise LGTM
| | Feature Name | Component | Default | Stage | Alpha Release | Beta Release | GA Release | Extra Requirements | Notes | | ||
| | ----------------------------- | ------------------ | ------- | ----- | ------------- | ------------ | ---------- | ------------------ | --------------------------------------------- | | ||
| | `AntreaProxy` | Agent | `true` | GA | v0.8 | v0.11 | v1.14 | Yes | Must be enabled for Windows. | | ||
| | `EndpointSlice` | Agent | `true` | GA | v0.13.0 | v1.11 | v1.14 | Yes | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping for this
| [3] _Antrea automatically adds the firewall rules to allow the WireGuard packets | ||
| (starting from v2.4), so the manual configuration on the host is not needed._ | ||
|
|
||
| [4] _The default value is 10256, but it can be overridden in the antrea-agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also indicate that it can be completely disabled with antreaProxy.disableServiceHealthCheckServer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to
[4] _The default value is 10256, but it can be overridden in the antrea-agent
configuration `antreaProxy.serviceHealthCheckServerBindAddress`. It is used only
for external load balancer health checks. If `antreaProxy.disableServiceHealthCheckServer`
is set `true`, the health check server listening on the port will be disabled._
0f4ace4 to
c11e942
Compare
|
/test-all |
luolanzone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
c11e942 to
c37ec22
Compare
Signed-off-by: Hongliang Liu <[email protected]>
c37ec22 to
fb0f547
Compare
|
/test-all |
|
@antoninbas @tnqn @luolanzone Thanks for reviewing this PR! |
* Evolve AntreaProxy with framework and feature updates This commit brings AntreaProxy: - Add `serviceProxyHealthy` field to Service health check response in AntreaProxy. - Add healthz server serving on port 10256 to AntreaProxy, which is a replacement of kube-proxy health server serving on 10256. - Add support of feature gate PreferSameTrafficDistribution in AntreaProxy. Refer to this https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/3015-prefer-same-node. - Remove Endpoints API support in AntreaProxy. - Align the code `third_party/proxy` with K8s 1.34.2. Signed-off-by: Hongliang Liu <[email protected]>
For #6940
This commit brings AntreaProxy:
serviceProxyHealthyfield to Service health check response in AntreaProxy.of kube-proxy health server serving on 10256.
to this https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/3015-prefer-same-node.
third_party/proxywith K8s 1.34.2.Dependencies: