Skip to content

Conversation

@hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Aug 14, 2025

For #6940

This commit brings AntreaProxy:

Dependencies:

@hongliangl hongliangl force-pushed the 20250802-update-3-party branch 10 times, most recently from 8302c67 to a83213c Compare August 20, 2025 07:23
@hongliangl hongliangl changed the title [WIP] Update code in third_party/proxy Evolve AntreaProxy with framework and feature updates Aug 20, 2025
@hongliangl hongliangl added the area/proxy Issues or PRs related to proxy functions in Antrea label Aug 20, 2025
@hongliangl hongliangl added this to the Antrea v2.5 release milestone Aug 20, 2025
@hongliangl hongliangl marked this pull request as ready for review August 20, 2025 07:24
@hongliangl hongliangl force-pushed the 20250802-update-3-party branch from a83213c to 1ba593f Compare August 20, 2025 09:59
@hongliangl hongliangl force-pushed the 20250802-update-3-party branch 2 times, most recently from 88eceb7 to b9dc7dd Compare October 23, 2025 08:35
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small initial comments, I will review #7380 more in details first

| Feature Name | Component | Default | Stage | Alpha Release | Beta Release | GA Release | Extra Requirements | Notes |
| ----------------------------- | ------------------ | ------- | ----- | ------------- | ------------ | ---------- | ------------------ | --------------------------------------------- |
| `AntreaProxy` | Agent | `true` | GA | v0.8 | v0.11 | v1.14 | Yes | Must be enabled for Windows. |
| `EndpointSlice` | Agent | `true` | GA | v0.13.0 | v1.11 | v1.14 | Yes | |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that one thing that is worth pointing out is that even though we are keeping this FeatureGate around, setting it to False will have no effect any more (unlike the AntreaProxy FeatureGate for example). So I think the current warning is not sufficient for EndpointSlice. If the FeatureGate is explicitly set to False in the config, we should log an error stating that the configuration will be ignored.

Copy link
Contributor Author

@hongliangl hongliangl Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding this in NewProxyServer:

if !features.DefaultFeatureGate.Enabled(features.EndpointSlice) {
		klog.ErrorS(nil, "EndpointSlice cannot be disabled, ignoring feature gateway EndpointSlice setting to false")
	}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add "Cannot be disabled" in the Notes (last column of this table)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we can use LockToDefault, we don't need this note anymore?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still mention it, as otherwise there is no way for a user to tell simply by looking at the documentation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Locked to true (cannot be disabled). to column Notes.

@hongliangl hongliangl force-pushed the 20250802-update-3-party branch 4 times, most recently from 75a7bea to 32a4415 Compare October 31, 2025 15:42
@hongliangl hongliangl force-pushed the 20250802-update-3-party branch 2 times, most recently from 6e17422 to 069839c Compare November 6, 2025 07:34
@hongliangl hongliangl removed the request for review from jianjuns December 1, 2025 08:28
This commit brings AntreaProxy:

- Add `serviceProxyHealthy` field to Service health check response in AntreaProxy.
- Add healthz server serving on port 10256 to AntreaProxy, which is a replacement
  of kube-proxy health server serving on 10256.
- Add support of feature gate PreferSameTrafficDistribution in AntreaProxy. Refer
  to this https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/3015-prefer-same-node.
- Remove Endpoints API support in AntreaProxy.
- Align the code `third_party/proxy` with K8s 1.34.2.

Signed-off-by: Hongliang Liu <[email protected]>
Comment on lines 25 to 28
// The upstream NodeManager starts its own informerFactory, which is undesirable for Antrea as we rely on a single
// shared informerFactory for all controllers. To avoid the overhead and duplication of running an additional
// informerFactory, Antrea provides a lightweight replacement NodeManager implementation instead.
type NodeManager struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we have a couple of examples of standalone filtered informers:

serviceInformer := corev1informers.NewFilteredServiceInformer(
kubeClient,
namespace,
informerDefaultResync,
cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc},
func(listOptions *metav1.ListOptions) {
listOptions.FieldSelector = fields.OneTermEqualSelector("metadata.name", serviceName).String()
},
)
endpointSliceInformer := discoveryv1informers.NewFilteredEndpointSliceInformer(

If you only want to watch one Node, I think it's acceptable to use a dedicated filtered informer, instead of using the existing shared informer factory. So if this was the only issue, you could use the upstream NodeManager code. However, there seems to be a key difference, which is that the upstream code panics / exits on NodeIP change. I am not sure this is the behavior we want for Antrea, or at least it requires more discussion. That behavior doesn't seem to be customizable in the upstream code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we have a couple of examples of standalone filtered informers:

Thanks for pointing out the existing standalone filtered informers. My bad, I didn't found that we have existing standalone filter informers. That's my only concern. If we have the existing usages, I'm happy to use that.

Additionally, after discussing with @tnqn yesterday, an alternative NodeManager might be a good choice.

  • The upstream NodeManager starts its own informerFactory, which is undesirable for Antrea as we should try to rely on a single shared informerFactory for all controllers.
  • At least in Antrea, only ProxyHealthServer consumes NodeManager. We may extend it in the future or replace it with upstream NodeManager one day.

I am not sure this is the behavior we want for Antrea, or at least it requires more discussion. That behavior doesn't seem to be customizable in the upstream code.

I saw that the corresponding functions are commented in upstream NodeManager for now. At least it should be safe for Antrea to introduce the upstream NodeManager.

// We exit whenever there is a change in NodeIPs detected initially, and NodeIPs received
	// on node watch event.
	if !reflect.DeepEqual(n.nodeIPs, nodeIPs) {
		klog.InfoS("NodeIPs changed for the node",
			"node", klog.KObj(node), "newNodeIPs", nodeIPs, "oldNodeIPs", n.nodeIPs)
		// FIXME: exit
		// klog.Flush()
		// n.exitFunc(1)
	}

In sum:

  • Upstream NodeManager: less code to maintain, but has its own informerFactory and non-configurable NodeIP-change semantics.
  • Antrea NodeManager: consistent with shared informerFactory and fully controllable, but requires local maintenance.

cc @antoninbas @tnqn

Copy link
Contributor Author

@hongliangl hongliangl Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas A small clarification: the upstream Kubernetes NodeManager creates its own dedicated SharedInformerFactory, as shown in L75–83 of the upstream code https://github.com/kubernetes/kubernetes/blob/ec1bf8a4f3a5f054065225dc8275c66b93310d17/pkg/proxy/node.go#L75.

It then builds a filtered Node informer from that factory, rather than using a shared factory.

Additionally, I checked all locations where SharedInformerFactory objects are used in Antrea. Currently, we only create informer factories per top-level component, such as:

./cmd/flow-aggregator/flow-aggregator.go:       informerFactory := informers.NewSharedInformerFactoryWithOptions(k8sClient, informerDefaultResync, informers.WithTransform(k8s.NewTrimmer(k8s.TrimPod, k8s.TrimNode)))
./cmd/antrea-controller/controller.go:  informerFactory := informers.NewSharedInformerFactoryWithOptions(client, informerDefaultResync, informers.WithTransform(k8s.NewTrimmer(k8s.TrimPod)))
./cmd/antrea-controller/controller.go:  crdInformerFactory := crdinformers.NewSharedInformerFactoryWithOptions(crdClient, informerDefaultResync, crdinformers.WithTransform(k8s.NewTrimmer()))
./cmd/antrea-agent/agent.go:    informerFactory := informers.NewSharedInformerFactoryWithOptions(k8sClient, informerDefaultResync, informers.WithTransform(k8s.NewTrimmer(k8s.TrimNode)))
./cmd/antrea-agent/agent.go:    crdInformerFactory := crdinformers.NewSharedInformerFactoryWithOptions(crdClient, informerDefaultResync, crdinformers.WithTransform(k8s.NewTrimmer()))
./cmd/antrea-agent/agent.go:            mcInformerFactoryWithNamespaceOption = mcinformers.NewSharedInformerFactoryWithOptions(mcClient,
./cmd/antrea-agent/agent.go:            mcInformerFactory = mcinformers.NewSharedInformerFactoryWithOptions(mcClient, informerDefaultResync, mcinformers.WithTransform(k8s.NewTrimmer()))

Modifies:
- Remove `defer close(bfr.run)` from `BoundedFrequencyRunner.Loop()`. AntreaProxy may still call `BoundedFrequencyRunner.Run()`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the upstream code doesn't make sense to me... The reader goroutine should never be responsible for closing the channel...
Do you know why they don't have the same panic problem upstream? How is the architecture different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hongliangl hongliangl force-pushed the 20250802-update-3-party branch from 95a0efd to 3055a78 Compare December 2, 2025 02:53
@hongliangl
Copy link
Contributor Author

I pushed another commit using the K8s upstream NodeManager.

@hongliangl hongliangl requested a review from antoninbas December 2, 2025 02:57
Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two nits.
btw, I noticed there are race condition failures in the unit tests, please check and fix it, and there are also following error and warning, can you double check?

 I1202 02:59:53.842500   31953 shared_informer.go:648] "Warning: the specified resyncPeriod is invalid because this shared informer doesn't support resyncing" desired="1m0s"
I1202 02:59:53.842567   31953 shared_informer.go:648] "Warning: the specified resyncPeriod is invalid because this shared informer doesn't support resyncing" desired="1m0s"

E1202 02:59:53.944341   31953 proxier.go:1645] "Healthz server failed" err="failed to start proxy healthz on 0.0.0.0:10256: listen tcp 0.0.0.0:10256: bind: address already in use"

@hongliangl hongliangl force-pushed the 20250802-update-3-party branch from 3055a78 to 2cd15c3 Compare December 2, 2025 12:51
@hongliangl hongliangl requested a review from luolanzone December 2, 2025 12:52
@hongliangl hongliangl force-pushed the 20250802-update-3-party branch from 2cd15c3 to 0a3aa29 Compare December 2, 2025 13:00
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a few nits, otherwise LGTM

| Feature Name | Component | Default | Stage | Alpha Release | Beta Release | GA Release | Extra Requirements | Notes |
| ----------------------------- | ------------------ | ------- | ----- | ------------- | ------------ | ---------- | ------------------ | --------------------------------------------- |
| `AntreaProxy` | Agent | `true` | GA | v0.8 | v0.11 | v1.14 | Yes | Must be enabled for Windows. |
| `EndpointSlice` | Agent | `true` | GA | v0.13.0 | v1.11 | v1.14 | Yes | |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping for this

[3] _Antrea automatically adds the firewall rules to allow the WireGuard packets
(starting from v2.4), so the manual configuration on the host is not needed._

[4] _The default value is 10256, but it can be overridden in the antrea-agent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also indicate that it can be completely disabled with antreaProxy.disableServiceHealthCheckServer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to

[4] _The default value is 10256, but it can be overridden in the antrea-agent
configuration `antreaProxy.serviceHealthCheckServerBindAddress`. It is used only
for external load balancer health checks. If `antreaProxy.disableServiceHealthCheckServer`
is set `true`, the health check server listening on the port will be disabled._

@hongliangl hongliangl force-pushed the 20250802-update-3-party branch 2 times, most recently from 0f4ace4 to c11e942 Compare December 3, 2025 02:32
@luolanzone
Copy link
Contributor

/test-all

luolanzone
luolanzone previously approved these changes Dec 3, 2025
Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Hongliang Liu <[email protected]>
@hongliangl hongliangl force-pushed the 20250802-update-3-party branch from c37ec22 to fb0f547 Compare December 3, 2025 02:50
@hongliangl
Copy link
Contributor Author

/test-all

@hongliangl
Copy link
Contributor Author

@antoninbas @tnqn @luolanzone Thanks for reviewing this PR!

@luolanzone luolanzone merged commit 7c59949 into antrea-io:main Dec 3, 2025
61 checks passed
@hongliangl hongliangl deleted the 20250802-update-3-party branch December 3, 2025 06:02
petertran-avgo pushed a commit to petertran-avgo/antrea that referenced this pull request Dec 9, 2025
* Evolve AntreaProxy with framework and feature updates

This commit brings AntreaProxy:

- Add `serviceProxyHealthy` field to Service health check response in AntreaProxy.
- Add healthz server serving on port 10256 to AntreaProxy, which is a replacement
  of kube-proxy health server serving on 10256.
- Add support of feature gate PreferSameTrafficDistribution in AntreaProxy. Refer
  to this https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/3015-prefer-same-node.
- Remove Endpoints API support in AntreaProxy.
- Align the code `third_party/proxy` with K8s 1.34.2.

Signed-off-by: Hongliang Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/proxy Issues or PRs related to proxy functions in Antrea

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants