Skip to content

Conversation

@XinShuYang
Copy link
Contributor

@XinShuYang XinShuYang commented Nov 26, 2025

Skipping unit tests for all features explicitly listed as unsupported on Windows
(e.g., NodeLatencyMonitor, Egress).

This change was initiated following the observation of flaky failures in monitor tool
Windows unit tests. Disabling all related tests improves CI stability and releases resources.

@XinShuYang XinShuYang force-pushed the winutskip branch 12 times, most recently from 68bf711 to 33335bf Compare November 26, 2025 12:55
@XinShuYang XinShuYang marked this pull request as ready for review November 26, 2025 13:38
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't explain the rationale for this change. Is it just to speed up Windows CI and potentially avoid flakes in Windows CI?

@@ -1,3 +1,6 @@
//go:build !windows
// +build !windows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to include that variant anymore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, +build !windows is for older golang, since we are using golang 1.25 now, you don't need this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder, I will remove this variant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add this complexity if the code compiles fine on Windows?
The feature is protected by a FeatureGate and I think we already have a mechanism to declare Agent features as unsupported on Windows?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context of this change is that we noticed a failure in this action: https://github.com/antrea-io/antrea/actions/runs/19687452111/job/56396073717?pr=7595, @XinShuYang found the feature doesn't support on Windows.

But you are right, it might be unnecessary to add the gobuild limitations on all other files considering they are already there for a long and running well. Shuyang, could you check and only handle the issue with monitor tool. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shuyang, could you check and only handle the issue with monitor tool. Thanks.

@luolanzone OK, I can skip only this test in this PR, but I suspect the failure is not related to the Windows environment. We may still need to investigate further if we encounter the same failure in the Linux test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add this complexity if the code compiles fine on Windows? The feature is protected by a FeatureGate and I think we already have a mechanism to declare Agent features as unsupported on Windows?

This is to avoid the golangci-lint failure because some functions in monitor tool files are only used in skipped unit tests on Windows: https://github.com/antrea-io/antrea/actions/runs/19727571423/job/56521801920?pr=7596 .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is likely that there is a timing issue with the test that only happens in Windows CI because of issues specific to the Github hosted Windows runners. There seems to be "pauses" in the scheduling of the runner VM at time, with the clock remaining frozen for 100s of ms, then the VM resumes and the clock jumps forward to catch up. This creates issues for unit tests with poll-based assertions (or more generally for unit tests which depend on time), even when they use a pretty long timeout (1s / 2s).
I am hoping that we can get rid of these issues by switching to testing/synctest (but right now it doesn't work well with K8s informers and with the fake K8 client).

That being said, I am perfectly fine with disabling Windows unit tests for features which are not supported on Windows (as long as the feature is listed in unsupportedFeaturesOnWindows). I think that if we do it for one such feature, we should do it for all such features, so we should not only do it for NodeLatencyMonitor. This is a pretty good idea in the end, we release some CI resources and we can avoid flakes. I just want the reason clearly called out in the commit message / PR description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, I am perfectly fine with disabling Windows unit tests for features which are not supported on Windows (as long as the feature is listed in unsupportedFeaturesOnWindows). I think that if we do it for one such feature, we should do it for all such features, so we should not only do it for NodeLatencyMonitor. This is a pretty good idea in the end, we release some CI resources and we can avoid flakes. I just want the reason clearly called out in the commit message / PR description.

Sure, I will update this PR to skip these unsupported Windows tests. Thanks a lot for the clarification.

@XinShuYang XinShuYang force-pushed the winutskip branch 4 times, most recently from 01ab607 to 93dd6a3 Compare November 27, 2025 07:16
@XinShuYang XinShuYang changed the title Skip unsupported unit tests on Windows Skip monitor tool unit tests on Windows Nov 27, 2025
@XinShuYang XinShuYang force-pushed the winutskip branch 2 times, most recently from 5539321 to 7f740c9 Compare December 2, 2025 15:01
@XinShuYang XinShuYang changed the title Skip monitor tool unit tests on Windows Skip Windows unit tests for unsupported features Dec 2, 2025
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need it in this file.
I think some features or CRDs like IPPools and ExternalIPPools are OS-agnostic, I suppose we don't have to skip Windows tests for them? @antoninbas any suggestion for these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to keep in simple, maybe we could avoid build tags altogether for pkfg/controller. This package is supposed to be specific to the Antrea Controller. If we don't support the Antrea Controller on Windows, maybe the simplest solution is to skip this package when we run unit tests on Windows by editing this make target appropriately:

antrea/Makefile

Lines 291 to 296 in 2cae339

.windows-test-unit: .coverage
@echo
@echo "==> Running unit tests <=="
CGO_ENABLED=1 $(GO) test $(TEST_ARGS) -race -coverpkg=antrea.io/antrea/cmd/...,antrea.io/antrea/pkg/... \
-coverprofile=.coverage/coverage-unit.txt -covermode=atomic \
antrea.io/antrea/cmd/... antrea.io/antrea/pkg/...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, yes, since no controller tests are related to Windows, we can skip them in the Makefile.

@XinShuYang XinShuYang force-pushed the winutskip branch 4 times, most recently from db697de to 53b76ba Compare December 10, 2025 13:52
Makefile Outdated
@echo
@echo "==> Running unit tests <=="
CGO_ENABLED=1 $(GO) test $(TEST_ARGS) -race -coverpkg=antrea.io/antrea/cmd/...,antrea.io/antrea/pkg/... \
@pkgs=$$($(GO) list antrea.io/antrea/cmd/... antrea.io/antrea/pkg/... | grep -v antrea.io/antrea/pkg/agent/controller | grep -v antrea.io/antrea/pkg/controller); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excluding antrea.io/antrea/pkg/agent/controller on Windows does not seem right to me?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, for pkg/agent/controller, we should only skip specific components for Windows tests.

Skipping unit tests for all features explicitly listed as unsupported on Windows
(e.g., NodeLatencyMonitor, Egress).

This change was initiated following the observation of flaky failures in monitor tool
Windows unit tests. Disabling all related tests improves CI stability and releases resources.

Signed-off-by: Shuyang Xin <[email protected]>
Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @antoninbas do you want to take another look?

@antoninbas antoninbas merged commit 1248509 into antrea-io:main Dec 12, 2025
56 of 62 checks passed
kartikangiras pushed a commit to kartikangiras/antrea that referenced this pull request Dec 19, 2025
Skipping unit tests for all features explicitly listed as unsupported on Windows
(e.g., NodeLatencyMonitor, Egress).

This change was initiated following the observation of flaky failures in monitor tool
Windows unit tests. Disabling all related tests improves CI stability and releases resources.

Signed-off-by: Shuyang Xin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants