leakcheck: Fix flaky test TestCheck #8309

Merged
merged 5 commits into from
May 13, 2025

Conversation

Pranjali-2501
Contributor

@Pranjali-2501 Pranjali-2501 commented May 9, 2025

Fixes #8276

leakcheck.TestCheck and leakcheck.TestCheckRegisterIgnore are flaky due to a goroutine leak caused by context.WithTimeout().
The flakiness was filed in #8276.

RELEASE NOTES: N/A

@Pranjali-2501 Pranjali-2501 added this to the 1.73 Release milestone May 9, 2025
@Pranjali-2501 Pranjali-2501 requested review from dfawley and arjan-bal May 9, 2025 18:17
@Pranjali-2501 Pranjali-2501 self-assigned this May 9, 2025

linux-foundation-easycla bot commented May 9, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

codecov bot commented May 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.19%. Comparing base (7fb5738) to head (baf4db3).
Report is 17 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8309      +/-   ##
==========================================
+ Coverage   82.15%   82.19%   +0.04%     
==========================================
  Files         419      419              
  Lines       41904    41990      +86     
==========================================
+ Hits        34426    34515      +89     
+ Misses       6013     6008       -5     
- Partials     1465     1467       +2     

see 31 files with indirect coverage changes

  }
  e := &testLogger{}
  ctx, cancel := context.WithTimeout(context.Background(), time.Second)
  defer cancel()
- CheckGoroutines(ctx, e)
- if e.errorCount != leakCount {
+ if CheckGoroutines(ctx, e); e.errorCount == 0 {
Member

That works, or you can do

Suggested change
- if CheckGoroutines(ctx, e); e.errorCount == 0 {
+ if CheckGoroutines(ctx, e); ctx.Err() == nil {

Or even:

Suggested change
- if CheckGoroutines(ctx, e); e.errorCount == 0 {
+ if CheckGoroutines(ctx, e); e.errorCount < 3 {

Contributor Author

This check expects the goroutines blocked in time.Sleep() to still be alive.
I did not use e.errorCount == 3 because the goroutine associated with ctx can live longer than expected, which raises errorCount to 4. That is why the test was flaky.

Member

I did not use e.errorCount == 3 because the goroutine associated with ctx can live longer than expected, which raises errorCount to 4.

Right, which is why I suggested changing to failing if it's less than three. We should have at least three. Or we can just confirm the check failed by checking ctx.Err() and then not have to worry about all the complexity to track errorCount at all. Either way is fine.

Contributor Author

Right, which is why I suggested changing to failing if it's less than three. We should have at least three. Or we can just confirm the check failed by checking ctx.Err() and then not have to worry about all the complexity to track errorCount at all. Either way is fine.

Right, makes sense. Both work fine. I'm going with e.errorCount < 3 for now.
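
Editor's note: the two assertion styles discussed in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the leakcheck implementation: `countingLogger`, `tracker`, `check`, and `runLeakDemo` are hypothetical names introduced here, and the simplified check only sees goroutines started through `tracker`. The real check inspects the whole process, so extra goroutines can appear in its count, which is why the thread settles on a lower bound rather than an exact match.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// countingLogger is a hypothetical stand-in for the test's testLogger:
// it only counts how many errors were reported.
type countingLogger struct{ errorCount int }

func (l *countingLogger) Errorf(format string, args ...interface{}) { l.errorCount++ }

// tracker counts goroutines started through Go that have not returned yet.
type tracker struct{ live int64 }

func (t *tracker) Go(f func()) {
	atomic.AddInt64(&t.live, 1)
	go func() {
		defer atomic.AddInt64(&t.live, -1)
		f()
	}()
}

// check is a simplified, hypothetical leak check. It polls until no tracked
// goroutine is alive or ctx is done, then reports one error per survivor.
func check(ctx context.Context, t *tracker, l *countingLogger) {
	for atomic.LoadInt64(&t.live) > 0 {
		select {
		case <-ctx.Done():
			for i := atomic.LoadInt64(&t.live); i > 0; i-- {
				l.Errorf("leaked goroutine")
			}
			return
		case <-time.After(10 * time.Millisecond):
		}
	}
}

// runLeakDemo leaks leakCount goroutines on purpose, runs the check, and
// returns the reported error count together with ctx.Err().
func runLeakDemo(leakCount int) (int, error) {
	ch := make(chan struct{})
	defer close(ch) // release the blocked goroutines when the demo ends
	t := &tracker{}
	for i := 0; i < leakCount; i++ {
		t.Go(func() { <-ch })
	}
	l := &countingLogger{}
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	check(ctx, t, l)
	return l.errorCount, ctx.Err()
}

func main() {
	const leakCount = 3
	got, err := runLeakDemo(leakCount)
	// Style 1: lower bound, tolerant of extra goroutines being reported.
	if got < leakCount {
		fmt.Printf("check found %d leaks, want at least %d\n", got, leakCount)
	}
	// Style 2: the check can only have reported leaks if the deadline expired.
	if err == nil {
		fmt.Println("check returned before the deadline, so no leak was detected")
	}
	fmt.Println("deadline expired:", err != nil)
}
```

Either assertion avoids depending on the exact number of reported goroutines; the second also avoids tracking a count at all.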

Comment on lines 26 to 28

// "sync"

Member

Please revert this and below.

Contributor Author

Sure.

@Pranjali-2501 Pranjali-2501 requested a review from dfawley May 12, 2025 03:51
Member

@dfawley dfawley left a comment

Please also change the PR first comment to include "Fixes #". This will ensure the PR and issue are linked and close the issue when the PR is merged.

@@ -47,16 +47,14 @@ func TestCheck(t *testing.T) {
for i := 0; i < leakCount; i++ {
go func() { time.Sleep(2 * time.Second) }()
Member

Can you change this to not use Sleep? We should always avoid sleeps for correctness.

Instead do:

ch := make(chan struct{})
for i := 0; i < leakCount; i++ {
	go func() { <-ch }()
}

// test interestingGoroutines() and CheckGoroutines()

close(ch) // make goroutines exit

Contributor Author

Sure.
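
Editor's note: a runnable version of the channel-based pattern suggested above might look like the sketch below. `startBlocked` is a hypothetical helper introduced for illustration; it is not part of the PR. It also waits for the goroutines to start and to exit, which the reviewer's snippet does not require, so that the example can verify both ends without any sleep.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// startBlocked starts n goroutines that block until release is closed.
// It returns once all of them are running. The returned wait function
// blocks until every goroutine has returned and reports how many exited.
func startBlocked(n int, release <-chan struct{}) (wait func() int) {
	var started sync.WaitGroup
	done := new(sync.WaitGroup)
	var exited int64
	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			defer atomic.AddInt64(&exited, 1)
			started.Done()
			<-release // blocks until the channel is closed
		}()
	}
	started.Wait()
	return func() int {
		done.Wait()
		return int(atomic.LoadInt64(&exited))
	}
}

func main() {
	release := make(chan struct{})
	wait := startBlocked(3, release)

	// The leak check under test would run here, while the goroutines
	// are guaranteed to be alive.

	close(release) // make goroutines exit
	fmt.Println("goroutines exited:", wait())
}
```

Because the goroutines live exactly until `close(release)`, the test controls their lifetime directly instead of racing a fixed sleep duration against the check's timeout.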

@Pranjali-2501 Pranjali-2501 requested a review from dfawley May 12, 2025 17:51
Member

@dfawley dfawley left a comment

Looks good after a few minor nits are fixed. Thanks!

- if e.errorCount != leakCount {
- 	t.Errorf("CheckGoroutines found %v leaks, want %v leaks", e.errorCount, leakCount)
+ if CheckGoroutines(ctx, e); e.errorCount < leakCount {
+ 	t.Errorf("CheckGoroutines() = %v, want count %v", e.errorCount, leakCount)
Contributor

"want count < %d" to more accurately describe the assertion.

Contributor Author

The test passes when e.errorCount >= leakCount, not when it is less than leakCount.

We want e.errorCount >= leakCount because the goroutine spawned for ctx can itself be reported as a leak, which is what made the test flaky.

@Pranjali-2501 Pranjali-2501 requested a review from arjan-bal May 13, 2025 07:44
@arjan-bal arjan-bal changed the title Modify Flaky test: leakcheck.TestCheck leakcheck: Fix flaky test TestCheck May 13, 2025
@arjan-bal arjan-bal merged commit b89909b into grpc:master May 13, 2025
15 of 16 checks passed
Development

Successfully merging this pull request may close these issues.

Flaky test: leakcheck.TestCheck
3 participants