@vdemeester
Member

@vdemeester vdemeester commented Dec 3, 2025

Changes

  • Stop immediate requeue loops when default-timeout-minutes is "0"
  • Remove redundant hasCondition checks (Knative already deduplicates)

This adds an e2e test that inspects the pipeline controller logs to count reconciler loops. Without the fix, it counts more than 1500 reconciler loops; with the fix, only about 10.
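
The counting idea can be sketched roughly as follows. This is not the e2e test itself: `countReconciles` is a hypothetical helper, the log lines are made up, and the marker string `Reconcile succeeded` is an assumption about what the controller logs once per reconcile pass.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countReconciles returns how many log lines mention both marker and key.
// marker is whatever the controller logs once per reconcile pass (assumed),
// and key identifies the run, for example "namespace/name".
func countReconciles(logs, marker, key string) int {
	count := 0
	scanner := bufio.NewScanner(strings.NewReader(logs))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, marker) && strings.Contains(line, key) {
			count++
		}
	}
	return count
}

func main() {
	// Made-up log lines, for illustration only.
	logs := strings.Join([]string{
		`{"msg":"Reconcile succeeded","key":"ns/run-a"}`,
		`{"msg":"Reconcile succeeded","key":"ns/run-b"}`,
		`{"msg":"Reconcile succeeded","key":"ns/run-a"}`,
		`{"msg":"Updating status","key":"ns/run-a"}`,
	}, "\n")

	n := countReconciles(logs, "Reconcile succeeded", "ns/run-a")
	fmt.Println(n)      // prints 2
	fmt.Println(n < 20) // prints true
}
```

Substring matching is deliberately crude here; a key such as `ns/run-a` would also match `ns/run-ab`, so a real test would want unique run names or structured log parsing.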

/cc @afrittoli @pritidesai @tektoncd/core-maintainers

It took me a while to figure out, and I got some help from Claude (AI) to write the tests. The previous behavior also seemed very weird: with no timeout, we would requeue instantly, which is... madness 🙃

/kind bug

Fixes #8495

This could be a good candidate to be backported.

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • pre-commit Passed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

Fix an issue where there was excessive reconciliation in case of no timeout on TaskRun or PipelineRun.

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Dec 3, 2025
@tekton-robot tekton-robot requested review from a team and pritidesai December 3, 2025 10:23
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 3, 2025
@vdemeester vdemeester added this to the v1.8.0 milestone Dec 3, 2025
@vdemeester vdemeester force-pushed the fix-8495-excessive-reconciliation branch from c99df33 to 70e919d Compare December 3, 2025 11:57
@vdemeester
Member Author

/hold
Getting help from Claude to figure out the e2e failures 😅 Will remove the hold once I get greens

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 3, 2025
Member

@twoGiants twoGiants left a comment

Great job on the fix 😸 👍. My brain got challenged understanding the timeout order... 🧠 ⚡. It looks good though! I would just keep the code a bit more consistent; it will aid future understanding.

See my comments below.

- Stop immediate requeue loops when default-timeout-minutes is "0"
- Remove redundant hasCondition checks (Knative already deduplicates)

This adds an e2e test that inspects the pipeline controller logs to
count reconciler loops. Without the fix, it counts more than 1500
reconciler loops; with the fix, only about 10.

Signed-off-by: Vincent Demeester <[email protected]>
Co-Authored-By: Claude <[email protected]>
@vdemeester vdemeester force-pushed the fix-8495-excessive-reconciliation branch from 661ed4e to 9d13377 Compare December 9, 2025 10:13
@vdemeester
Member Author

@twoGiants I changed the if/else chains a bit to make them... less weird. Hopefully it's more readable.

@vdemeester
Member Author

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 18, 2025
@vdemeester
Member Author

/retest

Member

@twoGiants twoGiants left a comment

Looks good! 😸 👍

I would just add a category to the test (parallel) and return early in an if.

/approve
/lgtm

if timeout != config.NoTimeoutDuration {
	waitTime := timeout - elapsed
	if finallyWaitTime < waitTime {
		waitTime = finallyWaitTime

Suggested change
- waitTime = finallyWaitTime
+ return controller.NewRequeueAfter(finallyWaitTime)

// With the fix, reconciliations should stay well below 20 (typically around 10 or less).
//
// This test validates the fix by counting actual reconciliations from controller logs.
func TestPipelineRunExcessiveReconciliation(t *testing.T) {

Categorize the test as a parallel test following the new categorization logic.

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 23, 2025
@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: twoGiants

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 23, 2025
@tekton-robot tekton-robot merged commit 84d32b5 into tektoncd:main Dec 23, 2025
38 of 39 checks passed
@vdemeester vdemeester deleted the fix-8495-excessive-reconciliation branch December 23, 2025 10:45
vdemeester added a commit to vdemeester/tektoncd-pipeline that referenced this pull request Dec 23, 2025
Use early returns in timeout handling for better code consistency
and readability throughout the reconciliation logic.

Add missing @test:execution=parallel annotation to the excessive
reconciliation e2e test for proper test categorization.

Addresses review feedback from PR tektoncd#9202.
tekton-robot pushed a commit that referenced this pull request Dec 23, 2025
Use early returns in timeout handling for better code consistency
and readability throughout the reconciliation logic.

Add missing @test:execution=parallel annotation to the excessive
reconciliation e2e test for proper test categorization.

Addresses review feedback from PR #9202.
khrm pushed a commit to khrm/pipeline that referenced this pull request Jan 12, 2026
Use early returns in timeout handling for better code consistency
and readability throughout the reconciliation logic.

Add missing @test:execution=parallel annotation to the excessive
reconciliation e2e test for proper test categorization.

Addresses review feedback from PR tektoncd#9202.
@vdemeester vdemeester modified the milestones: v1.8.0, v1.9.0 (LTS) Jan 26, 2026
@vdemeester
Member Author

/cherry-pick release-v1.6.x release-v1.3.x release-v1.0.x

@tekton-robot
Collaborator

Cherry-pick to release-v1.3.x successful!

A new pull request has been created to cherry-pick this change to release-v1.3.x.

Please review and merge the cherry-pick PR.

@tekton-robot
Collaborator

Cherry-pick to release-v1.6.x successful!

A new pull request has been created to cherry-pick this change to release-v1.6.x.

Please review and merge the cherry-pick PR.

@tekton-robot
Collaborator

Cherry-pick to release-v1.0.x successful!

A new pull request has been created to cherry-pick this change to release-v1.0.x.

Please review and merge the cherry-pick PR.

vdemeester added a commit to vdemeester/tektoncd-pipeline that referenced this pull request Jan 28, 2026
Bump Go version to 1.24.0 to enable t.Context() usage in tests,
which was introduced in Go 1.24.

This is required for the cherry-pick of tektoncd#9202 (excessive reconciliation
fix) to work on the release-v1.0.x branch.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
tekton-robot pushed a commit that referenced this pull request Jan 30, 2026
Bump Go version to 1.24.0 to enable t.Context() usage in tests,
which was introduced in Go 1.24.

This is required for the cherry-pick of #9202 (excessive reconciliation
fix) to work on the release-v1.0.x branch.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@vdemeester
Member Author

/cherry-pick release-v1.0.x

@tekton-robot
Collaborator

Cherry-pick to release-v1.0.x failed!

The automatic cherry-pick to release-v1.0.x failed.

Output:

🤖 Starting cherry-pick process...
Fetching PR #9202 information...
Found merge commit: 84d32b509c85cd08567edf3e3bd86d42a20dfb40
PR title: fix: Prevent excessive reconciliation when timeout disabled
Fetching target branch: release-v1.0.x...
From https://github.com/tektoncd/pipeline
 * branch                release-v1.0.x -> FETCH_HEAD
Checking for existing cherry-pick PR...
Creating cherry-pick branch: cherry-pick-9202-to-release-v1.0.x...
Switched to a new branch 'cherry-pick-9202-to-release-v1.0.x'
branch 'cherry-pick-9202-to-release-v1.0.x' set up to track 'origin/release-v1.0.x'.
Fetching commits from PR #9202...
Found 1 commit(s) to cherry-pick
Cherry-picking commit 1/1: 9d13377dd086adc318f481b02494eefaf0694892...
fatal: bad object 9d13377dd086adc318f481b02494eefaf0694892
❌ ERROR: Cherry-pick failed for commit 9d13377dd086adc318f481b02494eefaf0694892 due to conflicts or other errors

Next steps:

  • Check the action logs for complete details
  • If the PR is not merged, merge it first and try again
  • If there are conflicts, you'll need to manually cherry-pick this PR

@waveywaves
Member

/cherry-pick release-v1.3.x

@tekton-robot
Collaborator

Cherry-pick to release-v1.3.x failed!

The automatic cherry-pick to release-v1.3.x failed.

Output:

🤖 Starting cherry-pick process...
Fetching PR #9202 information...
Found merge commit: 84d32b509c85cd08567edf3e3bd86d42a20dfb40
PR title: fix: Prevent excessive reconciliation when timeout disabled
Fetching target branch: release-v1.3.x...
From https://github.com/tektoncd/pipeline
 * branch                release-v1.3.x -> FETCH_HEAD
Checking for existing cherry-pick PR...
Creating cherry-pick branch: cherry-pick-9202-to-release-v1.3.x...
Switched to a new branch 'cherry-pick-9202-to-release-v1.3.x'
branch 'cherry-pick-9202-to-release-v1.3.x' set up to track 'origin/release-v1.3.x'.
Fetching commits from PR #9202...
Found 1 commit(s) to cherry-pick
Cherry-picking commit 1/1: 9d13377dd086adc318f481b02494eefaf0694892...
fatal: bad object 9d13377dd086adc318f481b02494eefaf0694892
❌ ERROR: Cherry-pick failed for commit 9d13377dd086adc318f481b02494eefaf0694892 due to conflicts or other errors

Next steps:

  • Check the action logs for complete details
  • If the PR is not merged, merge it first and try again
  • If there are conflicts, you'll need to manually cherry-pick this PR

@waveywaves
Member

/cherry-pick release-v1.3.x

@tekton-robot
Collaborator

Cherry-pick to release-v1.3.x failed!

The automatic cherry-pick to release-v1.3.x failed.

Output:

🤖 Starting cherry-pick process...
Fetching PR #9202 information...
Found merge commit: 84d32b509c85cd08567edf3e3bd86d42a20dfb40
PR title: fix: Prevent excessive reconciliation when timeout disabled
Fetching target branch: release-v1.3.x...
From https://github.com/tektoncd/pipeline
 * branch                release-v1.3.x -> FETCH_HEAD
Checking for existing cherry-pick PR...
Creating cherry-pick branch: cherry-pick-9202-to-release-v1.3.x...
Switched to a new branch 'cherry-pick-9202-to-release-v1.3.x'
branch 'cherry-pick-9202-to-release-v1.3.x' set up to track 'origin/release-v1.3.x'.
Fetching commits from PR #9202...
Found 1 commit(s) to cherry-pick
Cherry-picking commit 1/1: 9d13377dd086adc318f481b02494eefaf0694892...
fatal: bad object 9d13377dd086adc318f481b02494eefaf0694892
❌ ERROR: Cherry-pick failed for commit 9d13377dd086adc318f481b02494eefaf0694892 due to conflicts or other errors

Next steps:

  • Check the action logs for complete details
  • If the PR is not merged, merge it first and try again
  • If there are conflicts, you'll need to manually cherry-pick this PR

@vdemeester
Member Author

Arf, we probably need to do it manually? 🤔 Or something is fishy...

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

PipelineRun is reconciled thousands of times

4 participants