exponential backoff in taskRun controller #8926

Merged
tekton-robot merged 1 commit into tektoncd:main from pritidesai:backoff-pod-creation on Jul 30, 2025

Conversation

@pritidesai
Member

Changes

Add exponential backoff retry logic to the TaskRun controller for pod creation, providing better resilience against transient webhook timeout errors while maintaining immediate failure for permanent errors.

Reference: #8902 (comment)

/kind feature

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • pre-commit Passed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

Introduced an **exponential backoff retry** mechanism for the `createPod` function to improve robustness against transient webhook issues during resource creation in a heavily loaded cluster.

Add exponential backoff retry logic to the TaskRun controller for pod creation,
providing better resilience against transient webhook timeout errors while
maintaining immediate failure for permanent errors.

Signed-off-by: Priti Desai <[email protected]>
@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Jul 29, 2025
@tekton-robot tekton-robot requested a review from dibyom July 29, 2025 06:14
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 29, 2025
@tekton-robot tekton-robot requested a review from jerop July 29, 2025 06:14
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/controller/errors.go | Do not exist | 100.0% | |
| pkg/reconciler/pipelinerun/pipelinerun.go | 91.8% | 91.9% | 0.1 |
| pkg/reconciler/taskrun/taskrun.go | 85.9% | 86.3% | 0.4 |

@pritidesai pritidesai added this to the v1.3.0 (LTS) milestone Jul 29, 2025
@pritidesai
Member Author

@afrittoli @vdemeester please take a look, thanks!

Member

@afrittoli afrittoli left a comment

Thank you @pritidesai!
/approve

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2025

// IsWebhookTimeout checks if the error is due to a mutating admission webhook timeout.
// This function is used to determine if an error should trigger exponential backoff retry logic.
func IsWebhookTimeout(err error) bool {

expectCalls int
}

testCases := []testCase{

Member

@vdemeester vdemeester left a comment

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2025
@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, vdemeester, waveywaves

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [afrittoli,vdemeester,waveywaves]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot merged commit ae74639 into tektoncd:main Jul 30, 2025
29 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

Status: Done

5 participants