This repository was archived by the owner on Jun 3, 2025. It is now read-only.
Improve retry behavior for push operation #1578
Merged
tejal29 merged 2 commits into GoogleContainerTools:master from SaschaSchwarze0:sascha-push-retry on Feb 23, 2021
tejal29 approved these changes on Feb 23, 2021
no-reply pushed a commit to surfliner/surfliner-mirror that referenced this pull request on Mar 16, 2022
Setting the value to `3` initially, which we can adjust as needed.

This feature was introduced relatively recently in the following PR: GoogleContainerTools/kaniko#1578

Reason for feature:

> We are facing intermittent issues when pushing the image to the destination. The cause is, as far as I can tell, network flakiness. There is a long-standing issue asking for retries for the push operation, so I investigated this.

I recently noticed this error, which happened to coincide with 5 dependency MRs from Renovate all coming in at once. This user references similar experiences with Kaniko/GitLab: GoogleContainerTools/kaniko#584 (comment)

The hope in using the `--push-retry` option is that it will try to push a few more times, so that we no longer need to manually retry the `build` job to get it to pass.
Fixes #584
Fixes #1290
Description
We are facing intermittent issues when pushing the image to the destination. The cause is, as far as I can tell, network flakiness. There is a long-standing issue asking for retries for the push operation, so I investigated this.
I am making two improvements in two commits.
Update go-containerregistry to 0.4
I am updating the go-containerregistry library to 0.4, mainly to pick up "Retry registry access on some server errors" (#901). This improves the logic in the library to retry on some 5xx HTTP status codes from the registry. With this change, I ran my first tests. I had a registry:2 instance running behind an Apache server. By terminating the registry, I made the registry endpoint return a 503 until it came back after around 15 seconds. I was able to see the retries happening in the Apache access log.
Anyway, this was not yet good enough for three reasons:
The third item also caused my test above to fail, because my registry needed 15 seconds to restart and the library only retried for 1+3=4 seconds.
Therefore the second extension:
Implement --push-retry argument
This introduces the `--push-retry` argument, which is handled with simple retry logic inside Kaniko. I decided against filtering the `error` and basically retry everything. My thinking behind this is that Kaniko validates the registry credentials before the build (a great feature, by the way). If this succeeds, then the registry is in general functional. It does not make sense to later have special handling for (maybe non-retryable) things like DNS failures or authentication problems.

The default is `0`, which maintains the existing logic.

Retries happen with exponential delay (1s, 2s, 4s, 8s, 16s, ...).

I repeated my test by specifying `--push-retry 5` and the test was successful.

I also improved the logging in push.go to make this transparent.
Submitter Checklist
These are the criteria that every PR should meet, please check them off as you
review them:
Reviewer Notes
Release Notes