fix: resolve Git Anonymous Resolver excessive memory usage #8677
tekton-robot merged 2 commits into tektoncd:main
pkg/resolution/resolver/git/git.go
Outdated
| _, err = repo.execGit(ctx, "clone", repo.url, tmpDir, "--depth=1", "--no-checkout") | ||
| if err != nil { | ||
| return nil, cleanupFunc, err | ||
| } | ||
|
|
||
| _, err = repo.execGit(ctx, "fetch", "origin", repo.revision, "--depth=1") | ||
| if err != nil { | ||
| return nil, cleanupFunc, err | ||
| } | ||
|
|
||
| _, err = repo.execGit(ctx, "checkout", "FETCH_HEAD") | ||
| if err != nil { | ||
| return nil, cleanupFunc, err | ||
| } | ||
|
|
||
| revisionSha, err := repo.execGit(ctx, "rev-list", "-n1", "HEAD") | ||
| if err != nil { | ||
| return nil, cleanupFunc, err | ||
| } | ||
| repo.revision = strings.TrimSpace(string(revisionSha)) | ||
|
|
||
| return &repo, cleanupFunc, nil |
There was a problem hiding this comment.
I initially did a full git clone followed by a git checkout. However, doing a shallow clone with --no-checkout, fetching the revision, and then checking it out improved both time and space performance significantly. Using the repository mentioned in the original issue as a benchmark, the shallow clone reduced the clone time from ~110 seconds by a factor of ten. Using --no-checkout similarly reduced the disk space and time slightly (before the necessary revision was checked out).
|
The following is the coverage report on the affected files.
|
| default: "_json_key" | ||
| - name: releaseAsLatest | ||
| description: Whether to tag and publish this release as Pipelines' latest | ||
| description: Whether to tag and publish this release as Pipelines latest |
There was a problem hiding this comment.
This was not a grammatical error; I just noticed that the apostrophe threw off my editor's syntax highlighting, and both versions seemed equally correct. I am happy to revert if needed.
| github.com/tektoncd/pipeline/cmd/nop: ghcr.io/tektoncd/pipeline/github.com/tektoncd/pipeline/combined-base-image:latest | ||
| github.com/tektoncd/pipeline/cmd/workingdirinit: ghcr.io/tektoncd/pipeline/github.com/tektoncd/pipeline/combined-base-image:latest | ||
|
|
||
| github.com/tektoncd/pipeline/cmd/git-init: cgr.dev/chainguard/git |
There was a problem hiding this comment.
The git-init package no longer exists.
|
/kind feat |
|
|
/kind feature |
|
Cc @vdemeester |
beta e2e tests are failing |
|
@waveywaves yeah, hoping to get to addressing the e2e test failures today or tomorrow. |
2f93442 to c0e27c1
c0e27c1 to 6073888
pkg/resolution/resolver/git/git.go
Outdated
| type repository struct { | ||
| url string | ||
| username string | ||
| password string | ||
| directory string | ||
| revision string | ||
| } | ||
|
|
||
| func resolveRepository(ctx context.Context, url, username, password, revision string) (*repository, func(), error) { |
There was a problem hiding this comment.
Maybe we can export the repository struct and pass it to the function as a single argument instead of passing four arguments?
There was a problem hiding this comment.
I can rework this a bit to make it more ergonomic. I am hesitant to export this struct, but I think there is a way to work around that.
There was a problem hiding this comment.
LMK what you think about the refactor
There was a problem hiding this comment.
Maybe something for a follow-up?
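One possible shape for the refactor discussed in this thread, shown only as a sketch: callers inside the package fill in the unexported struct and pass it as one argument, so nothing needs to be exported. The validation rule and the no-op cleanup are assumptions for illustration, and the actual cloning is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// repository mirrors the unexported struct from the diff above.
type repository struct {
	url       string
	username  string
	password  string
	directory string
	revision  string
}

// resolveRepository takes the struct instead of four string arguments.
// Hypothetical signature: the real function also clones the repository,
// which is omitted here.
func resolveRepository(repo repository) (*repository, func(), error) {
	noop := func() {}
	if repo.url == "" {
		// Assumed check, for illustration only.
		return nil, noop, errors.New("repository url is required")
	}
	// Return a copy so the caller's value is not mutated later.
	resolved := repo
	return &resolved, noop, nil
}

func main() {
	repo, cleanup, err := resolveRepository(repository{
		url:      "https://example.com/repo.git", // placeholder URL
		revision: "main",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer cleanup()
	fmt.Println("resolving", repo.url, "at", repo.revision)
}
```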
| volumeMounts: | ||
| - name: tmp-clone-volume | ||
| mountPath: "/tmp" |
There was a problem hiding this comment.
Mounting a volume at /tmp is necessary because the security context sets the root filesystem as read-only, and git clone needs a writable location on the filesystem to clone into.
Additionally, using an emptyDir volume as the /tmp directory allows us to configure the size of the directory in alignment with the pod's memory requests.
| @@ -0,0 +1,2 @@ | |||
| baseImageOverrides: | |||
| github.com/tektoncd/pipeline/cmd/resolvers: cgr.dev/chainguard/git@sha256:566235a8ef752f37d285042ee05fc37dbb04293e50f116a231984080fb835693 | |||
There was a problem hiding this comment.
This would make every resolver image use the git base image... isn't that a bit much? For example, the http-resolver would definitely not need it...
There was a problem hiding this comment.
If I understand correctly, all of the resolvers run as plugins together in the same pod(s). So while some resolvers will have unnecessary access to the git binary, all of the resolver pods will need access to the binary if the git resolver is enabled. Not sure if that's much better, though.
|
We used to use the git binary, and one of the reasons for moving to go-github was to make it easier to "ship", since it is built in. Now there is a backward-compatibility issue for downstream contributors, who have to keep that image updated and supported (keep the git binary updated). But I still think it's a good idea to use the git binary directly, since the lib will never be as good as the git binary... |
| @@ -0,0 +1,2 @@ | |||
| baseImageOverrides: | |||
| github.com/tektoncd/pipeline/cmd/resolvers: cgr.dev/chainguard/git@sha256:566235a8ef752f37d285042ee05fc37dbb04293e50f116a231984080fb835693 | |||
There was a problem hiding this comment.
I think there might be some changes to be done in the tekton/publish.yaml task as well.
There was a problem hiding this comment.
In addition to modifying the task's inline .koconfig like is already done?
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: vdemeester |
docs/git-resolver.md
Outdated
|
|
||
| Git clone with `git clone` is supported for anonymous and authenticated cloning. | ||
| Git clone with `git clone` is supported for anonymous and authenticated cloning | ||
| This mode shallow-clones the git repo before shallow-fetching and checking |
There was a problem hiding this comment.
that's a bit too low level git plumbing to understand?
There was a problem hiding this comment.
I removed the bit about shallow fetching, but left in the steps shallow-clone, fetch, then checkout. I added a note detailing when users may need to care about the git plumbing, since there is that possibility. Does that read better now?
|
|
||
| _, err = repo.execGit(ctx, "clone", repo.url, tmpDir, "--depth=1", "--no-checkout") | ||
| if err != nil { | ||
| if strings.Contains(err.Error(), "unable to get password from user") { |
There was a problem hiding this comment.
I'd love to do something cleaner here. The exit code isn't specific to auth errors however.

My intention was to (1) keep the error messaging mostly similar to the old implementation (2) without introducing red herrings. I didn't see another way to ensure that auth issues (or private repos and, by proxy, non-existent repos) keep their authentication required error without errors like unable to look up $domain being suppressed. Do you have any thoughts?
fbba524 to 89b7461
Switch git resolver from go-git library to use git binary.

The go-git library does not resolve deltas efficiently and, as a result, is not recommended for cloning repositories at full depth. In one example RemoteResolutionRequest targeting a repository totaling 145MB, with the resolution timeout configured to 10 minutes and the resolver given 25GB of memory, the resolver pod was OOM-killed after ~6 minutes. Additionally, since go-git's delta resolution does not accept any contexts, the time and memory spent resolving a large repository are not cut off when the context is canceled, impacting the resolver's performance for other concurrent remote resolutions.

Since the git resolver can be provided a revision which is not tracked at any ref in the repository, and because go-git only supports fetching fully-qualified refs, go-git does not support fetching arbitrary revisions. Therefore, in order to guarantee the requested revision is fetched, if we continue to use the go-git library we must fetch all revisions.

Switching to the git binary enables the git resolver to take advantage of git-fetch's support for fetching arbitrary revisions. Note that if the revision is not at any ref head, fetching the revision does depend on the git server enabling uploadpack.allowReachableSHA1InWant.

Resolves tektoncd#8652
See also: https://git-scm.com/docs/protocol-capabilities#_allow_reachable_sha1_in_want
NOTE: This feature is enabled and supported in GitHub and GitLab but not Gitea: go-gitea/gitea#11958
89b7461 to 1b28bc8
|
Thanks for this! |

Switch git resolver from go-git library to use git binary.
The go-git library does not resolve deltas efficiently and, as a result, is not recommended for cloning repositories at full depth. In one example RemoteResolutionRequest targeting a repository totaling 145MB, with the resolution timeout configured to 10 minutes and the resolver given 25GB of memory, the resolver pod was OOM-killed after ~6 minutes. Additionally, since go-git's delta resolution does not accept any contexts, the time and memory spent resolving a large repository are not cut off when the context is canceled, impacting the resolver's performance for other concurrent remote resolutions.
Since the git resolver can be provided a commit SHA which is not tracked at any ref/head, and because go-git only supports fetching fully-qualified refs, go-git does not support fetching arbitrary revisions. Therefore, in order to guarantee the requested revision is fetched, if we continue to use the go-git library, we must fetch all revisions.
Switching to the git binary significantly improves the time and memory performance of the git resolver (by multiple orders of magnitude, in my testing). Using the git binary also enables the resolver to take advantage of shallow fetching/cloning thanks to git-fetch's support for fetching arbitrary revisions, improving both time and memory performance by another order of magnitude. Note that if the revision is not at any ref head or tag, fetching the revision does depend on the git server enabling uploadpack.allowReachableSHA1InWant. This feature is enabled and supported in GitHub and GitLab but not Gitea: go-gitea/gitea#11958
See also: https://git-scm.com/docs/protocol-capabilities#_allow_reachable_sha1_in_want
Resolves #8652