This repository was archived by the owner on Mar 9, 2022. It is now read-only.

@Random-Liu
Member

@Random-Liu Random-Liu commented May 24, 2017

Based on #41.

This PR:

  1. Updates containerd to containerd/containerd@193abed, which includes some content API updates and implements the Status function in the content API.
  2. When pulling an image, gets the references of all resources associated with the image and waits for their downloads to finish by checking content status. This ensures that a concurrent pull of the same resource won't return before the resource is fully pulled.

/cc @yujuhong @mikebrow

@Random-Liu Random-Liu changed the title Make Wait and check image pulling progress. May 24, 2017
@Random-Liu
Member Author

Random-Liu commented May 24, 2017

I tested the PR with crictl, and it works fine.

On the client side:

# ./crictl --runtime-endpoint=/var/run/cri-containerd.sock image pull gcr.io/google_containers/node-problem-detector:v0.3.0 &
[1] 34986
# ./crictl --runtime-endpoint=/var/run/cri-containerd.sock image pull gcr.io/google_containers/node-problem-detector:v0.3.0 &
[2] 35001
# ./crictl --runtime-endpoint=/var/run/cri-containerd.sock image pull gcr.io/google_containers/node-problem-detector:v0.3.0 &
[3] 35026
# ./crictl --runtime-endpoint=/var/run/cri-containerd.sock image pull gcr.io/google_containers/node-problem-detector:v0.3.0 &
[4] 35049
root@lantaol0:/usr/local/google/home/lantaol/workspace/bin# sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef
sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef
sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef
sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef

On the server side:

I0523 17:17:09.229538   34930 image_pull.go:83] PullImage "gcr.io/google_containers/node-problem-detector:v0.3.0" with auth config nil
I0523 17:17:09.522979   34930 image_pull.go:83] PullImage "gcr.io/google_containers/node-problem-detector:v0.3.0" with auth config nil
I0523 17:17:09.734073   34930 image_pull.go:83] PullImage "gcr.io/google_containers/node-problem-detector:v0.3.0" with auth config nil
I0523 17:17:10.018493   34930 image_pull.go:221] Start downloading resources for image "gcr.io/google_containers/node-problem-detector:v0.3.0"
I0523 17:17:10.022578   34930 image_pull.go:221] Start downloading resources for image "gcr.io/google_containers/node-problem-detector:v0.3.0"
I0523 17:17:10.025246   34930 image_pull.go:221] Start downloading resources for image "gcr.io/google_containers/node-problem-detector:v0.3.0"
I0523 17:17:10.473200   34930 image_pull.go:243] Dispatch returns error: rpc error: code = 2 desc = failed reading status of resume write: stat /var/lib/containerd/content/ingest/8a4f180296d3db0939850d4ca9eeb459961c780a80ef7422276d82311ea99619/data: no such file or directory
I0523 17:17:10.516208   34930 image_pull.go:243] Dispatch returns error: failed commit on ref "layer-sha256:fe2c0211783ce83f6a77c6ec26246a4c92c40d715dde750d909a5a832efae381": content: not found
I0523 17:17:10.663316   34930 image_pull.go:83] PullImage "gcr.io/google_containers/node-problem-detector:v0.3.0" with auth config nil
I0523 17:17:10.674156   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 31500/75542288
I0523 17:17:10.718166   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 441089/75542288
I0523 17:17:10.874179   34930 image_pull.go:303] Pulling resource "layer-sha256:e7f6d00961c7a054ec7ffeca9cde1e60a3daf8e09d9ba9587b31e7758d4e1387" with progress 1751805/12379583
I0523 17:17:10.874210   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 8388608/75542288
I0523 17:17:10.899750   34930 image_pull.go:221] Start downloading resources for image "gcr.io/google_containers/node-problem-detector:v0.3.0"
I0523 17:17:10.917034   34930 image_pull.go:303] Pulling resource "layer-sha256:e7f6d00961c7a054ec7ffeca9cde1e60a3daf8e09d9ba9587b31e7758d4e1387" with progress 4897533/12379583
I0523 17:17:10.917074   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 8388608/75542288
I0523 17:17:11.073871   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 16515072/75542288
I0523 17:17:11.117034   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 21757952/75542288
I0523 17:17:11.274025   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 41680896/75542288
I0523 17:17:11.316781   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 47840480/75542288
I0523 17:17:11.473875   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 65698913/75542288
I0523 17:17:11.516782   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 68680801/75542288
I0523 17:17:11.673920   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 0/75542288
I0523 17:17:11.716876   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 0/75542288
I0523 17:17:11.768966   34930 image_pull.go:243] Dispatch returns error: failed commit on ref "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608": rpc error: code = 2 desc = "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" failed size validation: 0 != 75542288
I0523 17:17:11.873894   34930 image_pull.go:295] Statuses [{Ref:layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608 Offset:12316672 Total:75542288 Expected: StartedAt:2017-05-24 00:17:11.867026136 +0000 UTC UpdatedAt:2017-05-24 00:17:11.867026136 +0000 UTC}]
I0523 17:17:11.873958   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 12316672/75542288
I0523 17:17:11.916851   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 16777216/75542288
I0523 17:17:11.969480   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 20971520/75542288
I0523 17:17:12.073851   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 31457280/75542288
I0523 17:17:12.116825   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 36241408/75542288
I0523 17:17:12.169592   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 42991616/75542288
I0523 17:17:12.274101   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 53477376/75542288
I0523 17:17:12.316905   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 57671680/75542288
I0523 17:17:12.369617   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 62914560/75542288
I0523 17:17:12.473949   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 73400320/75542288
I0523 17:17:12.517142   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 75542288/75542288
I0523 17:17:12.569682   34930 image_pull.go:303] Pulling resource "layer-sha256:1b39978eabd9889bb48d1fd7af03252dae1b521b5c87b2585351c12016823608" with progress 75542288/75542288
I0523 17:17:12.673914   34930 image_pull.go:247] Finish downloading resources for image "gcr.io/google_containers/node-problem-detector:v0.3.0"
I0523 17:17:12.716819   34930 image_pull.go:247] Finish downloading resources for image "gcr.io/google_containers/node-problem-detector:v0.3.0"
I0523 17:17:12.769682   34930 image_pull.go:247] Finish downloading resources for image "gcr.io/google_containers/node-problem-detector:v0.3.0"
I0523 17:17:12.857947   34930 image_pull.go:247] Finish downloading resources for image "gcr.io/google_containers/node-problem-detector:v0.3.0"
I0523 17:17:22.887328   34930 image_pull.go:110] Pulled image "gcr.io/google_containers/node-problem-detector:v0.3.0" with image id "sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef", digest "sha256:6616143975194a20514c0581624769e8c9f0dd2dcdc039ffa9b0a68e02e2205d"
I0523 17:17:22.890878   34930 image_pull.go:87] PullImage "gcr.io/google_containers/node-problem-detector:v0.3.0" returns image reference "sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef"
I0523 17:17:25.951426   34930 image_pull.go:110] Pulled image "gcr.io/google_containers/node-problem-detector:v0.3.0" with image id "sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef", digest "sha256:6616143975194a20514c0581624769e8c9f0dd2dcdc039ffa9b0a68e02e2205d"
I0523 17:17:25.951487   34930 image_pull.go:87] PullImage "gcr.io/google_containers/node-problem-detector:v0.3.0" returns image reference "sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef"
I0523 17:17:26.807163   34930 image_pull.go:110] Pulled image "gcr.io/google_containers/node-problem-detector:v0.3.0" with image id "sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef", digest "sha256:6616143975194a20514c0581624769e8c9f0dd2dcdc039ffa9b0a68e02e2205d"
I0523 17:17:26.807326   34930 image_pull.go:87] PullImage "gcr.io/google_containers/node-problem-detector:v0.3.0" returns image reference "sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef"
I0523 17:17:27.935938   34930 image_pull.go:110] Pulled image "gcr.io/google_containers/node-problem-detector:v0.3.0" with image id "sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef", digest "sha256:6616143975194a20514c0581624769e8c9f0dd2dcdc039ffa9b0a68e02e2205d"
I0523 17:17:27.936167   34930 image_pull.go:87] PullImage "gcr.io/google_containers/node-problem-detector:v0.3.0" returns image reference "sha256:64704e7d2079594fa45b39cbe927e16fcad0c2eef345ee1217e4882cbf6adcef"

@Random-Liu Random-Liu mentioned this pull request May 24, 2017
Member

@mikebrow mikebrow left a comment

LGTM

if err != nil {
return "", "", fmt.Errorf("failed to fetch image %q desc %+v: %v", ref, desc, err)
// Dispatch returns error when requested resources are locked.
// In that case, we should start waiting and checking the pulling
Member

@mikebrow mikebrow May 24, 2017

so they are not always sync calls... :-) This explains some of the errors I was seeing... that the concurrent request may not be finished yet thus the locked resource!

Member

Sort of wish they would block themselves.. but ok this works.

Member Author

@Random-Liu Random-Liu May 24, 2017

@mikebrow so they are not always sync calls... :-) This explains some of the errors I was seeing... that the concurrent request may not be finished yet thus the locked resource!

The first call is a sync call. The problem is that the second call returns immediately with an error because someone else holds the lock on the resource. :)
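The behavior described here can be modeled with a non-blocking, per-ref lock. This is a hypothetical sketch: `refLocker`, `tryLock`, `errLocked`, and `fetch` are illustrative names, not containerd APIs. The first caller wins the lock and downloads synchronously; a concurrent caller gets `errLocked` right away and, instead of returning early, falls back to waiting (in the PR, by polling content status).

```go
package main

import (
	"errors"
	"sync"
)

// errLocked is returned when another caller already holds the lock for a ref.
var errLocked = errors.New("ref is locked by another pull")

// refLocker hands out at most one lock per ref and never blocks.
type refLocker struct {
	mu      sync.Mutex
	holders map[string]bool
}

func newRefLocker() *refLocker {
	return &refLocker{holders: make(map[string]bool)}
}

// tryLock fails fast with errLocked instead of waiting for the holder.
func (l *refLocker) tryLock(ref string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holders[ref] {
		return errLocked
	}
	l.holders[ref] = true
	return nil
}

// unlock releases the lock for ref.
func (l *refLocker) unlock(ref string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.holders, ref)
}

// fetch downloads ref synchronously if it wins the lock; otherwise it waits
// for the current holder rather than reporting success prematurely.
func fetch(l *refLocker, ref string, download, wait func()) {
	if err := l.tryLock(ref); err != nil {
		wait()
		return
	}
	defer l.unlock(ref)
	download()
}
```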

Member

@yujuhong yujuhong left a comment

Looks good, with one question.

}

// waitDownloadingInterval is the interval to check resource downloading progress.
const waitDownloadingInterval = 200 * time.Millisecond
Member

nit: waitDownloadingPollInterval might be better

Member Author

Will do

if err := c.waitForResourcesDownloading(ctx, resources.all()); err != nil {
return "", "", fmt.Errorf("failed to wait for image %q downloading: %v", ref, err)
}
glog.V(4).Infof("Finish downloading resources for image %q", ref)
Member

s/Finish/Finished

Member Author

Will do

Member Author

Done.


// Unpack the image layers into snapshots.
if _, err = c.rootfsUnpacker.Unpack(ctx, manifest.Layers); err != nil {
rootfsUnpacker := rootfsservice.NewUnpackerFromClient(c.rootfsService)
Member

Why use a new unpacker?

Member Author

@Random-Liu Random-Liu May 26, 2017

Because in follow-up PRs, we'll start using rootfsService more, not only the Unpacker.

So I changed rootfsService to be a field of criContainerdService, and create a new unpacker when we need it. The unpacker is only needed in this function.

Member

Ack.

@yujuhong
Member

The rest is just minor nits. Feel free to apply the lgtm label and merge the PR after you update it.

@Random-Liu
Member Author

Random-Liu commented May 26, 2017

@yujuhong @mikebrow Thanks for reviewing!

@Random-Liu Random-Liu force-pushed the wait-image-pulling branch from d076706 to c3ac5f7 May 27, 2017 00:12
@Random-Liu
Member Author

Apply LGTM based on #46 (review) and #46 (review).

@Random-Liu Random-Liu merged commit a49f66e into containerd:master May 27, 2017
@Random-Liu Random-Liu deleted the wait-image-pulling branch May 27, 2017 00:24
lanchongyizu pushed a commit to lanchongyizu/cri-containerd that referenced this pull request Sep 3, 2017
adelina-t pushed a commit to adelina-t/cri that referenced this pull request Sep 26, 2019
Don't panic when processing nil stats