Fix update container resources #319
Conversation
Signed-off-by: Lantao Liu <[email protected]>
mikebrow left a comment:
See comments
```go
// Client holds the containerd container client.
// containerd.Container is a pointer underlying. New assignment won't affect
// the previous pointer, so simply lock around is enough.
type Client struct {
```
I like it.
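To make the pointer comment concrete, here is a minimal, self-contained sketch of the lock-around pattern, with illustrative stand-in types (not the actual cri-containerd ones): a reader that grabbed the old pointer is unaffected when the field is reassigned under the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// container stands in for containerd.Container, which is a pointer
// underneath: reassigning a field of this type never mutates a pointer
// value that another goroutine already holds.
type container struct{ id string }

// client mirrors the locking pattern from the diff above.
type client struct {
	mu sync.RWMutex
	c  *container
}

// get returns the current container pointer under a read lock.
func (cl *client) get() *container {
	cl.mu.RLock()
	defer cl.mu.RUnlock()
	return cl.c
}

// set swaps in a new container pointer under the write lock.
func (cl *client) set(c *container) {
	cl.mu.Lock()
	defer cl.mu.Unlock()
	cl.c = c
}

func main() {
	cl := &client{c: &container{id: "old"}}
	old := cl.get()
	cl.set(&container{id: "new"})
	fmt.Println(old.id, cl.get().id) // prints "old new": the old pointer is untouched
}
```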
```go
id := cntr.ID
// Do not update the container when there is a removal in progress.
if status.Removing {
	return fmt.Errorf("container %q is in removing state", id)
```
Should this be ignored, or should it return an error?
We could either ignore it or return an error here. I let it return an error because:
- The container is being removed; updating its resources at the same time only adds more race conditions.
- Docker has similar behavior: https://github.com/moby/moby/blob/master/daemon/update.go#L52
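A minimal sketch of that fail-fast choice, using hypothetical names; wrapping a sentinel error (my addition, not in the PR) would additionally let callers distinguish this case:

```go
package main

import (
	"errors"
	"fmt"
)

// errRemoving is a hypothetical sentinel for the fail-fast case above.
var errRemoving = errors.New("removal in progress")

// containerStatus is an illustrative stand-in for the cri-containerd
// status type; only the field used here is shown.
type containerStatus struct {
	Removing bool
}

// checkUpdatable refuses the update instead of silently ignoring it,
// mirroring the diff above and Docker's behavior.
func checkUpdatable(id string, s containerStatus) error {
	if s.Removing {
		return fmt.Errorf("container %q is in removing state: %w", id, errRemoving)
	}
	return nil
}

func main() {
	err := checkUpdatable("c1", containerStatus{Removing: true})
	fmt.Println(err, errors.Is(err, errRemoving)) // ... true
}
```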
```go
// Update container spec. If the container is not started yet, updating
// spec makes sure that the resource limits are correct when start;
// if the container is already started, updating spec is still required,
// the spec will become our source of truth for resource limits.
```
cool!
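To illustrate "spec as source of truth", here is a hedged sketch (not the actual cri-containerd helper) of folding new limits into the OCI runtime spec, using the real opencontainers/runtime-spec types:

```go
package main

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// updateOCIResources folds new limits into the container's OCI spec so
// the spec stays the source of truth for resource limits. Only two
// fields are shown; nil inputs leave the existing values in place.
func updateOCIResources(spec *specs.Spec, cpuQuota *int64, memLimit *int64) {
	if spec.Linux == nil {
		spec.Linux = &specs.Linux{}
	}
	if spec.Linux.Resources == nil {
		spec.Linux.Resources = &specs.LinuxResources{}
	}
	r := spec.Linux.Resources
	if cpuQuota != nil {
		if r.CPU == nil {
			r.CPU = &specs.LinuxCPU{}
		}
		r.CPU.Quota = cpuQuota
	}
	if memLimit != nil {
		if r.Memory == nil {
			r.Memory = &specs.LinuxMemory{}
		}
		r.Memory.Limit = memLimit
	}
}

func main() {
	spec := &specs.Spec{}
	quota := int64(50000)
	limit := int64(256 << 20)
	updateOCIResources(spec, &quota, &limit)
	fmt.Println(*spec.Linux.Resources.CPU.Quota, *spec.Linux.Resources.Memory.Limit)
}
```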
```go
	return fmt.Errorf("failed to marshal spec %+v: %v", newSpec, err)
}
info.Spec = any
// TODO(random-liu): Add helper function in containerd to do the update.
```
good
I'll add this today.
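Until that helper lands, here is a sketch of what the update looks like against containerd's containers store; it assumes the typeurl and ContainerService APIs as they existed at the time and trims the surrounding wiring:

```go
package update

import (
	"context"
	"fmt"

	containerd "github.com/containerd/containerd"
	"github.com/containerd/typeurl"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// updateContainerSpec sketches the marshal-and-store step from the diff:
// serialize the OCI spec into a protobuf Any and update only the "spec"
// fieldpath of the stored container record.
func updateContainerSpec(ctx context.Context, client *containerd.Client, id string, newSpec *specs.Spec) error {
	any, err := typeurl.MarshalAny(newSpec)
	if err != nil {
		return fmt.Errorf("failed to marshal spec %+v: %v", newSpec, err)
	}
	store := client.ContainerService()
	info, err := store.Get(ctx, id)
	if err != nil {
		return fmt.Errorf("failed to get container %q: %v", id, err)
	}
	info.Spec = any
	// This is the kind of one-call helper the TODO proposes moving
	// into containerd itself.
	if _, err := store.Update(ctx, info, "spec"); err != nil {
		return fmt.Errorf("failed to update spec: %v", err)
	}
	return nil
}
```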
```go
}
defer func() {
	if retErr != nil {
		// Reset spec on error.
```
Curious: what is the reason/rule here for resetting to the spec on error? E.g., failed to load the container, so reset to the spec?
The main reason: if the container is running and we updated the spec but failed to update the real resource limits of the running task, the spec and the running task will be inconsistent unless we recover the spec.
This is best effort anyway. If cri-containerd dies before we recover the spec or before we update the task, there will still be inconsistency. We could reconcile it during restart, but I don't think that's really necessary.
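A sketch of that best-effort rollback, with hypothetical names: the defer restores the previous spec only when the overall update is returning an error.

```go
package update

import (
	"context"

	"github.com/containerd/containerd/containers"
	"github.com/sirupsen/logrus"
)

// withSpecRollback wraps an update so that, if it fails, the old spec is
// written back and the stored spec does not diverge from the running
// task. Best effort only: a crash between the two writes still leaves
// them inconsistent, as noted above.
func withSpecRollback(ctx context.Context, store containers.Store, info containers.Container, update func() error) (retErr error) {
	oldSpec := info.Spec
	defer func() {
		if retErr != nil {
			// Reset spec on error.
			info.Spec = oldSpec
			if _, err := store.Update(ctx, info, "spec"); err != nil {
				logrus.WithError(err).Errorf("failed to recover spec for container %q", info.ID)
			}
		}
	}()
	return update()
}
```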
```go
if err != nil {
	if errdefs.IsNotFound(err) {
		// Task exited already.
		return nil
```
good!
```go
		// Task exited already.
		return nil
	}
	return fmt.Errorf("failed to get task: %v", err)
```
Random thought: the term "task" may be very confusing to Kubernetes users. Let's consider using "container" or "container state" instead of "task" in the error messages we generate.
We have many error messages that include "task" now. :p We may want to address them all once we have a better name for it. :)
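Putting the task half together, a hedged sketch of the tail of the update path: look up the live task, treat "not found" as the container having already exited, and otherwise apply the new limits via containerd's WithResources option.

```go
package update

import (
	"context"
	"fmt"

	containerd "github.com/containerd/containerd"
	"github.com/containerd/containerd/errdefs"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// updateTaskResources applies new resource limits to the running task.
// A missing task means the container exited already, so the earlier
// spec update alone is sufficient and we return success.
func updateTaskResources(ctx context.Context, cntr containerd.Container, res *specs.LinuxResources) error {
	task, err := cntr.Task(ctx, nil)
	if err != nil {
		if errdefs.IsNotFound(err) {
			// Task exited already.
			return nil
		}
		return fmt.Errorf("failed to get task: %v", err)
	}
	if err := task.Update(ctx, containerd.WithResources(res)); err != nil {
		return fmt.Errorf("failed to update resources: %v", err)
	}
	return nil
}
```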
@mikebrow Replied to the comments. :)
mikebrow left a comment:
/LGTM. Your call whether to push this through now and do the TODO (bringing in Michael's helper) tomorrow, or just wait for it.
Cheers.
No need to wait on it.
Fixes #316.
@abhi @mikebrow @kelseyhightower