This repository was archived by the owner on Mar 9, 2022. It is now read-only.

Checkpoint and restart recovery #120

@Random-Liu

cri-containerd currently has two restart recovery problems:

  1. cri-containerd restart. cri-containerd keeps all internal state in memory, including the sandbox list, container list and image list, so all of that state is lost when it restarts.
  2. containerd restart. When containerd restarts and cri-containerd reconnects, the state held by the two may no longer match, e.g. a container may have died while containerd was down.

To fix this, we should recover and reconcile state when cri-containerd starts, and again after containerd restarts and reconnects.
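As a rough illustration of the reconcile step for the containerd restart case, here is a minimal sketch. The `State` type and the `Reconcile` function are hypothetical and are not part of cri-containerd or containerd; the runtime view is passed in as a plain map rather than queried from a real client. It assumes that a container we believed to be running, but which the runtime no longer reports, should be treated as exited.

```go
package main

import "fmt"

// State is a hypothetical, simplified container state used only for this sketch.
type State string

const (
	Running State = "running"
	Exited  State = "exited"
)

// Reconcile compares the cached view (what cri-containerd remembers) with the
// runtime view (what containerd reports after reconnecting) and returns a
// corrected copy of the cache. Containers known only to the runtime are
// ignored here; handling them is a separate policy decision.
func Reconcile(cached, runtime map[string]State) map[string]State {
	out := make(map[string]State, len(cached))
	for id, st := range cached {
		rt, ok := runtime[id]
		switch {
		case ok:
			// The runtime is the source of truth for anything it still reports.
			out[id] = rt
		case st == Running:
			// Assumption: a container that was running but is no longer
			// reported died while we were disconnected.
			out[id] = Exited
		default:
			// Already in a terminal state; keep what we had.
			out[id] = st
		}
	}
	return out
}

func main() {
	cached := map[string]State{"a": Running, "b": Running, "c": Exited}
	runtime := map[string]State{"a": Running} // "b" died while containerd was down
	for id, st := range Reconcile(cached, runtime) {
		fmt.Printf("%s: %s\n", id, st)
	}
}
```

The same function could run both at cri-containerd startup (with the cache rebuilt from checkpoints) and after a reconnect, which keeps the two recovery paths consistent.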

There are three kinds of internal state:

  1. Image list. Containerd has all the information we need, so we only need to list images from containerd to recover the image list.
  2. Sandbox/container metadata. Most of the metadata is not provided by containerd, so we need to checkpoint it for restart recovery. However, because the metadata does not change after creation, we could store it in a containerd container label and let the containerd metadata store persist it for us.
  3. Container status. Container status is not persisted by containerd, so we need to persist it ourselves. Because it changes constantly, we probably should not abuse containerd container labels to store it, so we need to maintain our own checkpoint for it.

/cc @kubernetes-incubator/maintainers-cri-containerd
