-
Notifications
You must be signed in to change notification settings - Fork 86
Identity Plugin RFC #245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Identity Plugin RFC #245
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,198 @@ | ||
| - Created: | ||
| - Current Version: | ||
| - Status: WIP | ||
| - Contributors: Harshit Gupta ([email protected], [email protected]), Michael Brown ([email protected]) | ||
| - Approvers: | ||
|
|
||
| # Summary | ||
|
|
||
| The identity plugin, running as a daemonset, fetches identity artifacts for each annotated container and mounts the artifact volume into the container. | ||
|
|
||
| # Motivation | ||
|
|
||
| The current way how we mount identity artifacts into a container is to use a initContainer / sidecar. There is a sidecar for each pod. This sidecar fetches the artifacts from identity providers, injects the volume containing identity artifacts into the pod and mounts the volume into the container. Using a sidecar in each pod for fetching identity artifacts is too resource intensive. Using a initContainer sidecar comes with an additional overhead of increased container start latency. The motivation for using a NRI Identity Plugin is that only one process is enough to fetch the identity artifacts for all of the annotated containers in all the pods on a single host. Therefore the identity PLUGIN as a daemonset can fetch the identity artifacts for all the containers in all the pods on all thee hosts in the cluster that have the appropriate annotations. Running as a single process per host instead of single process per pod we save on host resources. | ||
|
|
||
| # Proposed Implementation | ||
|
|
||
| ## Setup Steps | ||
|
|
||
| 1. Create a static spiffe-id for the plugin. | ||
|
|
||
| ``` | ||
| spire-server entry create \ | ||
| -spiffeID spiffe://example.org/nri/identity-injector \ | ||
| -selector k8s:ns:backend \ | ||
| -selector k8s:sa:db-writer | ||
| ``` | ||
|
|
||
| PENDING - what selector's will the plugin have? | ||
|
|
||
|
|
||
| 2. https://spiffe.io/docs/latest/deploying/spire_agent/#delegated-identity-api Configure the Plugin's Spiffe ID in the Agent configuration. | ||
|
|
||
| ``` | ||
| agent { | ||
| trust_domain = "example.org" | ||
| ... | ||
| admin_socket_path = "/tmp/spire-agent/private/admin.sock" | ||
| authorized_delegates = [ | ||
| "spiffe://example.org/nri/identity-injector" | ||
| ] | ||
| } | ||
|
|
||
| ``` | ||
|
|
||
|
|
||
| ## How the plugin will work | ||
|
|
||
| 1. Plugin registers itself to the agent and get its spiffe id. | ||
|
|
||
| 2. Containerd calls the NRI plugin on CreateContainer event. The plugin receives the pod and container metadata. Using this metadata, the plugin executes step 3 below. The plugin has to ensure that the pid is a stable identifier, that the pid is not a recycled identifier. This is a platform dependent step. (e.g. by using pidfds on Linux). https://spiffe.io/docs/latest/deploying/spire_agent/#delegated-identity-api | ||
|
|
||
| 3. The plugin reads the podspec and parses the annotations. Using spire-api-sdk, the plugin fetches identity artifacts and mounts the volume containing the identity artifacts into the container. The spire-api-sdk streams artifact updates, the plugin will on receiving updates will overwrite the artifact on the host volume. Since the host volume is mounted into the container, the artifacts in the container are automatically updated. The plugin does not need a separate mount annotation in the podspec yaml. The daemonset specifies the volume that the plugin will use to fetch and download the identity artifacts. The plugin will then mount the host volume to the container volume using the cert_dir from the podspec. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this assumes the annotations can not be mutated after the pod has been created, right?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It will be also nice to use the native pod certificates feature https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/4317-pod-certificates instead of depending on annotations
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Thanks for the review comments. Correct, I am operating under the assumption that annotations will not be mutated after a Pod is created. Do you have a specific use case in mind where those annotations might be modified? Looking into the KEP.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks!! Makes sense to use PodCertificate Projected Volume Sources.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
then I suggest you to make it explicit in the doc, people tend to do a lot of things and someone may get the idea of that being supported
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nod annotations should be a test platform only.. and would nec. be expected to be replaced with actual fields.. if feature based NRI plugins are to be formalized. We could also check for pod updated annotations as we are adding pod updates to NRI.. |
||
|
|
||
|
|
||
|
|
||
| ## DaemonSet yaml | ||
|
|
||
| ``` | ||
| apiVersion: apps/v1 | ||
| kind: DaemonSet | ||
| metadata: | ||
| name: nri-plugin-identity-injector | ||
| spec: | ||
| template: | ||
| spec: | ||
| priorityClassName: system-node-critical | ||
| containers: | ||
| - name: plugin | ||
| image: plugin:latest | ||
| args: | ||
| - "-idx" | ||
| - "10" | ||
| resources: | ||
| requests: | ||
| cpu: "2m" | ||
| memory: "5Mi" | ||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| capabilities: | ||
| drop: | ||
| - ALL | ||
| volumeMounts: | ||
| - name: nri-socket | ||
| mountPath: /var/run/nri/nri.sock | ||
| - name: identity-artifacts # Artifact volume | ||
| mountPath: /var/identity/ | ||
| volumes: | ||
| - name: nri-socket | ||
| hostPath: | ||
| path: /var/run/nri/nri.sock | ||
| type: Socket | ||
| - name: identity-artifacts # Artifact volume | ||
| hostPath: | ||
| path: /var/identity/ | ||
| type: Directory | ||
|
|
||
| ``` | ||
|
|
||
|
|
||
|
|
||
| ## Podspec example | ||
|
|
||
| ``` | ||
| apiVersion: v1 | ||
| kind: Pod | ||
| metadata: | ||
| name: bbid0 | ||
| labels: | ||
| app: bbid0 | ||
| annotations: | ||
| identity.noderesource.dev/container.c0: |+ | ||
| - spire_agent_address: /tmp/agent.sock | ||
| cert_dir: /var/certs | ||
| svid_file_name: svid.pem | ||
| svid_key_file_name: svid_key.pem | ||
| svid_bundle_file_name: svid_bundle.pem | ||
| identity.noderesource.dev/container.c1: |+ | ||
| - spire_agent_address: /tmp/agent.sock | ||
| cert_dir: /var/certs | ||
| svid_key_file_name: svid_key.pem | ||
| svid_bundle_file_name: svid_bundle.pem | ||
| spec: | ||
| containers: | ||
| - name: c0 | ||
| image: busybox | ||
| imagePullPolicy: IfNotPresent | ||
| command: | ||
| - sh | ||
| - -c | ||
| - | | ||
| if [ -f /var/certs/svid.pem ]; then | ||
| echo "svid exists!" | ||
| else | ||
| echo "svid does NOT exist." | ||
| fi | ||
| sleep inf | ||
| resources: | ||
| requests: | ||
| cpu: 500m | ||
| memory: '100M' | ||
| limits: | ||
| cpu: 500m | ||
| memory: '100M' | ||
| - name: c1 | ||
| image: busybox | ||
| imagePullPolicy: IfNotPresent | ||
| command: | ||
| - sh | ||
| - | | ||
| if [ -f /var/certs/svid.pem ]; then | ||
| echo "svid exists!" | ||
| else | ||
| echo "svid does NOT exist." | ||
| fi | ||
| sleep inf | ||
| resources: | ||
| requests: | ||
| cpu: 1 | ||
| memory: '100M' | ||
| limits: | ||
| cpu: 1 | ||
| memory: '100M' | ||
| - name: c2 | ||
| image: busybox | ||
| imagePullPolicy: IfNotPresent | ||
| command: | ||
| - sh | ||
| - -c | ||
| - | | ||
| if [ -f /var/certs/svid.pem ]; then | ||
| echo "svid exists!" | ||
| else | ||
| echo "svid does NOT exist." | ||
| fi | ||
| sleep inf | ||
| resources: | ||
| requests: | ||
| cpu: 1 | ||
| memory: '100M' | ||
| limits: | ||
| cpu: 1 | ||
| memory: '100M' | ||
| terminationGracePeriodSeconds: 1 | ||
|
|
||
| ``` | ||
|
|
||
| ## Handling Openshift / cri-o | ||
|
|
||
| Openshift / cri-o specific challenges that the plugin resolves. | ||
|
|
||
| ## Handling VMs | ||
|
|
||
|
|
||
| # Alternatives | ||
|
|
||
| An alternative to identity plugin is to use sidecar containers or initContainers in every pod for fetching the artifacts and mounting the artifact volume. | ||
|
|
||
| # Open Questions | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we have a phase error problem here if we already need to know the container's PID at this point ? You can't really obtain a container's PID before the container has been started. But that is only going to happen later, triggered by a
StartContainerCRI call, for which you'll then get corresponding{Start,PostStart}ContainerNRI calls.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we need to get the task pid which is created by the task create in start container before we call nri start container call.. also before the task.Start() is run.. and also before the nri poststart.. should be doable, no? https://github.com/containerd/containerd/blob/main/internal/cri/server/container_start.go#L217-L251
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in
StartContainerwe do pass that pid already to plugins.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I need to switch from CreateContainer to StartContainer. CreateContainer (or PostCreateContainer for that matter) has no task, therefore no task pid.
Almost. We have the pid in StartContainer but we don't update the container with the pid at the time of calling nri. Line 251 https://github.com/containerd/containerd/blob/main/internal/cri/server/container_start.go#L251 is where we already call nri but only later on line 263 https://github.com/containerd/containerd/blob/main/internal/cri/server/container_start.go#L263 is where we update the container status with the pid. I will look into how to make this change so that we have the pid when calling nri.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine for auxiliary container-related actions which do not need to modify the container('s metadata/parameters) itself. But if there is a need, any container modifications you'll have to do during container creation. Once a container is created it is immutable apart from (compute) resources (controlled via cgroupfs). And I think also lately runtime-level annotations, but even if that's possible now via CRI, it's currently not via NRI after container creation.
Hmm, but this is not what I see when I test this in practice, just using the nri logger plugin and a test pod.
So to me it looks like we do get the container's process's PID in StartContainer.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Krisztian!! Also finally found the code where we set the pid https://github.com/containerd/containerd/blob/main/internal/cri/nri/nri_api_linux.go#L700.
Do you mean we cannot mount any new volumes in container start (after container create)? Alternative could be we mount the volume in container create but only write the artifacts in container start when we have the pid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh there is also a pid in CreateContainer. Looking into this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the pid in the create container call is the pod's pid, which could probably be useful for namespaces shared over the pod to all containers..
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you need to pre-mount a directory during container creation where you can then later write artifacts or attach new mounts as subdirectories.
Yes, exactly like that.