runc-dmz: Inheritable capabilities are dropped when they previously weren't #4125

@dgl

Description

runc-dmz changes capability behaviour for non-root users. Previously, if the first binary executed in the container (the first execve) had file capabilities, the process would gain those capabilities. It turns out this worked the way many people wanted, even if it was never intended: the service running in the container would get the ability to bind to low ports.

This happens when ambient capabilities aren't used. Note that Kubernetes does not currently set ambient capabilities (there is a KEP for this: kubernetes/enhancements#2763), but either way this is a change in observable runc behaviour.
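One plausible explanation can be worked through with the execve capability rules from capabilities(7). The sketch below is a simplified model, not runc's or the kernel's code: it assumes a non-root uid, no setuid bits, that the runc-dmz binary itself carries no file capabilities, and that the full config.json keeps `"noNewPrivileges": true` as generated by `runc spec` (the excerpt in the reproduction below doesn't show that field). Under no_new_privs, execve may not add anything to the permitted set, so file capabilities only take effect if the process already held them. In the legacy flow runc still holds CAP_NET_BIND_SERVICE in its permitted set when it execs the workload; with runc-dmz, the intermediate exec into runc-dmz empties the permitted set first. The names `Creds`, `FileCaps`, `execve`, `run_legacy` and `run_dmz` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

Caps = FrozenSet[str]
NONE: Caps = frozenset()


@dataclass(frozen=True)
class Creds:
    """Capability sets of a (non-root) process."""
    permitted: Caps
    effective: Caps
    inheritable: Caps
    ambient: Caps
    bounding: Caps


@dataclass(frozen=True)
class FileCaps:
    """File capabilities, e.g. as written by `setcap cap_x=+ep`."""
    permitted: Caps = NONE
    inheritable: Caps = NONE
    effective: bool = False


def execve(old: Creds, fcaps: Optional[FileCaps], no_new_privs: bool) -> Creds:
    """Simplified capabilities(7) execve transformation.

    Assumes a non-root uid, no setuid/setgid bits, and ignores the
    user-namespace root special cases. fcaps=None means the file has
    no file capabilities at all.
    """
    f = fcaps if fcaps is not None else FileCaps()
    # A file with file capabilities is "privileged": it clears the ambient set.
    ambient = NONE if fcaps is not None else old.ambient
    permitted = (old.inheritable & f.inheritable) | (f.permitted & old.bounding)
    # no_new_privs: execve may not grow the permitted set, so anything that
    # was not already permitted before the exec is masked out.
    if no_new_privs and not permitted <= old.permitted:
        permitted &= old.permitted
    permitted |= ambient
    effective = permitted if f.effective else ambient
    return Creds(permitted, effective, old.inheritable, ambient, old.bounding)


NBS = "CAP_NET_BIND_SERVICE"
SPEC_CAPS = frozenset({"CAP_AUDIT_WRITE", "CAP_KILL", NBS})

# Container process after runc applied the config from this issue
# (no inheritable, no ambient capabilities).
runc_init = Creds(SPEC_CAPS, SPEC_CAPS, NONE, NONE, frozenset({NBS}))
# Same, but with CAP_NET_BIND_SERVICE also in the inheritable and ambient sets.
runc_init_ambient = Creds(SPEC_CAPS, SPEC_CAPS, frozenset({NBS}),
                          frozenset({NBS}), frozenset({NBS}))

# nc.openbsd after `setcap cap_net_bind_service=+ep`.
nc = FileCaps(permitted=frozenset({NBS}), effective=True)


def run_legacy(init: Creds) -> Creds:
    # runc execs the workload directly.
    return execve(init, nc, no_new_privs=True)


def run_dmz(init: Creds) -> Creds:
    # runc execs runc-dmz (assumed: no file caps), which then execs the workload.
    dmz = execve(init, None, no_new_privs=True)
    return execve(dmz, nc, no_new_privs=True)


for name, init in [("no ambient", runc_init), ("ambient", runc_init_ambient)]:
    print(f"{name}: legacy={NBS in run_legacy(init).effective} "
          f"dmz={NBS in run_dmz(init).effective}")
# prints:
# no ambient: legacy=True dmz=False
# ambient: legacy=True dmz=True
```

In this model an ambient CAP_NET_BIND_SERVICE survives the intermediate exec into runc-dmz, which lines up with the observation that the regression only shows up when ambient capabilities aren't used.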

Steps to reproduce the issue

I spotted this on a Kubernetes cluster using runc from main, because CoreDNS wasn't starting successfully (CoreDNS >= v1.11 runs as non-root; it is used by Kubernetes 1.29 or later, depending on exactly how the cluster is created).

One way to reproduce it with kind is:

  1. Update kind's base image to use runc from main (edit images/base/Dockerfile so that ARG RUNC_VERSION="main").
  2. Build it (make quick in images/base).
  3. Build a node image on top of that base image:
     kind build node-image ~/Code/kubernetes --image kindest/node:runc-main --base-image=gcr.io/k8s-staging-kind/base:v20231124-6a461ab5-dirty
  4. Create a cluster with it, using a config such as:
     kind: Cluster
     apiVersion: kind.x-k8s.io/v1alpha4
     nodes:
     - role: control-plane
       image: kindest/node:runc-main

However, it can be reduced to running a runc container whose args point to a binary with file capabilities (set with setcap).

$ runc spec
Then edit config.json so that the top looks something like this:

{
        "ociVersion": "1.1.0+dev",
        "process": {
                "terminal": true,
                "user": {
                        "uid": 1000,
                        "gid": 1000
                },
                "args": [
                        "/usr/bin/nc.openbsd", "-l", "80"
                ],
                "env": [
                        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                        "TERM=xterm"
                ],
                "cwd": "/",
                "capabilities": {
                        "bounding": [
                                "CAP_NET_BIND_SERVICE"
                        ],
                        "effective": [
                                "CAP_AUDIT_WRITE",
                                "CAP_KILL",
                                "CAP_NET_BIND_SERVICE"
                        ],
                        "permitted": [
                                "CAP_AUDIT_WRITE",
                                "CAP_KILL",
                                "CAP_NET_BIND_SERVICE"
                        ]
                },

(Basically: run nc.openbsd as a non-root user, listening on a port below 1024, without any ambient capabilities.)
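The config.json edits above can also be scripted. This is a minimal Python sketch, assuming config.json was produced by `runc spec` in the bundle directory; `patch_spec` and `patch_spec_file` are hypothetical helper names, and only the fields shown in the excerpt are changed.

```python
import json
from pathlib import Path

# Capability sets from the config.json excerpt above.
BOUNDING = ["CAP_NET_BIND_SERVICE"]
EFFECTIVE_PERMITTED = ["CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE"]


def patch_spec(spec: dict) -> dict:
    """Edit a spec generated by `runc spec` to match the reproduction."""
    process = spec["process"]
    process["user"] = {"uid": 1000, "gid": 1000}  # run as non-root
    process["args"] = ["/usr/bin/nc.openbsd", "-l", "80"]  # bind a port < 1024
    caps = process.setdefault("capabilities", {})
    caps["bounding"] = list(BOUNDING)
    caps["effective"] = list(EFFECTIVE_PERMITTED)
    caps["permitted"] = list(EFFECTIVE_PERMITTED)
    # Leave out ambient (and inheritable) capabilities so the workload has to
    # rely on the file capabilities of nc.openbsd. Other fields, such as
    # noNewPrivileges, are left exactly as generated.
    caps.pop("ambient", None)
    caps.pop("inheritable", None)
    return spec


def patch_spec_file(path: str = "config.json") -> None:
    """Rewrite config.json in place (run from the bundle directory)."""
    p = Path(path)
    spec = json.loads(p.read_text())
    p.write_text(json.dumps(patch_spec(spec), indent="\t") + "\n")
```

Calling `patch_spec_file("config.json")` right after `runc spec` should leave a config whose process section matches the excerpt; editing the parsed JSON rather than the text keeps the rest of the generated spec intact.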

Make sure that file exists (netcat-openbsd in a Debian/Ubuntu rootfs works) and has file capabilities:

$ sudo setcap cap_net_bind_service=+ep rootfs/usr/bin/nc.openbsd
$ sudo getcap rootfs/usr/bin/nc.openbsd          
rootfs/usr/bin/nc.openbsd cap_net_bind_service=ep
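What setcap writes is the `security.capability` extended attribute, which getcap decodes. As a cross-check without libcap, the sketch below decodes that xattr directly, following the `vfs_cap_data` layout from the kernel's `<linux/capability.h>` (a little-endian magic word with the revision and effective flag, then permitted/inheritable words). It is Linux-only because of `os.getxattr`; `decode_vfs_caps`, `mask_to_names` and `read_file_caps` are hypothetical names, and the name table only covers the capabilities that appear in this issue.

```python
import os
import struct
from typing import List, Tuple

# Constants from the kernel's <linux/capability.h>.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_1 = 0x01000000
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_REVISION_3 = 0x03000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001

# Only the capabilities relevant to this issue; others print as cap_<n>.
CAP_NAMES = {5: "cap_kill", 10: "cap_net_bind_service", 29: "cap_audit_write"}


def decode_vfs_caps(raw: bytes) -> Tuple[int, int, bool]:
    """Decode a security.capability xattr into (permitted, inheritable, effective)."""
    (magic,) = struct.unpack_from("<I", raw, 0)
    revision = magic & VFS_CAP_REVISION_MASK
    if revision == VFS_CAP_REVISION_1:
        words = 1  # 32-bit capability masks
    elif revision in (VFS_CAP_REVISION_2, VFS_CAP_REVISION_3):
        words = 2  # 64-bit masks; revision 3 also appends a root uid (ignored here)
    else:
        raise ValueError(f"unknown capability xattr revision {revision:#x}")
    permitted = inheritable = 0
    for i in range(words):
        p, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= p << (32 * i)
        inheritable |= inh << (32 * i)
    return permitted, inheritable, bool(magic & VFS_CAP_FLAGS_EFFECTIVE)


def mask_to_names(mask: int) -> List[str]:
    """Turn a capability bitmask into capability names."""
    return [CAP_NAMES.get(bit, f"cap_{bit}") for bit in range(64) if (mask >> bit) & 1]


def read_file_caps(path: str) -> Tuple[int, int, bool]:
    # Raises OSError (ENODATA) if the file has no file capabilities.
    return decode_vfs_caps(os.getxattr(path, "security.capability"))
```

For the file above, `mask_to_names(read_file_caps("rootfs/usr/bin/nc.openbsd")[0])` should give `["cap_net_bind_service"]` with the effective flag set, matching the `=ep` that getcap reports.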

Describe the results you received and expected

Expected: the binary runs and is able to listen on a port below 1024. Instead, CoreDNS (or the other binary) gets "permission denied" on bind:

$ sudo ./runc run config.json                    
nc.openbsd: Permission denied
$ sudo env RUNC_DMZ=legacy ./runc run config.json
[works as expected]
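To compare the two modes directly, one can inspect the capability sets the kernel reports for the workload process, i.e. the CapInh/CapPrm/CapEff/CapBnd/CapAmb hex masks in `/proc/<pid>/status`. The sketch below decodes those lines; `parse_cap_status` and `read_proc_caps` are hypothetical helper names, and the name table again only covers the capabilities from this issue. Based on the results above one would expect CapEff to contain cap_net_bind_service under `RUNC_DMZ=legacy` and to lack it in the default mode, but that is an expectation, not something captured in this report.

```python
import re
from typing import Dict, List

# Only the capabilities relevant to this issue; others print as cap_<n>.
CAP_NAMES = {5: "cap_kill", 10: "cap_net_bind_service", 29: "cap_audit_write"}
CAP_LINE = re.compile(r"^(Cap(?:Inh|Prm|Eff|Bnd|Amb)):\s*([0-9a-fA-F]+)$")


def parse_cap_status(status_text: str) -> Dict[str, List[str]]:
    """Map CapInh/CapPrm/CapEff/CapBnd/CapAmb to lists of capability names."""
    sets = {}
    for line in status_text.splitlines():
        m = CAP_LINE.match(line)
        if m:
            mask = int(m.group(2), 16)
            sets[m.group(1)] = [CAP_NAMES.get(bit, f"cap_{bit}")
                                for bit in range(64) if (mask >> bit) & 1]
    return sets


def read_proc_caps(pid: str = "self") -> Dict[str, List[str]]:
    # e.g. read_proc_caps("1234") for the nc.openbsd process on the host
    with open(f"/proc/{pid}/status") as f:
        return parse_cap_status(f.read())
```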

What version of runc are you using?

# runc --version
runc version 1.1.0+dev
commit: v1.1.0-855-g95a93c1
spec: 1.1.0+dev
go: go1.20.4
libseccomp: 2.5.4

Host OS information

$ cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"

Host kernel information

Linux 6.5.0
