Pre-open container root directory#5190
Pre-open container root directory#5190kolyshkin wants to merge 4 commits intoopencontainers:mainfrom
Conversation
|
Thanks, this was actually on my list of libpathrs TODOs, and I'm super glad the patch turned out this minimal! 😄 My only real concern is that if we want to re-enable support for mounting on top of IIRC you also can't just do a basic re-open or open WDYT? We can probably just leave it as a comment for now though... |
|
I don't really know what to do about #5070, to me it seems like a very weird case scenario, but since runc used to support it, maybe we'll need to fix it eventually. I've added a comment. |
libcontainer/rootfs_linux.go
Outdated
There was a problem hiding this comment.
Maybe here we can use unix.Fchdir? And two other places:
diff --git a/libcontainer/rootfs_linux.go b/libcontainer/rootfs_linux.go
index 128fb2fa..ac3edfd1 100644
--- a/libcontainer/rootfs_linux.go
+++ b/libcontainer/rootfs_linux.go
@@ -204,7 +204,7 @@ func prepareRootfs(pipe *syncSocket, iConfig *initConfig) (err error) {
// container. It's just cleaner to do this here (at the expense of the
// operation not being perfectly split).
- if err := unix.Chdir(config.Rootfs); err != nil {
+ if err := unix.Fchdir(int(rootFd.Fd())); err != nil {
return &os.PathError{Op: "chdir", Path: config.Rootfs, Err: err}
}
@@ -217,9 +217,9 @@ func prepareRootfs(pipe *syncSocket, iConfig *initConfig) (err error) {
}
if config.NoPivotRoot {
- err = msMoveRoot(config.Rootfs)
+ err = msMoveRoot(rootFd)
} else if config.Namespaces.Contains(configs.NEWNS) {
- err = pivotRoot(config.Rootfs)
+ err = pivotRoot(rootFd)
} else {
err = chroot()
}
@@ -1125,7 +1125,7 @@ func setupPtmx(config *configs.Config) error {
// pivotRoot will call pivot_root such that rootfs becomes the new root
// filesystem, and everything else is cleaned up.
-func pivotRoot(rootfs string) error {
+func pivotRoot(root *os.File) error {
// While the documentation may claim otherwise, pivot_root(".", ".") is
// actually valid. What this results in is / being the new root but
// /proc/self/cwd being the old root. Since we can play around with the cwd
@@ -1138,15 +1138,9 @@ func pivotRoot(rootfs string) error {
}
defer unix.Close(oldroot)
- newroot, err := linux.Open(rootfs, unix.O_DIRECTORY|unix.O_RDONLY, 0)
- if err != nil {
- return err
- }
- defer unix.Close(newroot)
-
// Change to the new root so that the pivot_root actually acts on it.
- if err := unix.Fchdir(newroot); err != nil {
- return &os.PathError{Op: "fchdir", Path: "fd " + strconv.Itoa(newroot), Err: err}
+ if err := unix.Fchdir(int(root.Fd())); err != nil {
+ return &os.PathError{Op: "fchdir", Path: "fd " + strconv.Itoa(int(root.Fd())), Err: err}
}
if err := unix.PivotRoot(".", "."); err != nil {
@@ -1182,7 +1176,7 @@ func pivotRoot(rootfs string) error {
return nil
}
-func msMoveRoot(rootfs string) error {
+func msMoveRoot(root *os.File) error {
// Before we move the root and chroot we have to mask all "full" sysfs and
// procfs mounts which exist on the host. This is because while the kernel
// has protections against mounting procfs if it has masks, when using
@@ -1205,7 +1199,7 @@ func msMoveRoot(rootfs string) error {
// are non-full mounts or are inside the rootfs of the container.
if info.Root != "/" ||
(info.FSType != "proc" && info.FSType != "sysfs") ||
- strings.HasPrefix(info.Mountpoint, rootfs) {
+ strings.HasPrefix(info.Mountpoint, root.Name()) {
skip = true
}
return skip, stop
@@ -1239,7 +1233,9 @@ func msMoveRoot(rootfs string) error {
}
// Move the rootfs on top of "/" in our mount namespace.
- if err := mount(rootfs, "/", "", unix.MS_MOVE, ""); err != nil {
+ src, closer := utils.ProcThreadSelfFd(procRootIno)
+ defer closer()
+ if err := mount(src, "/", "", unix.MS_MOVE, ""); err != nil {
return err
}
return chroot()There was a problem hiding this comment.
src, closer := utils.ProcThreadSelfFd(procRootIno)
@lifubang I don't understand this part. You supply procRootIno (which is an inode number of a root directory on /proc mount) as a file descriptor (1), which does not make any sense.
There was a problem hiding this comment.
I guess you meant rootFd.Fd() here.
There was a problem hiding this comment.
@lifubang I've added your code as a second commit, co-authored by me. I've fixed the above issue and did some other changes. One big change is I decided to not use rootFd in msMoveRoot as in there it is only used for mount, and I don't think mounting via fd is any better.
There was a problem hiding this comment.
and I don't think mounting via fd is any better.
I still suggest to use a proc fd style path here, in theory, if the rootfs path contains a symlink that gets swapped between the time rootFd was opened and when MS_MOVE runs, we end up moving the wrong mount. Using /proc/thread-self/fd/N here closes that window consistently with the rest of the PR's approach. But maybe this would not happen in practice.
| if err != nil { | ||
| return fmt.Errorf("open rootfs handle: %w", err) | ||
| } | ||
| defer rootFd.Close() |
There was a problem hiding this comment.
What happens if the close fails and the open fd is leaked into the container? (I know we have this close all fds safe-guard, so it won't really leak). Could the container potentially use ".." to go to the host?
Shall we try to check the return code of close and return an error if we fail or something? Assuming the open fd is a way to go to the host fs. If on pivot_root the kernel does something smart because this is the root (I doubt?) and even if leaked is safe, then I wouldn't bother
There was a problem hiding this comment.
Yeah, you could use it to escape to the host -- and pivot_root(2) doesn't chroot file descriptor references unfortunately (or fortunately, if you care about performance).
Hmmm... Maybe it's better to punt this until we have switched to the new mount API? We could make this an OPEN_TREE_CLONE file descriptor instead which cannot be escaped that way (though I am not sure if this will work without OPEN_TREE_NAMESPACE -- pivot_root won't work the trivial way IIRC). 🤔
There was a problem hiding this comment.
Then again, runc also currently opens all sorts of files within the host mount namespace (to support the permission-less bind-mount stuff but also as a standard Go program), so this really isn't that big of a change and we need to rely on stuff being closed anyway. So it's probably fine...? 😞
There was a problem hiding this comment.
Yeah, you could use it to escape to the host -- and
pivot_root(2)doesn't chroot file descriptor references unfortunately
I’ve test it, it seems that pivot_root(2) DOES chroot file descriptor, so I said:
#5190 (comment)
There was a problem hiding this comment.
Ah, maybe it's not that it chroots it (i.e., the awful magic in chroot_fs_refs).
I think it's just that the fd refers to the bind-mount that becomes the filesystem root and so when that happens it just is the root.
In either case, neat.
| return fmt.Errorf("filesystem %q must be mounted on ordinary directory", m.Device) | ||
| } | ||
| dstFile, err := pathrs.MkdirAllInRoot(rootfs, dest, 0o755) | ||
| dstFile, err := pathrs.MkdirAllInRoot(c.root, dest, 0o755) |
There was a problem hiding this comment.
I didn't follow why this patch is needed for the next one, not what is trying to do. It changes the scope of the rootfs var, and it changes so we use c.root instead of the rootfs var?
There was a problem hiding this comment.
This is a slight code refactoring to ease the review process (if you're reviewing commit by commit). It is here to make the last commit smaller in size. Your description of what it does is correct.
I can certainly merge this if you prefer it this way.
cyphar
left a comment
There was a problem hiding this comment.
I thought about it a little more and still LGTM.
Indeed, it does not make sense to prepend c.root once we started using MkdirAllInRoot in commit 63c2908. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
No change in functionality, just a preparation for the next patch. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
11d6bae to
05bdb49
Compare
I've since added the 4th commit based on @lifubang suggestion above (and added O_PATH in the 3rd one). No other changes so you can just review the last commit. |
A lot of filesystem-related stuff happens inside the container root directory, and we have used its name before. It makes sense to pre-open it and use a *os.File handle instead. Function names in internal/pathrs are kept as is for simplicity (and it is an internal package), but they now accept root as *os.File. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This uses preopened rootfs in Chdir and pivotRoot. While at it, add O_PATH when opening oldroot in pivotRoot. Co-authored-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
| // This function also creates missing mountpoints as long as they | ||
| // are not on top of a tmpfs, as CRIU will restore tmpfs content anyway. | ||
| func (c *Container) prepareCriuRestoreMounts(mounts []*configs.Mount) error { | ||
| rootFd, err := os.OpenFile(c.config.Rootfs, unix.O_DIRECTORY|unix.O_CLOEXEC|unix.O_PATH, 0) |
There was a problem hiding this comment.
(added O_PATH to here)
| // Pre-open rootfs. NOTE that if we need to re-enable support for mounting | ||
| // on top of container root (see issue 5070), we will need to reopen rootFd | ||
| // after such mounts. | ||
| rootFd, err := os.OpenFile(config.Rootfs, unix.O_DIRECTORY|unix.O_CLOEXEC|unix.O_PATH, 0) |
There was a problem hiding this comment.
(added O_PATH to here)
There was a problem hiding this comment.
Strictly speaking, O_DIRECTORY is actually fine as-is because directories aren't problematic (libpathrs does O_PATH for consistency more than anything else) but it's fine either way.
A lot of filesystem-related stuff happens inside the container root
directory, and we have used its name before. It makes sense to pre-open
it and use a *os.File handle instead.
Function names in internal/pathrs are kept as is for simplicity (and it
is an internal package), but they now accept root as *os.File.
This is a somewhat naive attempt at it, but it is surprisingly simple.