libcontainer: map pre-exec PID 1 signals to 128+signal#5197
libcontainer: map pre-exec PID 1 signals to 128+signal#5197lifubang wants to merge 1 commit intoopencontainers:mainfrom
Conversation
7af6e21 to
948957a
Compare
69adecd to
6eb7f1c
Compare
There is a narrow pre-exec window spanning the exec.fifo handshake and the final execve in which the Go-based runc init helper is still the container's PID 1. If SIGTERM, SIGINT, or SIGHUP arrives in that window, Linux does not apply the default terminating action because PID 1 is special. The Go runtime signal path assumes the kernel will finish that work for terminating signals and calls dieFromSignal on that basis; see: https://github.com/golang/go/blob/c60392da/src/runtime/signal_unix.go#L993 For runc's PID 1 helper, that mismatch leaks Go's internal exit status 2 instead of the usual shell-style 128+signal. Install a narrow pre-exec signal handler for those signals while the helper is PID 1, and translate them to 128+signal until execve replaces the helper with the container payload. Add libcontainer integration coverage for the regression. The test uses a StartContainer hook to hold the process in the post-fifo, pre-exec window, signals init through the libcontainer API, and verifies the resulting exit status for SIGTERM, SIGINT, and SIGHUP. Signed-off-by: Davanum Srinivas <davanum@gmail.com> Signed-off-by: lifubang <lifubang@acmcoder.com>
6eb7f1c to
d7e5978
Compare
There was a problem hiding this comment.
Pull request overview
This PR addresses a PID 1 pre-exec edge case where termination signals (SIGTERM/SIGINT/SIGHUP) can yield Go’s internal exit status instead of the conventional 128+signal, by installing a narrow signal handler in the init helper and adding an integration test that reproduces the race window.
Changes:
- Add a pre-exec PID 1 signal handler in
libcontainerinit to exit with128+signofor SIGTERM/SIGINT/SIGHUP. - Add an integration test that blocks in a
StartContainerhook, signals init via the libcontainer API, and asserts the mapped exit codes.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| libcontainer/init_linux.go | Installs a pre-exec signal handler (PID 1 only) to translate termination signals into conventional 128+signal exit codes. |
| libcontainer/integration/preexec_signal_test.go | Adds integration coverage that holds init in the post-fifo, pre-exec window and validates exit status mapping for SIGTERM/SIGINT/SIGHUP. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fd, err := unix.Open(path, unix.O_WRONLY|unix.O_NONBLOCK|unix.O_CLOEXEC, 0) | ||
| if errors.Is(err, unix.ENXIO) || errors.Is(err, os.ErrNotExist) { | ||
| return | ||
| } | ||
| ok(t, err) |
There was a problem hiding this comment.
releaseHook treats ENXIO from opening the FIFO as a no-op return. ENXIO can also occur if the hook hasn't opened the FIFO for reading yet (a race even after hookReady is created), which can leave the StartContainer hook blocked and make this test flaky/hang. Consider retrying on ENXIO until a short deadline (or change the hook command so the FIFO is opened before writing the ready file).
| fd, err := unix.Open(path, unix.O_WRONLY|unix.O_NONBLOCK|unix.O_CLOEXEC, 0) | |
| if errors.Is(err, unix.ENXIO) || errors.Is(err, os.ErrNotExist) { | |
| return | |
| } | |
| ok(t, err) | |
| openDeadline := time.Now().Add(500 * time.Millisecond) | |
| var ( | |
| fd int | |
| err error | |
| ) | |
| for { | |
| fd, err = unix.Open(path, unix.O_WRONLY|unix.O_NONBLOCK|unix.O_CLOEXEC, 0) | |
| if errors.Is(err, os.ErrNotExist) { | |
| // FIFO does not exist (any more) — nothing to release. | |
| return | |
| } | |
| if errors.Is(err, unix.ENXIO) { | |
| // Reader (hook) has not opened the FIFO yet; retry for a short time. | |
| if time.Now().After(openDeadline) { | |
| t.Fatalf("timed out waiting to open %s for writing: %v", path, err) | |
| } | |
| time.Sleep(10 * time.Millisecond) | |
| continue | |
| } | |
| ok(t, err) | |
| break | |
| } |
| case <-time.After(500 * time.Millisecond): | ||
| releaseHook(t, hookRelease) | ||
| t.Fatal("process did not exit while startContainer hook was still blocking") | ||
| return 0 | ||
| } |
There was a problem hiding this comment.
The 500ms timeout here (and the 500ms deadline used inside releaseHook) seems quite tight for an integration test and may cause flakes under CI load. Consider increasing these timeouts (e.g., a few seconds, consistent with other integration tests) while still asserting that the init exits before the hook is released.
|
@kolyshkin is this ok for us to land? |
we're still discussing whether to fix this in golang; if yes this might not be necessary long-term |
@kolyshkin is there a public issue i can track in golang? thanks! |
|
It looks like this will land in the next Go version, but won't be backported (as it's not severe enough, and the fix can be seen more like an improvement than a bug fix). Since
Speaking of k8s tests, please note that another fix might be needed once the fixed golang is used. |
|
This is now fixed in the future version of Go (1.27), but won't be backported (I can try opening backport requests but I'm pretty sure it won't pass the backport criteria). So, this will be fixed in runc in about a year or so. I think we can close this one. Let me know if you disagree. |
|
Yes, I think it can be closed now. |
Test #5189
There is a narrow pre-exec window spanning the exec.fifo handshake and the final execve in which the Go-based runc init helper is still the container's PID 1.
If SIGTERM, SIGINT, or SIGHUP arrives in that window, Linux does not apply the default terminating action because PID 1 is special. The Go runtime signal path assumes the kernel will finish that work for terminating signals and calls dieFromSignal on that basis; see: https://github.com/golang/go/blob/c60392da/src/runtime/signal_unix.go#L993
For runc's PID 1 helper, that mismatch leaks Go's internal exit status 2 instead of the usual shell-style 128+signal.
Install a narrow pre-exec signal handler for those signals while the helper is PID 1, and translate them to 128+signal until execve replaces the helper with the container payload.
Add libcontainer integration coverage for the regression. The test uses a StartContainer hook to hold the process in the post-fifo, pre-exec window, signals init through the libcontainer API, and verifies the resulting exit status for SIGTERM, SIGINT, and SIGHUP.