[SH] add userfault support #5261
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## feature/secret-hiding #5261 +/- ##
=========================================================
- Coverage 82.52% 81.83% -0.69%
=========================================================
Files 250 250
Lines 27386 27700 +314
=========================================================
+ Hits 22599 22668 +69
- Misses 4787 5032 +245
last round of comments :)
we'll need to keep track that at some point later we implement proper error handling instead of panic!-ing all over the place in Firecracker, but let's just get this into the feature branch to unblock everyone else and deal with that later
This is needed because if guest_memfd is used to back guest memory, vCPU fault notifications are delivered via the UFFD UDS socket. Signed-off-by: Nikita Kalyazin <[email protected]>
It is used by Secret-Free-enabled UFFD handlers to disable vCPU fault notifications from the kernel. Signed-off-by: Nikita Kalyazin <[email protected]>
Accept receiving 3 fds instead of 1, where fds[1] is guest_memfd and fds[2] is userfault bitmap memfd. Also handle the FaultRequest message over the UDS socket by calling a new callback in the Runtime and sending a FaultReply. Co-authored-by: Patrick Roy <[email protected]> Signed-off-by: Patrick Roy <[email protected]> Signed-off-by: Nikita Kalyazin <[email protected]>
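As a minimal sketch of the fd handling described above, a handler could infer its operating mode from the number of fds it received. The `HandlerMode` enum and `classify_fds` function are hypothetical names introduced for illustration; only the fd ordering (`fds[1]` is guest_memfd, `fds[2]` is the userfault bitmap memfd) comes from the commit message.

```rust
use std::os::unix::io::RawFd;

/// Operating mode inferred from the fds received in the handshake.
/// (Hypothetical type, not taken from the Firecracker sources.)
#[derive(Debug, PartialEq, Eq)]
pub enum HandlerMode {
    /// Only the UFFD was sent: regular operation.
    Regular { uffd: RawFd },
    /// Three fds were sent: Secret Freedom is enabled.
    SecretFree {
        uffd: RawFd,
        guest_memfd: RawFd,
        userfault_bitmap_memfd: RawFd,
    },
}

/// Classify the received fds: 1 fd means regular mode, 3 fds mean Secret Free mode.
pub fn classify_fds(fds: &[RawFd]) -> Result<HandlerMode, String> {
    match fds {
        &[uffd] => Ok(HandlerMode::Regular { uffd }),
        &[uffd, guest_memfd, userfault_bitmap_memfd] => Ok(HandlerMode::SecretFree {
            uffd,
            guest_memfd,
            userfault_bitmap_memfd,
        }),
        _ => Err(format!("expected 1 or 3 fds, got {}", fds.len())),
    }
}
```

Rejecting any other fd count up front keeps the rest of the handler from having to guess which protocol it is speaking.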
There are two ways a UFFD handler receives a fault notification if Secret Freedom is enabled (which is inferred from Firecracker sending 3 fds instead of 1):
- a VMM- or KVM-triggered fault is delivered via a minor UFFD fault event. The handler is supposed to respond to it by memcpying the content of the page (if the page hasn't already been populated), followed by a UFFDIO_CONTINUE call.
- a vCPU-triggered fault is delivered via a FaultRequest message on the UDS socket. The handler is supposed to reply with a pwrite64 call on the guest_memfd to populate the page, followed by a FaultReply message on the UDS socket.

In both cases, the handler also needs to clear the bit in the userfault bitmap at the corresponding offset in order to stop further fault notifications for the same page.

UFFD handlers use the userfault bitmap for two purposes:
- communicate to the kernel whether a fault at the corresponding guest_memfd offset will cause a VM exit
- keep track of pages that have already been populated in order to avoid overwriting the content of a page that is already initialised.

Signed-off-by: Nikita Kalyazin <[email protected]>
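The double role of the userfault bitmap can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the handler's actual code: `MockHandler` is a hypothetical type, plain `Vec`s stand in for the snapshot file, the guest_memfd and the bitmap memfd, and a 4 KiB page size is assumed. The real handler would use `pwrite64` or `memcpy` plus `UFFDIO_CONTINUE` and a shared bitmap instead.

```rust
/// Assumed page size for this sketch.
pub const PAGE_SIZE: usize = 4096;

/// In-memory stand-in for a Secret Free UFFD handler (hypothetical).
pub struct MockHandler {
    /// Stands in for the memory snapshot the pages are read from.
    pub snapshot: Vec<u8>,
    /// Stands in for guest_memfd.
    pub guest_memory: Vec<u8>,
    /// Stands in for the userfault bitmap memfd: one bit per page,
    /// a set bit means "not populated yet, faults must be reported".
    pub bitmap: Vec<u64>,
}

impl MockHandler {
    pub fn new(snapshot: Vec<u8>) -> Self {
        assert_eq!(snapshot.len() % PAGE_SIZE, 0);
        let pages = snapshot.len() / PAGE_SIZE;
        let mut bitmap = vec![0u64; (pages + 63) / 64];
        // In the PR, Firecracker sets all bits before sending the memfd;
        // the mock does it here to stay self-contained.
        for page in 0..pages {
            bitmap[page / 64] |= 1u64 << (page % 64);
        }
        Self { guest_memory: vec![0; snapshot.len()], snapshot, bitmap }
    }

    /// Resolve a fault at `offset`. Returns true if this call populated the
    /// page, false if the page had already been populated and was left alone.
    pub fn resolve_fault(&mut self, offset: usize) -> bool {
        let page = offset / PAGE_SIZE;
        let mask = 1u64 << (page % 64);
        let needs_population = (self.bitmap[page / 64] & mask) != 0;
        if needs_population {
            let start = page * PAGE_SIZE;
            let end = start + PAGE_SIZE;
            // Populate the page from the snapshot...
            self.guest_memory[start..end].copy_from_slice(&self.snapshot[start..end]);
            // ...then clear the bit to stop further notifications for it.
            self.bitmap[page / 64] &= !mask;
        }
        needs_population
    }
}
```

Checking the bit before copying is what prevents a late or duplicate notification from overwriting data the guest has already modified.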
These are used for communication of page faults between Firecracker and a UFFD handler. Signed-off-by: Nikita Kalyazin <[email protected]>
If configured, userfault bitmap is registered with KVM and controls whether KVM will exit to userspace on a fault of the corresponding page. We are going to allocate the bitmap in a memfd in Firecracker, set bits for all pages to request notifications for vCPU faults and send it to the UFFD handler to delegate clearing the bits as pages get populated. Since the KVM userfault patches are still in review, set_user_memory_region2 is not aware of the userfault flag and the userfault bitmap address in its input structure. Define it in Firecracker code temporarily. Signed-off-by: Nikita Kalyazin <[email protected]>
This is needed to instruct the kernel to exit to userspace when a vCPU fault occurs and the corresponding bit in the userfault bitmap is set. The userfault bitmap is allocated in a memfd by Firecracker and sent to the UFFD handler.

This also sends 3 fds to the UFFD handler in the handshake:
- UFFD (original)
- guest_memfd: for the handler to be able to populate guest memory
- userfault bitmap memfd: for the handler to be able to disable exits to userspace for the pages that have already been populated

Signed-off-by: Nikita Kalyazin <[email protected]>
This will be removed after upgrading to a new version of mmap support kernel patches. Signed-off-by: Nikita Kalyazin <[email protected]>
This is because vCPUs reason in GPAs while the secret-free UFFD protocol is guest_memfd-offset-based. Note that offset_to_gpa is not used yet, but will likely be needed to support async PF to pass the GPA to a new ioctl when notifying KVM of a fault resolution. Signed-off-by: Nikita Kalyazin <[email protected]>
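A translation between GPAs and guest_memfd offsets might look like the sketch below. The `Region` struct and its fields are assumptions made for illustration; only the function names `gpa_to_offset` and `offset_to_gpa` echo the commit message, and their real signatures may differ.

```rust
/// One guest memory region (hypothetical layout for this sketch).
#[derive(Debug, Clone, Copy)]
pub struct Region {
    /// Guest physical address where the region starts.
    pub gpa_start: u64,
    /// Size of the region in bytes.
    pub size: u64,
    /// Offset of the region's first byte inside guest_memfd.
    pub memfd_offset: u64,
}

/// Translate a guest physical address into a guest_memfd offset.
/// Returns None if the GPA is not backed by any region.
pub fn gpa_to_offset(regions: &[Region], gpa: u64) -> Option<u64> {
    regions.iter().find_map(|r| {
        let delta = gpa.checked_sub(r.gpa_start)?;
        (delta < r.size).then(|| r.memfd_offset + delta)
    })
}

/// Translate a guest_memfd offset back into a guest physical address.
/// Returns None if the offset is outside every region.
pub fn offset_to_gpa(regions: &[Region], offset: u64) -> Option<u64> {
    regions.iter().find_map(|r| {
        let delta = offset.checked_sub(r.memfd_offset)?;
        (delta < r.size).then(|| r.gpa_start + delta)
    })
}
```

The two address spaces diverge whenever the guest physical layout has holes (for example around an MMIO gap) while the guest_memfd is laid out contiguously, which is why a plain identity mapping is not enough.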
It contains two parts:
- external: between the VMM thread and the UFFD handler
- internal: between vCPUs and the VMM thread

An outline of the workflow:
- When a vCPU fault occurs, the vCPU exits to userspace
- The vCPU thread sends the exit syndrome over the vCPU-to-VMM channel and writes to the eventfd
- The VMM thread forwards the syndrome to the UFFD handler via the UDS socket
- The UFFD handler populates the page, clears the corresponding bit in the userfault bitmap and sends a reply to Firecracker
- The VMM thread receives the reply and updates a vCPU condvar to notify the vCPU that the fault has been resolved
- The vCPU resumes execution

Note that as a result of this change, the ability to exit the VM gracefully is lost (at least on x86). In the existing implementation, the VMM thread initiated an exit if an event was read from the eventfd but, for an unknown reason, no VcpuResponse::Exited responses were read. Since the exit_evt eventfd is now also used by vCPUs to notify the VMM thread of VM exits caused by page faults, this situation (an eventfd event, but no response in the channel) can also occur because we have already read all VcpuResponse::Userfault responses while handling the previous eventfd event.

Signed-off-by: Nikita Kalyazin <[email protected]>
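The internal half of the workflow (vCPU to VMM thread and back) can be sketched with standard library primitives only. This is an illustrative model, not Firecracker's code: the shape of `VcpuResponse::Userfault`, the `fault_roundtrip` function and the boolean behind the condvar are assumptions. The eventfd wakeup is replaced here by a blocking channel receive, and the exchange with the UFFD handler is elided.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Message sent from a vCPU thread to the VMM thread (hypothetical shape).
pub enum VcpuResponse {
    Userfault { offset: u64 },
}

/// Run one fault round trip and return the offset the VMM thread observed.
pub fn fault_roundtrip(offset: u64) -> u64 {
    let (to_vmm, from_vcpu) = mpsc::channel::<VcpuResponse>();
    // "Fault resolved" flag guarded by a mutex, with a condvar to wake the vCPU.
    let resolved = Arc::new((Mutex::new(false), Condvar::new()));

    let vcpu_resolved = Arc::clone(&resolved);
    let vcpu = thread::spawn(move || {
        // The vCPU exited to userspace: report the fault to the VMM thread.
        to_vmm.send(VcpuResponse::Userfault { offset }).unwrap();
        // Block until the VMM thread signals that the fault is resolved.
        let (lock, cvar) = &*vcpu_resolved;
        let mut done = lock.lock().unwrap();
        while !*done {
            done = cvar.wait(done).unwrap();
        }
        // The vCPU would resume guest execution here.
    });

    // VMM thread: receive the fault (stands in for the eventfd wakeup).
    let VcpuResponse::Userfault { offset: faulted } = from_vcpu.recv().unwrap();
    // Forwarding to the UFFD handler and waiting for its reply is elided.
    let (lock, cvar) = &*resolved;
    *lock.lock().unwrap() = true;
    cvar.notify_one();

    vcpu.join().unwrap();
    faulted
}
```

Waiting in a `while !*done` loop rather than a single `wait` call guards against spurious wakeups and against the notification arriving before the vCPU starts waiting.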
kvmclock is currently not supported by Secret Freedom and calling kvmclock_ctrl will always fail. Signed-off-by: Nikita Kalyazin <[email protected]>
In a regular VM, we mmap the memory snapshot file and supply the address in the KVM memory slot. In Secret Free VMs, we provide guest_memfd in the memory slot instead. There is no way we can restore a Secret Free VM from a file, unless we prepopulate the guest_memfd with the file content, which is inefficient and is not practically useful. Signed-off-by: Nikita Kalyazin <[email protected]>
It is not supported by Secret Freedom. Signed-off-by: Nikita Kalyazin <[email protected]>
This includes both functional and performance tests. Signed-off-by: Nikita Kalyazin <[email protected]>
Do not add a balloon device to a Secret Free VM as it is not currently supported. Signed-off-by: Nikita Kalyazin <[email protected]>
When taking a snapshot from a Secret Free VM, we create a bounce buffer to be able to pass it to the host kernel to store in a file. Exclude it from the memory monitor calculation. Signed-off-by: Nikita Kalyazin <[email protected]>
This is because the error type has changed due to the implementation of snapshot restore support for Secret Free VMs. Signed-off-by: Nikita Kalyazin <[email protected]>
Graceful shutdown is currently broken on x86_64. Signed-off-by: Nikita Kalyazin <[email protected]>
a487771
into
firecracker-microvm:feature/secret-hiding
Changes
Implement userfault support in Secret Freedom. The goal of this change is to be able to resume Secret-Free VMs via UFFD.
Major changes:
- the UFFD handler `write`s to the guest_memfd to populate guest pages and clears bits in the userfault bitmap (memfd) to stop KVM from sending vCPU fault notifications

Reason
This is needed to be able to restore snapshots where the VM was backed by guest_memfd.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check `CONTRIBUTING.md`.

PR Checklist
`tools/devtool checkstyle` to verify that the PR passes the automated style checks.
how they are solving the problem in a clear and encompassing way.
[ ] I have updated any relevant documentation (both in code and in the docs) in the PR.
[ ] I have mentioned all user-facing changes in `CHANGELOG.md`.
[ ] If a specific issue led to this PR, this PR closes the issue.
[ ] When making API changes, I have followed the Runbook for Firecracker API changes.
[ ] I have tested all new and changed functionalities in unit tests and/or integration tests.
[ ] I have linked an issue to every new `TODO`.
rust-vmm.