Pick required FIPS changes + Add FIPS RNG performance overhaul #372


Merged

Conversation

kerneltoast

This adds the missing baseline FIPS changes ported over by @jallisonciq, in addition to my new FIPS RNG performance overhaul.

jallisonciq and others added 10 commits June 25, 2025 16:53
        Using the kernel crypto API, the SHA3-256 algorithm is used as the
        conditioning element to replace the LFSR in the Jitter RNG. All other
        parts of the Jitter RNG are unchanged.

        The application and use of the SHA-3 conditioning operation is identical
        to the user space Jitter RNG 3.4.0 by applying the following concept:

        - the Jitter RNG initializes a SHA-3 state which acts as the "entropy
          pool" when the Jitter RNG is allocated.

        - When a new time delta is obtained, it is inserted into the "entropy
          pool" with a SHA-3 update operation. Note, in most cases this
          operation is a simple memcpy() onto the SHA-3 stack.

        - To cause a true SHA-3 operation for each time delta operation, a
          second SHA-3 operation is performed hashing Jitter RNG status
          information. The final message digest is also inserted into the
          "entropy pool" with a SHA-3 update operation. Yet, this data is not
          considered to provide any entropy, but it shall stir the entropy pool.

        - To generate a random number, a SHA-3 final operation is performed to
          calculate a message digest followed by an immediate SHA-3 init to
          re-initialize the "entropy pool". The obtained message digest is one
          block of the Jitter RNG that is returned to the caller.

        Mathematically speaking, the random number generated by the Jitter RNG
        is:

        aux_i = SHA-3(Jitter RNG state data)

        Jitter RNG block = SHA-3(time_i || aux_i || time_(i-1) || aux_(i-1) ||
                                 ... || time_(i-255) || aux_(i-255))

        when assuming that the OSR = 1, i.e. the default value.
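As a rough userspace model of the scheme described above (not the kernel implementation), the SHA-3 "entropy pool" can be sketched in Python with hashlib's sha3_256; the class and method names here are purely illustrative:

```python
import hashlib

class JitterPoolModel:
    """Toy model of the SHA-3 "entropy pool": an incrementally updated
    SHA3-256 state that is finalized and re-initialized on generation."""

    def __init__(self):
        # The Jitter RNG initializes a SHA-3 state acting as the pool.
        self.pool = hashlib.sha3_256()

    def insert_time_delta(self, delta: int, state_data: bytes) -> None:
        # Insert the raw time delta; update() mostly just buffers it.
        self.pool.update(delta.to_bytes(8, "little"))
        # Force a true SHA-3 operation by hashing RNG status data, then
        # stir the pool with the digest (credited with zero entropy).
        aux = hashlib.sha3_256(state_data).digest()
        self.pool.update(aux)

    def generate_block(self) -> bytes:
        # SHA-3 final produces one 256-bit output block, after which the
        # pool is immediately re-initialized.
        block = self.pool.digest()
        self.pool = hashlib.sha3_256()
        return block
```

With OSR = 1, 256 time deltas would be inserted before each `generate_block()` call, matching the 256-bit output block size described above.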

        This operation implies that the Jitter RNG has an output-blocksize of
        256 bits instead of the 64 bits of the LFSR-based Jitter RNG that is
        replaced with this patch.

        The patch also replaces the varying number of invocations of the
        conditioning function with one fixed number of invocations. The use
        of the conditioning function is consistent with the userspace Jitter
        RNG library version 3.4.0.

        The code is tested with a system that exhibited the least amount of
        entropy generated by the Jitter RNG: the SiFive Unmatched RISC-V
        system. The measured entropy rate is well above the heuristically
        implied entropy value of 1 bit of entropy per time delta. On all other
        tested systems, the measured entropy rate is even higher by orders
        of magnitude. The measurement was performed using updated tooling
        provided with the user space Jitter RNG library test framework.

        The performance of the Jitter RNG with this patch is roughly on par
        with the performance of the Jitter RNG without the patch.

        Signed-off-by: Stephan Mueller <[email protected]>
        Signed-off-by: Herbert Xu <[email protected]>

            Back-port of commit bb897c5
            Author: Stephan Müller <[email protected]>
            Date:   Fri Apr 21 08:08:04 2023 +0200

Signed-off-by: Jeremy Allison <[email protected]>
    I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding
    cryptographic information should be zeroized once they are no longer
    needed. Accomplish this by using kfree_sensitive for buffers that
    previously held the private key.

    Signed-off-by: Hailey Mothershead <[email protected]>
    Signed-off-by: Herbert Xu <[email protected]>

        Back-ported from commit 23e4099
        Author: Hailey Mothershead <[email protected]>
        Date:   Mon Apr 15 22:19:15 2024 +0000

Signed-off-by: Jeremy Allison <[email protected]>
private_key is overwritten with the key parameter passed in by the
caller (if present), or alternatively a newly generated private key.
However, it is possible that the caller provides a key (or the newly
generated key) which is shorter than the previous key. In that
scenario, some key material from the previous key would not be
overwritten. The easiest solution is to explicitly zeroize the entire
private_key array first.
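The hazard and the fix can be illustrated with a small userspace analog in Python (the kernel code operates on a fixed-size private_key array; the 32-byte size and function name here are hypothetical):

```python
KEY_BYTES = 32  # illustrative fixed size of the key buffer

def set_private_key(buf: bytearray, new_key: bytes) -> None:
    """Install new_key into a fixed-size key buffer without leaving
    stale material behind from a longer previous key."""
    assert len(buf) == KEY_BYTES and len(new_key) <= KEY_BYTES
    # Zeroize the entire buffer first, so a shorter new key cannot
    # leave trailing bytes of the old key in place.
    buf[:] = bytes(KEY_BYTES)
    buf[:len(new_key)] = new_key
```

Without the zeroizing step, installing a 16-byte key over a 32-byte one would leave the last 16 bytes of the old key readable.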

Note that this patch slightly changes the behavior of this function:
previously, if the ecc_gen_privkey failed, the old private_key would
remain. Now, the private_key is always zeroized. This behavior is
consistent with the case where params.key is set and ecc_is_key_valid
fails.

Signed-off-by: Joachim Vandersmissen <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
[ Upstream commit ba3c557 ]

When the mpi_ec_ctx structure is initialized, some fields are not
cleared, causing a crash when an uninitialized field is referenced
as the structure is released. Initially, this issue was ignored because
memory for mpi_ec_ctx is allocated with the __GFP_ZERO flag.
For example, this error will be triggered when calculating the
Za value for SM2 separately.

Fixes: d58bb7e ("lib/mpi: Introduce ec implementation to MPI library")
Cc: [email protected] # v6.5
Signed-off-by: Tianjia Zhang <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
None of the ciphers used by the DRBG have an alignment requirement; thus,
they all return 0 from .crypto_init, resulting in inconsistent alignment
across all buffers.

Align all buffers to at least a cache line to improve performance. This is
especially useful when multiple DRBG instances are used, since it prevents
false sharing of cache lines between the different instances.
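The layout idea can be sketched generically (hypothetical helpers; the actual patch adjusts the DRBG's buffer alignment in kernel C):

```python
CACHE_LINE = 64  # common cache-line size; architecture-dependent

def align_up(n: int, a: int = CACHE_LINE) -> int:
    """Round n up to the next multiple of a (a must be a power of 2)."""
    return (n + a - 1) & ~(a - 1)

def aligned_offsets(sizes, a=CACHE_LINE):
    """Lay out several per-instance buffers so that each starts on its
    own cache line, preventing false sharing between instances."""
    offsets, pos = [], 0
    for size in sizes:
        pos = align_up(pos, a)   # bump start to the next cache line
        offsets.append(pos)
        pos += size
    return offsets
```

Starting every buffer on its own cache line means a write by one DRBG instance never invalidates a line another instance is reading.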

Signed-off-by: Sultan Alsawaf <[email protected]>
Like pin_user_pages_fast(), but with the internal-only FOLL_FAST_ONLY flag.

This complements the get_user_pages*() API, which already has
get_user_pages_fast_only().

Signed-off-by: Sultan Alsawaf <[email protected]>
There is no reason this refcount should be a signed int. Convert it to an
unsigned int, thereby also making it less likely to ever overflow.

Signed-off-by: Sultan Alsawaf <[email protected]>
Since crypto_devrandom_read_iter() is invoked directly by user tasks and is
accessible by every task in the system, there are glaring priority
inversions on crypto_reseed_rng_lock and crypto_default_rng_lock.

Tasks of arbitrary scheduling priority access crypto_devrandom_read_iter().
When a low-priority task owns one of the mutex locks, higher-priority tasks
waiting on that mutex lock are stalled until the low-priority task is done.

Fix the priority inversions by converting the mutex locks into rt_mutex
locks which have PI support.

Signed-off-by: Sultan Alsawaf <[email protected]>
@kerneltoast
Author

Please closely review the commits I authored in this PR, since they are new original works.

@PlaidCat
Collaborator

Do you have any testing you can share in the PR header?

@kerneltoast
Author

Do you have any testing you can share in the PR header?

I can't think of a great way to tabulate the test methodology for this, but what I tested was:

  • vm.overcommit_memory=2 (for full overcommit to trigger failure of the optimistic page pin attempt)
  • An extremely large request of 250MB in a single getrandom() call without GRND_RANDOM
  • Request sizes less than one page, equal to one page, and greater than one page, with and without GRND_RANDOM

I printk'd all the edge cases to verify that they were hit and that they worked correctly, too.


@jallisonciq left a comment


I worked with Sultan whilst he was creating these patches (and I may be the reason for some of the comments :-). I am completely happy with the code logic, but as I told him I'm going to have to trust him on the locking design.

Collaborator

@bmastbergen left a comment


I won't pretend to have a full understanding of what is going on here, but from the commit messages and comments I think I get the gist of it. With that being said, LGTM 🥌

When the kernel is booted with fips=1, the RNG exposed to userspace is
hijacked away from the CRNG and redirected to crypto_devrandom_read_iter(),
which utilizes the DRBG.

Notably, crypto_devrandom_read_iter() maintains just two global DRBG
instances _for the entire system_, and the two instances serve separate
request types: one instance for GRND_RANDOM requests (crypto_reseed_rng),
and one instance for non-GRND_RANDOM requests (crypto_default_rng). So in
essence, for requests of a single type, there is just one global RNG for
all CPUs in the entire system, which scales _very_ poorly.

To make matters worse, the temporary buffer used to ferry data between the
DRBG and userspace is woefully small at only 256 bytes, which doesn't do a
good job of maximizing throughput from the DRBG. This results in lost
performance when userspace requests >256 bytes; it is observed that DRBG
throughput improves by 70% on an i9-13900H when the buffer size is
increased to 4096 bytes (one page). Going beyond the size of one page up to
the DRBG maximum request limit of 65536 bytes produces diminishing returns
of only 3% improved throughput in comparison. And going below the size of
one page produces progressively less throughput at each power of 2: there's
a 5% loss going from 4096 bytes to 2048 bytes and a 9% loss going from 2048
bytes to 1024 bytes.
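The buffer-size effect comes down to how many times userspace-visible copying interrupts the DRBG; a read loop using a page-sized intermediate buffer can be modeled as follows (sizes from the message above; `drbg_generate` is a stand-in for the real DRBG call):

```python
PAGE_SIZE = 4096  # intermediate buffer size chosen in this patch

def read_random(total: int, drbg_generate) -> bytes:
    """Fill a request of `total` bytes by asking the DRBG for at most
    one page at a time, as the patched read path does."""
    out = bytearray()
    while len(out) < total:
        n = min(PAGE_SIZE, total - len(out))
        out += drbg_generate(n)  # one DRBG invocation per page
    return bytes(out)
```

With the old 256-byte buffer, a large request pays 16x as many DRBG invocations per page of output, which is where the observed ~70% throughput gap comes from.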

Thus, this implements per-CPU DRBG instances, each with a page-sized
buffer, to drive the DRBG itself more effectively. On top of that, for
non-GRND_RANDOM requests, the DRBG's operations now occur under a local
lock that disables preemption on non-PREEMPT_RT kernels, which not only
keeps each CPU's DRBG instance isolated from the others, but also improves
temporal cache locality while the DRBG actively generates a new string of
random bytes.

Prefaulting one user destination page at a time is also employed to prevent
a DRBG instance from getting blocked on page faults, thereby maximizing the
use of the DRBG so that the only bottleneck is the DRBG itself.

Signed-off-by: Sultan Alsawaf <[email protected]>
@kerneltoast kerneltoast force-pushed the {sultan}_fips-rng_rocky9_6 branch from d1bd048 to 472f628 on June 28, 2025 02:06
@kerneltoast
Copy link
Author

FYI, I added a few more small comments and micro-optimized 2 lines of code. Going to merge it since it's not any different from what it was before. Thanks for the reviews 🙂

@kerneltoast kerneltoast merged commit 472f628 into fips-9-compliant/5.14.0-570.18.1.el9_6 Jun 28, 2025
2 of 4 checks passed
@kerneltoast kerneltoast deleted the {sultan}_fips-rng_rocky9_6 branch June 28, 2025 02:13