-
Notifications
You must be signed in to change notification settings - Fork 11
Pick required FIPS changes + Add FIPS RNG performance overhaul #372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pick required FIPS changes + Add FIPS RNG performance overhaul #372
Conversation
Using the kernel crypto API, the SHA3-256 algorithm is used as conditioning element to replace the LFSR in the Jitter RNG. All other parts of the Jitter RNG are unchanged. The application and use of the SHA-3 conditioning operation is identical to the user space Jitter RNG 3.4.0 by applying the following concept: - the Jitter RNG initializes a SHA-3 state which acts as the "entropy pool" when the Jitter RNG is allocated. - When a new time delta is obtained, it is inserted into the "entropy pool" with a SHA-3 update operation. Note, this operation in most of the cases is a simple memcpy() onto the SHA-3 stack. - To cause a true SHA-3 operation for each time delta operation, a second SHA-3 operation is performed hashing Jitter RNG status information. The final message digest is also inserted into the "entropy pool" with a SHA-3 update operation. Yet, this data is not considered to provide any entropy, but it shall stir the entropy pool. - To generate a random number, a SHA-3 final operation is performed to calculate a message digest followed by an immediate SHA-3 init to re-initialize the "entropy pool". The obtained message digest is one block of the Jitter RNG that is returned to the caller. Mathematically speaking, the random number generated by the Jitter RNG is: aux_t = SHA-3(Jitter RNG state data) Jitter RNG block = SHA-3(time_i || aux_i || time_(i-1) || aux_(i-1) || ... || time_(i-255) || aux_(i-255)) when assuming that the OSR = 1, i.e. the default value. This operation implies that the Jitter RNG has an output-blocksize of 256 bits instead of the 64 bits of the LFSR-based Jitter RNG that is replaced with this patch. The patch also replaces the varying number of invocations of the conditioning function with one fixed number of invocations. The use of the conditioning function consistent with the userspace Jitter RNG library version 3.4.0. The code is tested with a system that exhibited the least amount of entropy generated by the Jitter RNG: the SiFive Unmatched RISC-V system. The measured entropy rate is well above the heuristically implied entropy value of 1 bit of entropy per time delta. On all other tested systems, the measured entropy rate is even higher by orders of magnitude. The measurement was performed using updated tooling provided with the user space Jitter RNG library test framework. The performance of the Jitter RNG with this patch is about en par with the performance of the Jitter RNG without the patch. Signed-off-by: Stephan Mueller <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Back-port of commit bb897c5 Author: Stephan Müller <[email protected]> Date: Fri Apr 21 08:08:04 2023 +0200 Signed-off-by: Jeremy Allison <[email protected]>
I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding cryptographic information should be zeroized once they are no longer needed. Accomplish this by using kfree_sensitive for buffers that previously held the private key. Signed-off-by: Hailey Mothershead <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Back-ported from commit 23e4099 Author: Hailey Mothershead <[email protected]> Date: Mon Apr 15 22:19:15 2024 +0000 Signed-off-by: Jeremy Allison <[email protected]>
Signed-off-by: Jeremy Allison <[email protected]>
private_key is overwritten with the key parameter passed in by the caller (if present), or alternatively a newly generated private key. However, it is possible that the caller provides a key (or the newly generated key) which is shorter than the previous key. In that scenario, some key material from the previous key would not be overwritten. The easiest solution is to explicitly zeroize the entire private_key array first. Note that this patch slightly changes the behavior of this function: previously, if the ecc_gen_privkey failed, the old private_key would remain. Now, the private_key is always zeroized. This behavior is consistent with the case where params.key is set and ecc_is_key_valid fails. Signed-off-by: Joachim Vandersmissen <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
[ Upstream commit ba3c557 ] When the mpi_ec_ctx structure is initialized, some fields are not cleared, causing a crash when referencing the field when the structure was released. Initially, this issue was ignored because memory for mpi_ec_ctx is allocated with the __GFP_ZERO flag. For example, this error will be triggered when calculating the Za value for SM2 separately. Fixes: d58bb7e ("lib/mpi: Introduce ec implementation to MPI library") Cc: [email protected] # v6.5 Signed-off-by: Tianjia Zhang <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…ey() to zeroize keys on exit. converting ws
None of the ciphers used by the DRBG have an alignment requirement; thus, they all return 0 from .crypto_init, resulting in inconsistent alignment across all buffers. Align all buffers to at least a cache line to improve performance. This is especially useful when multiple DRBG instances are used, since it prevents false sharing of cache lines between the different instances. Signed-off-by: Sultan Alsawaf <[email protected]>
Like pin_user_pages_fast(), but with the internal-only FOLL_FAST_ONLY flag. This complements the get_user_pages*() API, which already has get_user_pages_fast_only(). Signed-off-by: Sultan Alsawaf <[email protected]>
There is no reason this refcount should be a signed int. Convert it to an unsigned int, thereby also making it less likely to ever overflow. Signed-off-by: Sultan Alsawaf <[email protected]>
Since crypto_devrandom_read_iter() is invoked directly by user tasks and is accessible by every task in the system, there are glaring priority inversions on crypto_reseed_rng_lock and crypto_default_rng_lock. Tasks of arbitrary scheduling priority access crypto_devrandom_read_iter(). When a low-priority task owns one of the mutex locks, higher-priority tasks waiting on that mutex lock are stalled until the low-priority task is done. Fix the priority inversions by converting the mutex locks into rt_mutex locks which have PI support. Signed-off-by: Sultan Alsawaf <[email protected]>
Please closely review the commits I authored in this PR, since they are new original works. |
Do you have any testing you can share in the PR header? |
I can't think of a great way to tabulate the test methodology for this, but what I tested was:
I printk'd all the edge cases to verify that they were hit and that they worked correctly, too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worked with Sultan whilst he was creating these patches (and I may be the reason for some of the comments :-). I am completely happy with the code logic, but as I told him I'm going to have to trust him on the locking design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't pretend to have a full understanding of what is going on here, but from the commit messages and comments I think I get the gist of it. With that being said, LGTM 🥌
When the kernel is booted with fips=1, the RNG exposed to userspace is hijacked away from the CRNG and redirects to crypto_devrandom_read_iter(), which utilizes the DRBG. Notably, crypto_devrandom_read_iter() maintains just two global DRBG instances _for the entire system_, and the two instances serve separate request types: one instance for GRND_RANDOM requests (crypto_reseed_rng), and one instance for non-GRND_RANDOM requests (crypto_default_rng). So in essence, for requests of a single type, there is just one global RNG for all CPUs in the entire system, which scales _very_ poorly. To make matters worse, the temporary buffer used to ferry data between the DRBG and userspace is woefully small at only 256 bytes, which doesn't do a good job of maximizing throughput from the DRBG. This results in lost performance when userspace requests >256 bytes; it is observed that DRBG throughput improves by 70% on an i9-13900H when the buffer size is increased to 4096 bytes (one page). Going beyond the size of one page up to the DRBG maximum request limit of 65536 bytes produces diminishing returns of only 3% improved throughput in comparison. And going below the size of one page produces progressively less throughput at each power of 2: there's a 5% loss going from 4096 bytes to 2048 bytes and a 9% loss going from 2048 bytes to 1024 bytes. Thus, this implements per-CPU DRBG instances utilizing a page-sized buffer for each CPU to utilize the DRBG itself more effectively. On top of that, for non-GRND_RANDOM requests, the DRBG's operations now occur under a local lock that disables preemption on non-PREEMPT_RT kernels, which not only keeps each CPU's DRBG instance isolated from another, but also improves temporal cache locality while the DRBG actively generates a new string of random bytes. Prefaulting one user destination page at a time is also employed to prevent a DRBG instance from getting blocked on page faults, thereby maximizing the use of the DRBG so that the only bottleneck is the DRBG itself. Signed-off-by: Sultan Alsawaf <[email protected]>
d1bd048
to
472f628
Compare
FYI, I added a few more small comments and micro-optimized 2 lines of code. Going to merge it since it's not any different from what it was before. Thanks for the reviews 🙂 |
472f628
into
fips-9-compliant/5.14.0-570.18.1.el9_6
This adds the missing baseline FIPS changes ported over by @jallisonciq, in addition to my new FIPS RNG performance overhaul.