Commit 17650cc
Add Optimized and HOL Light verified AVX2 Keccak x4 (#3020)
### Issues:
Import AVX2 Optimized and HOL Light verified 4x Keccak permutation
awslabs/s2n-bignum#354
NOTE: Once awslabs/s2n-bignum#354 is merged,
the assembly files will be imported directly with the importer script.
### Description of changes:
This PR introduces an optimized AVX2 implementation of the
Keccak-f[1600] x4 permutation, formally verified using HOL Light. This
batched Keccak implementation processes four independent Keccak
permutations in parallel using AVX2 SIMD instructions, accelerating the
core hash operations underlying ML-KEM (FIPS 203) and ML-DSA (FIPS 204)
by roughly 29% to 51% (ML-KEM) and 17% to 35% (ML-DSA) on average across
the benchmarked EC2 instance types.
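
To make the data layout concrete, here is a minimal portable C sketch of what a 4-way Keccak-f[1600] computes. It assumes a hypothetical interleaved layout in which lane `i` of instance `k` is stored at `s[4*i + k]`, so the four copies of a lane sit side by side the way one 256-bit AVX2 register holds four 64-bit lanes; the actual layout and API of the s2n-bignum/AWS-LC routine may differ. Every step is applied to all four instances in lockstep, which is what a single SIMD instruction does at once. The round constants and rotation/permutation tables are the standard Keccak ones, arranged as in common compact reference implementations.

```c
#include <stdint.h>

/* Hypothetical layout: lane i of instance k lives at s[4*i + k]. */
#define LANES 25
#define WAYS 4

static uint64_t rotl64(uint64_t x, int n) { return (x << n) | (x >> (64 - n)); }

static const uint64_t kRC[24] = {
  0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
  0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
  0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
  0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
  0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
  0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
  0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
  0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL};
static const int kRotc[24] = {1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
                              27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44};
static const int kPiln[24] = {10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
                              15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1};

/* Scalar emulation of a 4-way SIMD Keccak-f[1600] over four independent states. */
void keccak_f1600_x4(uint64_t s[LANES * WAYS]) {
  uint64_t bc[5][WAYS], t[WAYS], tmp[WAYS];
  for (int r = 0; r < 24; r++) {
    /* Theta: compute column parities for all four instances. */
    for (int x = 0; x < 5; x++)
      for (int k = 0; k < WAYS; k++)
        bc[x][k] = s[4 * x + k] ^ s[4 * (x + 5) + k] ^ s[4 * (x + 10) + k] ^
                   s[4 * (x + 15) + k] ^ s[4 * (x + 20) + k];
    /* Theta: mix each column's neighbours' parities into every lane. */
    for (int x = 0; x < 5; x++)
      for (int k = 0; k < WAYS; k++) {
        uint64_t d = bc[(x + 4) % 5][k] ^ rotl64(bc[(x + 1) % 5][k], 1);
        for (int y = 0; y < 25; y += 5) s[4 * (y + x) + k] ^= d;
      }
    /* Rho and Pi: rotate each lane and move it to its new position. */
    for (int k = 0; k < WAYS; k++) t[k] = s[4 * 1 + k];
    for (int i = 0; i < 24; i++) {
      int j = kPiln[i];
      for (int k = 0; k < WAYS; k++) {
        tmp[k] = s[4 * j + k];
        s[4 * j + k] = rotl64(t[k], kRotc[i]);
        t[k] = tmp[k];
      }
    }
    /* Chi: nonlinear step within each row of five lanes. */
    for (int y = 0; y < 25; y += 5)
      for (int k = 0; k < WAYS; k++) {
        uint64_t row[5];
        for (int x = 0; x < 5; x++) row[x] = s[4 * (y + x) + k];
        for (int x = 0; x < 5; x++)
          s[4 * (y + x) + k] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5]);
      }
    /* Iota: add the round constant to lane 0 of every instance. */
    for (int k = 0; k < WAYS; k++) s[k] ^= kRC[r];
  }
}
```

Because the four instances never interact, the output must equal four independent single-state permutations; that independence is what allows four unrelated SHAKE computations to share one call.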
The 4-way parallel Keccak permutation is a critical building block for
lattice-based cryptographic schemes, as it is heavily used in:
- ML-KEM: Matrix/vector sampling, seed expansion, and hash operations
during keygen, encapsulation, and decapsulation
- ML-DSA: Key generation, signing (rejection sampling), and verification
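
Matrix sampling is a natural fit for batching because every matrix entry is derived from its own independent SHAKE128 stream. The sketch below assumes a simple strategy of grouping entries four at a time in row-major order; `count_x4_batches` and `mlkem_batch_seeds` are hypothetical helpers, not AWS-LC functions. The seed layout follows FIPS 203, where entry A[i][j] is sampled from rho || j || i. Each batch may still need several permutation calls, since rejection sampling can squeeze more than one block.

```c
#include <stdint.h>
#include <string.h>

/* Number of 4-way batches needed for a rows x cols matrix: ceil(rows*cols / 4). */
int count_x4_batches(int rows, int cols) {
  return (rows * cols + 3) / 4;
}

/* Hypothetical helper: fill the four 34-byte SHAKE128 seeds (rho || j || i)
   for one batch of the k x k ML-KEM matrix, taking entries in row-major
   order. Returns how many slots hold a real entry; unused slots are zeroed. */
int mlkem_batch_seeds(const uint8_t rho[32], int k, int batch,
                      uint8_t seeds[4][34]) {
  int valid = 0;
  for (int slot = 0; slot < 4; slot++) {
    int idx = batch * 4 + slot;
    memset(seeds[slot], 0, 34);
    if (idx >= k * k) continue; /* last batch may be partially filled */
    memcpy(seeds[slot], rho, 32);
    seeds[slot][32] = (uint8_t)(idx % k); /* j: column index */
    seeds[slot][33] = (uint8_t)(idx / k); /* i: row index */
    valid++;
  }
  return valid;
}
```

Under this assumed strategy, the ML-KEM matrices (k = 2, 3, 4) need 1, 3, and 4 batches, and the ML-DSA matrices (4x4, 6x5, 8x7) need 4, 8, and 14, which is consistent with the larger parameter sets benefiting more.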
<h4>Performance Results</h4>
<p>The optimization delivers substantial throughput improvements across
all tested EC2 instance types:</p>
<p><strong>Average speedup per parameter set (mean of the three operations):</strong></p>
Algorithm | c7i | c7a | c6i | c6a
-- | -- | -- | -- | --
ML-KEM-512 | +29.2% | +38.7% | +41.4% | +37.6%
ML-KEM-768 | +29.3% | +37.6% | +37.4% | +37.4%
ML-KEM-1024 | +34.8% | +46.7% | +51.0% | +48.4%
MLDSA44 | +16.6% | +17.9% | +23.0% | +21.1%
MLDSA65 | +19.6% | +18.4% | +23.9% | +20.8%
MLDSA87 | +28.5% | +28.0% | +34.7% | +31.9%
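
The averages above can be reproduced from the ops/sec figures in the per-instance tables under "More Performance Data". The sketch assumes speedup is `100 * (new / original - 1)` and that each average is the arithmetic mean of the three per-operation speedups; this reading matches the reported numbers (for example, the c7i ML-KEM-512 figures give +35.1%, +28.7%, and +23.8%, whose mean is +29.2%).

```c
/* Percentage speedup from ops/sec figures: 100 * (new / original - 1). */
double speedup_pct(double orig_ops, double new_ops) {
  return 100.0 * (new_ops / orig_ops - 1.0);
}

/* Assumed averaging: arithmetic mean of the per-operation speedups. */
double mean_speedup_pct(const double before[], const double after[], int n) {
  double sum = 0.0;
  for (int i = 0; i < n; i++) sum += speedup_pct(before[i], after[i]);
  return sum / n;
}
```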
<p><strong>Notable highlights:</strong></p>
<ul>
<li>Peak speedup of <strong>+59.0%</strong> for ML-KEM-1024 keygen on
c6i</li>
<li>ML-KEM-1024 shows the largest average gain on every platform (from
+34.8% on c7i to +51.0% on c6i; keygen reaches <strong>+52.7%</strong> on
c7a), as larger parameter sets invoke more Keccak calls</li>
<li>ML-DSA signing shows modest gains (+2.1% to +12.6%), since signing
time is dominated by the rejection-sampling loop rather than by Keccak
permutation throughput</li>
<li>Improvements are consistent across Intel (c7i, c6i) and AMD (c7a,
c6a) platforms, and across both the current (Gen 7) and previous (Gen 6)
instance generations</li>
</ul>
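
The reasoning in the last two bullets is an instance of Amdahl's law: the overall gain depends on how much of an operation's runtime is spent in Keccak. The sketch below is a generic illustration; the fractions in the example are hypothetical, not measured from these benchmarks.

```c
/* Amdahl's law: overall speedup when a fraction f of runtime is
   accelerated by a factor s and the remainder is unchanged. */
double amdahl_speedup(double f, double s) {
  return 1.0 / ((1.0 - f) + f / s);
}
```

For instance, if Keccak were half of an operation's runtime and the x4 path made that part twice as fast, the operation would speed up by about 33%; if Keccak were only a tenth of the runtime, the same improvement would yield only about 5%.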
### Call-outs:
- Reviewers should pay attention to the integration points where the new
Keccak x4 is wired into the ML-KEM and ML-DSA call paths
### Testing:
- All existing ML-KEM and ML-DSA KATs (Known Answer Tests) pass,
confirming functional correctness:
`./crypto/crypto_test`
- Performance was benchmarked with `./tool/bssl speed` on four EC2 instance
types (c7i, c7a, c6i, c6a) to validate the throughput improvements:
`./tool/bssl speed -filter "ML-KEM"`
`./tool/bssl speed -filter "MLDSA"`
### More Performance Data
<h4>EC2 c7i</h4>
Algorithm | Operation | Original (ops/sec) | New (ops/sec) | Speedup
-- | -- | -- | -- | --
ML-KEM-512 | keygen | 102,039.0 | 137,883.4 | +35.1%
ML-KEM-512 | encaps | 92,432.1 | 118,961.3 | +28.7%
ML-KEM-512 | decaps | 77,155.5 | 95,523.5 | +23.8%
ML-KEM-768 | keygen | 65,240.3 | 86,148.5 | +32.1%
ML-KEM-768 | encaps | 60,583.8 | 79,416.7 | +31.1%
ML-KEM-768 | decaps | 51,275.5 | 64,007.9 | +24.8%
ML-KEM-1024 | keygen | 43,752.6 | 62,079.1 | +41.9%
ML-KEM-1024 | encaps | 40,528.9 | 54,745.9 | +35.1%
ML-KEM-1024 | decaps | 35,182.9 | 44,833.4 | +27.4%
MLDSA44 | keygen | 19,594.8 | 23,784.0 | +21.4%
MLDSA44 | signing | 4,776.1 | 5,105.4 | +6.9%
MLDSA44 | verify | 18,485.0 | 22,439.7 | +21.4%
MLDSA65 | keygen | 10,078.2 | 12,485.9 | +23.9%
MLDSA65 | signing | 3,030.3 | 3,263.0 | +7.7%
MLDSA65 | verify | 11,629.3 | 14,807.7 | +27.3%
MLDSA87 | keygen | 7,177.4 | 9,908.2 | +38.0%
MLDSA87 | signing | 2,534.6 | 2,776.0 | +9.5%
MLDSA87 | verify | 7,049.3 | 9,737.1 | +38.1%
<h4>EC2 c7a</h4>
Algorithm | Operation | Original (ops/sec) | New (ops/sec) | Speedup
-- | -- | -- | -- | --
ML-KEM-512 | keygen | 94,563.7 | 137,392.5 | +45.3%
ML-KEM-512 | encaps | 85,020.5 | 118,473.0 | +39.3%
ML-KEM-512 | decaps | 71,284.2 | 93,645.4 | +31.4%
ML-KEM-768 | keygen | 56,037.5 | 79,772.7 | +42.4%
ML-KEM-768 | encaps | 52,744.2 | 73,353.1 | +39.1%
ML-KEM-768 | decaps | 44,832.4 | 58,874.9 | +31.3%
ML-KEM-1024 | keygen | 37,007.2 | 56,511.5 | +52.7%
ML-KEM-1024 | encaps | 34,843.2 | 51,659.7 | +48.3%
ML-KEM-1024 | decaps | 30,052.9 | 41,833.0 | +39.2%
MLDSA44 | keygen | 17,087.6 | 21,781.5 | +27.5%
MLDSA44 | signing | 3,833.2 | 3,941.9 | +2.8%
MLDSA44 | verify | 15,055.9 | 18,594.2 | +23.5%
MLDSA65 | keygen | 9,295.8 | 11,665.2 | +25.5%
MLDSA65 | signing | 2,418.1 | 2,468.3 | +2.1%
MLDSA65 | verify | 9,658.5 | 12,321.2 | +27.6%
MLDSA87 | keygen | 6,458.0 | 9,079.5 | +40.6%
MLDSA87 | signing | 2,021.0 | 2,147.5 | +6.3%
MLDSA87 | verify | 6,094.7 | 8,355.6 | +37.1%
<h4>EC2 c6i</h4>
Algorithm | Operation | Original (ops/sec) | New (ops/sec) | Speedup
-- | -- | -- | -- | --
ML-KEM-512 | keygen | 74,243.5 | 110,577.2 | +48.9%
ML-KEM-512 | encaps | 66,855.4 | 94,949.4 | +42.0%
ML-KEM-512 | decaps | 56,641.7 | 75,435.7 | +33.2%
ML-KEM-768 | keygen | 44,861.7 | 63,758.2 | +42.1%
ML-KEM-768 | encaps | 42,938.1 | 59,149.7 | +37.7%
ML-KEM-768 | decaps | 35,953.7 | 47,646.6 | +32.5%
ML-KEM-1024 | keygen | 30,333.7 | 48,230.3 | +59.0%
ML-KEM-1024 | encaps | 28,685.5 | 43,577.8 | +51.9%
ML-KEM-1024 | decaps | 23,941.1 | 33,996.0 | +42.0%
MLDSA44 | keygen | 14,708.0 | 19,526.7 | +32.8%
MLDSA44 | signing | 3,488.7 | 3,693.8 | +5.9%
MLDSA44 | verify | 13,581.8 | 17,702.9 | +30.3%
MLDSA65 | keygen | 7,868.8 | 10,223.9 | +29.9%
MLDSA65 | signing | 2,153.5 | 2,309.9 | +7.3%
MLDSA65 | verify | 8,542.6 | 11,490.7 | +34.5%
MLDSA87 | keygen | 5,428.8 | 8,082.0 | +48.9%
MLDSA87 | signing | 1,819.2 | 1,973.5 | +8.5%
MLDSA87 | verify | 5,258.6 | 7,708.2 | +46.6%
<h4>EC2 c6a</h4>
Algorithm | Operation | Original (ops/sec) | New (ops/sec) | Speedup
-- | -- | -- | -- | --
ML-KEM-512 | keygen | 94,817.9 | 138,020.5 | +45.6%
ML-KEM-512 | encaps | 87,240.8 | 120,129.4 | +37.7%
ML-KEM-512 | decaps | 72,457.8 | 93,790.5 | +29.4%
ML-KEM-768 | keygen | 60,954.7 | 87,137.1 | +42.9%
ML-KEM-768 | encaps | 57,065.8 | 79,197.9 | +38.8%
ML-KEM-768 | decaps | 47,498.6 | 62,029.9 | +30.6%
ML-KEM-1024 | keygen | 41,115.4 | 64,320.3 | +56.4%
ML-KEM-1024 | encaps | 38,250.1 | 57,234.5 | +49.6%
ML-KEM-1024 | decaps | 32,493.6 | 45,190.5 | +39.1%
MLDSA44 | keygen | 16,540.2 | 21,594.4 | +30.6%
MLDSA44 | signing | 3,670.8 | 3,916.2 | +6.7%
MLDSA44 | verify | 14,447.5 | 18,216.4 | +26.1%
MLDSA65 | keygen | 8,977.5 | 11,496.0 | +28.0%
MLDSA65 | signing | 2,346.1 | 2,422.6 | +3.3%
MLDSA65 | verify | 9,377.8 | 12,294.7 | +31.1%
MLDSA87 | keygen | 6,224.4 | 8,939.6 | +43.6%
MLDSA87 | signing | 1,961.1 | 2,208.0 | +12.6%
MLDSA87 | verify | 5,862.3 | 8,183.7 | +39.6%
By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
Parent commit: 666d5e3
65 files changed (+11,133 / -1,513 lines), spanning `crypto` (including
`crypto/fipsmodule/sha`) and the imported s2n-bignum sources under
`third_party/s2n-bignum` (the `arm` and `x86_att` trees, with subdirectories
such as `sha3`, `mlkem`, and `mldsa`, plus `include` and `doc`).