Commit 1c903da
Dispatch between C, s2n-bignum and OpenSSL Keccak implementations
Previously, a static dispatch would choose between the C implementation
of Keccak-F1600 or the assembly implementations (one scalar, one SIMD)
provided by OpenSSL. The C<->ASM interface was Keccak1600_Absorb and
Keccak1600_Squeeze.
This commit lowers the C<->ASM interface to the core Keccak permutation
itself; the Absorb/Squeeze assembly wrappers in keccak1600-armv8.pl
are removed accordingly.
Moreover, the commit integrates the Keccak-F1600 implementations from
s2n-bignum into the build and replaces the above static dispatch with
a runtime dispatch based on CPU detection / CPU capabilities:
1. If ASM is disabled, we use the C implementation.
2. If ASM is enabled:
   - For Neoverse N1, V1, V2, we use scalar Keccak assembly from s2n-bignum,
     leveraging lazy rotations from https://eprint.iacr.org/2022/1243.
   - For Arm-based Apple CPUs, we use Neon Keccak assembly from s2n-bignum,
     leveraging the AArch64 SHA3 extension.
   - Otherwise, we fall back to the scalar Keccak implementation from
     OpenSSL, which does not use lazy rotations.
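
The dispatch rules above can be sketched in C as follows. This is only an
illustration of the decision order: every identifier here (the enum, the
`cpu_info` struct and its fields, `select_keccak_impl`) is hypothetical and
not taken from the commit, and how the CPU is actually detected is left out.

```c
#include <stdbool.h>

/* Hypothetical labels for the four code paths named in the commit message. */
typedef enum {
    KECCAK_IMPL_C,                 /* portable C implementation           */
    KECCAK_IMPL_S2N_BIGNUM_SCALAR, /* s2n-bignum scalar, lazy rotations   */
    KECCAK_IMPL_S2N_BIGNUM_NEON,   /* s2n-bignum Neon, SHA3 extension     */
    KECCAK_IMPL_OPENSSL_SCALAR     /* OpenSSL scalar, no lazy rotations   */
} keccak_impl;

/* Hypothetical summary of build configuration and runtime CPU detection. */
typedef struct {
    bool asm_enabled;     /* assembly not disabled at build time */
    bool is_neoverse_n1;
    bool is_neoverse_v1;
    bool is_neoverse_v2;
    bool is_apple_arm;    /* Arm-based Apple CPU */
} cpu_info;

/* Pick an implementation following the order given in the commit message. */
static keccak_impl select_keccak_impl(const cpu_info *cpu) {
    if (!cpu->asm_enabled) {
        return KECCAK_IMPL_C;
    }
    if (cpu->is_neoverse_n1 || cpu->is_neoverse_v1 || cpu->is_neoverse_v2) {
        return KECCAK_IMPL_S2N_BIGNUM_SCALAR;
    }
    if (cpu->is_apple_arm) {
        return KECCAK_IMPL_S2N_BIGNUM_NEON;
    }
    return KECCAK_IMPL_OPENSSL_SCALAR;
}
```

Note that in this sketch Neoverse V1/V2 take the scalar path even though they
support SHA3 instructions, matching the rationale given further down.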
Lazy rotations improve performance by roughly 10% on CPUs with free
barrel shifting, which includes Neoverse N1, V1, and V2. Not all
CPUs have free barrel shifting (Apple M1 and Cortex-A72, for example,
do not), so we don't use lazy rotations by default.
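
As a rough illustration of why deferring rotations is possible at all (this
is a sketch of the general idea, not the s2n-bignum code): rotation
distributes over XOR, so a rotation can be postponed and folded into a later
XOR. On AArch64 that fold can be a single instruction with a rotated
register operand, e.g. `eor x0, x1, x2, ror #n`, which is where a free
barrel shifter pays off. The function names below are hypothetical.

```c
#include <stdint.h>

/* Rotate a 64-bit lane left by r bits (r taken modulo 64). */
static inline uint64_t rol64(uint64_t x, unsigned r) {
    r &= 63;
    return r ? (x << r) | (x >> (64 - r)) : x;
}

/* Eager form: rotate each input first, then combine. */
static uint64_t rotate_then_xor(uint64_t a, uint64_t b, unsigned r) {
    return rol64(a, r) ^ rol64(b, r);
}

/* Lazy form: combine the unrotated inputs and apply one rotation later.
 * Equal to the eager form because rotation is a bit permutation and
 * XOR acts bitwise. */
static uint64_t xor_then_rotate(uint64_t a, uint64_t b, unsigned r) {
    return rol64(a ^ b, r);
}
```

Both functions return the same value for all inputs; the lazy form simply
leaves the choice of when to pay for the rotation to the implementation.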
Neoverse V1 and V2 do support SHA3 instructions, but they are only
implemented on 1/4 of Neon units, and are thus slower than a scalar
implementation.
Finally, since the Neon Keccak assembly from s2n-bignum is faster than
the Neon Keccak assembly from the OpenSSL implementation, the latter
is removed from keccak1600-armv8.pl, leaving only the scalar assembly
implementation for the core Keccak permutation.
Performance impact
------------------
* Apple M1
| Algorithm | Size | Main | New | Gain | % |
|:----------|:------|------------|------------|------------|-------:|
| SHA3-224 | 16b | 71.5 MB/s | 88.3 MB/s | +16.8 MB/s | +23.5% |
| | 256b | 584.5 MB/s | 754.7 MB/s | +170.2 MB/s| +29.1% |
| | 1350b | 633.8 MB/s | 815.2 MB/s | +181.4 MB/s| +28.6% |
| | 8kb | 694.4 MB/s | 872.4 MB/s | +178.0 MB/s| +25.6% |
| | 16kb | 696.9 MB/s | 864.8 MB/s | +167.9 MB/s| +24.1% |
| SHA3-256 | 16b | 71.6 MB/s | 88.6 MB/s | +17.0 MB/s | +23.7% |
| | 256b | 600.8 MB/s | 759.0 MB/s | +158.2 MB/s| +26.3% |
| | 1350b | 638.8 MB/s | 817.5 MB/s | +178.7 MB/s| +28.0% |
| | 8kb | 652.3 MB/s | 820.5 MB/s | +168.2 MB/s| +25.8% |
| | 16kb | 658.9 MB/s | 823.8 MB/s | +164.9 MB/s| +25.0% |
| SHA3-384 | 16b | 71.9 MB/s | 86.8 MB/s | +14.9 MB/s | +20.7% |
| | 256b | 402.3 MB/s | 505.4 MB/s | +103.1 MB/s| +25.6% |
| | 1350b | 493.1 MB/s | 636.0 MB/s | +142.9 MB/s| +29.0% |
| | 8kb | 507.3 MB/s | 639.7 MB/s | +132.4 MB/s| +26.1% |
| | 16kb | 507.2 MB/s | 626.2 MB/s | +119.0 MB/s| +23.5% |
| SHA3-512 | 16b | 70.6 MB/s | 89.2 MB/s | +18.6 MB/s | +26.3% |
| | 256b | 305.7 MB/s | 390.8 MB/s | +85.1 MB/s | +27.8% |
| | 1350b | 347.2 MB/s | 436.7 MB/s | +89.5 MB/s | +25.8% |
| | 8kb | 355.0 MB/s | 446.3 MB/s | +91.3 MB/s | +25.7% |
| | 16kb | 356.1 MB/s | 445.7 MB/s | +89.6 MB/s | +25.2% |
| SHAKE-128 | 16b | 68.8 MB/s | 87.4 MB/s | +18.6 MB/s | +27.0% |
| | 256b | 572.2 MB/s | 747.5 MB/s | +175.3 MB/s| +30.6% |
| | 1350b | 780.8 MB/s | 1016.4 MB/s| +235.6 MB/s| +30.2% |
| | 8kb | 932.8 MB/s | 1215.4 MB/s| +282.6 MB/s| +30.3% |
| | 16kb | 932.4 MB/s | 1215.9 MB/s| +283.5 MB/s| +30.4% |
| SHAKE-256 | 16b | 69.0 MB/s | 87.6 MB/s | +18.6 MB/s | +27.0% |
| | 256b | 574.7 MB/s | 750.1 MB/s | +175.4 MB/s| +30.5% |
| | 1350b | 629.4 MB/s | 817.0 MB/s | +187.6 MB/s| +29.8% |
| | 8kb | 652.3 MB/s | 820.5 MB/s | +168.2 MB/s| +25.8% |
| | 16kb | 658.9 MB/s | 823.8 MB/s | +164.9 MB/s| +25.0% |
* Neoverse-V2
| Algorithm | Size | Main | New | Gain | % |
|:----------|:------|------------|------------|------------|------:|
| SHA3-224 | 16b | 53.4 MB/s | 56.9 MB/s | +3.5 MB/s | +6.7% |
| | 256b | 449.9 MB/s | 487.0 MB/s | +37.1 MB/s | +8.2% |
| | 1350b | 500.0 MB/s | 541.5 MB/s | +41.5 MB/s | +8.3% |
| | 8kb | 537.9 MB/s | 585.3 MB/s | +47.4 MB/s | +8.8% |
| | 16kb | 530.7 MB/s | 577.5 MB/s | +46.8 MB/s | +8.8% |
| SHA3-256 | 16b | 53.5 MB/s | 57.2 MB/s | +3.7 MB/s | +7.0% |
| | 256b | 451.6 MB/s | 488.1 MB/s | +36.5 MB/s | +8.1% |
| | 1350b | 500.1 MB/s | 542.0 MB/s | +41.9 MB/s | +8.4% |
| | 8kb | 503.0 MB/s | 546.9 MB/s | +43.9 MB/s | +8.7% |
| | 16kb | 500.2 MB/s | 544.9 MB/s | +44.7 MB/s | +8.9% |
| SHA3-384 | 16b | 53.8 MB/s | 57.7 MB/s | +3.9 MB/s | +7.2% |
| | 256b | 306.9 MB/s | 333.3 MB/s | +26.4 MB/s | +8.6% |
| | 1350b | 386.6 MB/s | 420.5 MB/s | +33.9 MB/s | +8.8% |
| | 8kb | 389.9 MB/s | 424.5 MB/s | +34.6 MB/s | +8.9% |
| | 16kb | 384.9 MB/s | 420.1 MB/s | +35.2 MB/s | +9.1% |
| SHA3-512 | 16b | 53.4 MB/s | 57.8 MB/s | +4.4 MB/s | +8.3% |
| | 256b | 233.5 MB/s | 254.0 MB/s | +20.5 MB/s | +8.8% |
| | 1350b | 266.7 MB/s | 290.2 MB/s | +23.5 MB/s | +8.8% |
| | 8kb | 271.9 MB/s | 295.8 MB/s | +23.9 MB/s | +8.8% |
| | 16kb | 268.7 MB/s | 292.7 MB/s | +24.0 MB/s | +8.9% |
| SHAKE-128 | 16b | 49.6 MB/s | 53.1 MB/s | +3.5 MB/s | +7.0% |
| | 256b | 432.9 MB/s | 468.0 MB/s | +35.1 MB/s | +8.1% |
| | 1350b | 547.5 MB/s | 592.5 MB/s | +45.0 MB/s | +8.2% |
| | 8kb | 621.6 MB/s | 676.1 MB/s | +54.5 MB/s | +8.8% |
| | 16kb | 613.4 MB/s | 667.7 MB/s | +54.3 MB/s | +8.9% |
| SHAKE-256 | 16b | 49.7 MB/s | 53.2 MB/s | +3.5 MB/s | +7.2% |
| | 256b | 432.9 MB/s | 469.1 MB/s | +36.2 MB/s | +8.4% |
| | 1350b | 494.6 MB/s | 537.9 MB/s | +43.3 MB/s | +8.8% |
| | 8kb | 502.3 MB/s | 546.6 MB/s | +44.3 MB/s | +8.8% |
| | 16kb | 499.6 MB/s | 545.2 MB/s | +45.6 MB/s | +9.1% |
* Neoverse-N1
| Algorithm | Size | Main | New | Gain | % |
|:----------|:------|------------|------------|------------|-------:|
| SHA3-224 | 16b | 32.7 MB/s | 36.5 MB/s | +3.8 MB/s | +11.7% |
| | 256b | 277.2 MB/s | 311.2 MB/s | +34.0 MB/s | +12.3% |
| | 1350b | 309.5 MB/s | 347.2 MB/s | +37.7 MB/s | +12.2% |
| | 8kb | 334.4 MB/s | 375.8 MB/s | +41.4 MB/s | +12.4% |
| | 16kb | 331.1 MB/s | 372.5 MB/s | +41.4 MB/s | +12.5% |
| SHA3-256 | 16b | 33.0 MB/s | 36.8 MB/s | +3.8 MB/s | +11.5% |
| | 256b | 279.4 MB/s | 312.5 MB/s | +33.1 MB/s | +11.9% |
| | 1350b | 310.0 MB/s | 348.3 MB/s | +38.3 MB/s | +12.4% |
| | 8kb | 312.8 MB/s | 352.1 MB/s | +39.3 MB/s | +12.6% |
| | 16kb | 312.4 MB/s | 353.0 MB/s | +40.6 MB/s | +13.0% |
| SHA3-384 | 16b | 33.1 MB/s | 36.9 MB/s | +3.8 MB/s | +11.5% |
| | 256b | 190.7 MB/s | 214.1 MB/s | +23.4 MB/s | +12.3% |
| | 1350b | 240.2 MB/s | 269.9 MB/s | +29.7 MB/s | +12.4% |
| | 8kb | 242.7 MB/s | 273.2 MB/s | +30.5 MB/s | +12.6% |
| | 16kb | 240.4 MB/s | 271.7 MB/s | +31.3 MB/s | +13.0% |
| SHA3-512 | 16b | 33.1 MB/s | 36.9 MB/s | +3.8 MB/s | +11.6% |
| | 256b | 145.1 MB/s | 162.8 MB/s | +17.7 MB/s | +12.2% |
| | 1350b | 165.7 MB/s | 186.2 MB/s | +20.5 MB/s | +12.3% |
| | 8kb | 169.1 MB/s | 190.0 MB/s | +20.9 MB/s | +12.4% |
| | 16kb | 167.5 MB/s | 189.2 MB/s | +21.7 MB/s | +13.0% |
| SHAKE-128 | 16b | 30.3 MB/s | 33.6 MB/s | +3.3 MB/s | +10.9% |
| | 256b | 263.7 MB/s | 293.2 MB/s | +29.5 MB/s | +11.2% |
| | 1350b | 338.2 MB/s | 379.4 MB/s | +41.2 MB/s | +12.2% |
| | 8kb | 387.2 MB/s | 435.4 MB/s | +48.2 MB/s | +12.5% |
| | 16kb | 383.6 MB/s | 432.9 MB/s | +49.3 MB/s | +12.9% |
| SHAKE-256 | 16b | 30.5 MB/s | 33.8 MB/s | +3.3 MB/s | +10.9% |
| | 256b | 264.9 MB/s | 294.5 MB/s | +29.6 MB/s | +11.2% |
| | 1350b | 306.5 MB/s | 344.1 MB/s | +37.6 MB/s | +12.3% |
| | 8kb | 312.0 MB/s | 351.5 MB/s | +39.5 MB/s | +12.7% |
| | 16kb | 312.1 MB/s | 352.7 MB/s | +40.6 MB/s | +13.0% |
* Neoverse-V1
| Algorithm | Size | Main | New | Gain | % |
|:----------|:------|------------|------------|------------|-------:|
| SHA3-224 | 16b | 45.6 MB/s | 49.3 MB/s | +3.7 MB/s | +8.2% |
| | 256b | 382.6 MB/s | 419.4 MB/s | +36.8 MB/s | +9.6% |
| | 1350b | 422.5 MB/s | 464.0 MB/s | +41.5 MB/s | +9.8% |
| | 8kb | 454.9 MB/s | 500.5 MB/s | +45.6 MB/s | +10.0% |
| | 16kb | 449.2 MB/s | 495.5 MB/s | +46.3 MB/s | +10.3% |
| SHA3-256 | 16b | 45.7 MB/s | 49.5 MB/s | +3.8 MB/s | +8.5% |
| | 256b | 383.2 MB/s | 420.8 MB/s | +37.6 MB/s | +9.8% |
| | 1350b | 422.8 MB/s | 464.4 MB/s | +41.6 MB/s | +9.8% |
| | 8kb | 425.8 MB/s | 467.7 MB/s | +41.9 MB/s | +9.8% |
| | 16kb | 424.0 MB/s | 468.1 MB/s | +44.1 MB/s | +10.4% |
| SHA3-384 | 16b | 45.7 MB/s | 49.7 MB/s | +4.0 MB/s | +8.7% |
| | 256b | 261.3 MB/s | 284.5 MB/s | +23.2 MB/s | +8.9% |
| | 1350b | 327.8 MB/s | 359.6 MB/s | +31.8 MB/s | +9.7% |
| | 8kb | 330.5 MB/s | 362.7 MB/s | +32.2 MB/s | +9.8% |
| | 16kb | 326.3 MB/s | 360.6 MB/s | +34.3 MB/s | +10.5% |
| SHA3-512 | 16b | 45.7 MB/s | 49.5 MB/s | +3.8 MB/s | +8.3% |
| | 256b | 198.4 MB/s | 216.7 MB/s | +18.3 MB/s | +9.2% |
| | 1350b | 226.1 MB/s | 247.5 MB/s | +21.4 MB/s | +9.5% |
| | 8kb | 230.2 MB/s | 252.0 MB/s | +21.8 MB/s | +9.4% |
| | 16kb | 227.7 MB/s | 250.3 MB/s | +22.6 MB/s | +9.9% |
| SHAKE-128 | 16b | 42.1 MB/s | 45.8 MB/s | +3.7 MB/s | +8.9% |
| | 256b | 366.4 MB/s | 402.3 MB/s | +35.9 MB/s | +9.8% |
| | 1350b | 463.5 MB/s | 508.8 MB/s | +45.3 MB/s | +9.8% |
| | 8kb | 525.7 MB/s | 580.0 MB/s | +54.3 MB/s | +10.3% |
| | 16kb | 519.4 MB/s | 574.4 MB/s | +55.0 MB/s | +10.6% |
| SHAKE-256 | 16b | 42.3 MB/s | 46.0 MB/s | +3.7 MB/s | +8.8% |
| | 256b | 367.6 MB/s | 404.2 MB/s | +36.6 MB/s | +9.9% |
| | 1350b | 418.8 MB/s | 459.9 MB/s | +41.1 MB/s | +9.8% |
| | 8kb | 425.1 MB/s | 466.9 MB/s | +41.8 MB/s | +9.8% |
| | 16kb | 423.7 MB/s | 467.4 MB/s | +43.7 MB/s | +10.3% |
* Cortex-A72
| Algorithm | Size | Main | New | Gain | % |
|:----------|:------|------------|------------|------------|------:|
| SHA3-224 | 16b | 19.9 MB/s | 19.6 MB/s | -0.3 MB/s | -1.2% |
| | 256b | 169.9 MB/s | 168.1 MB/s | -1.8 MB/s | -1.0% |
| | 1350b | 195.7 MB/s | 189.3 MB/s | -6.4 MB/s | -3.3% |
| | 8kb | 211.7 MB/s | 204.8 MB/s | -6.9 MB/s | -3.2% |
| | 16kb | 212.2 MB/s | 205.3 MB/s | -6.9 MB/s | -3.2% |
| SHA3-256 | 16b | 19.6 MB/s | 19.7 MB/s | +0.1 MB/s | +0.6% |
| | 256b | 168.9 MB/s | 168.7 MB/s | -0.2 MB/s | -0.1% |
| | 1350b | 195.2 MB/s | 189.0 MB/s | -6.2 MB/s | -3.2% |
| | 8kb | 198.6 MB/s | 191.8 MB/s | -6.8 MB/s | -3.4% |
| | 16kb | 200.7 MB/s | 193.8 MB/s | -6.9 MB/s | -3.4% |
| SHA3-384 | 16b | 20.0 MB/s | 19.8 MB/s | -0.2 MB/s | -0.9% |
| | 256b | 118.3 MB/s | 115.6 MB/s | -2.7 MB/s | -2.3% |
| | 1350b | 151.6 MB/s | 146.8 MB/s | -4.8 MB/s | -3.2% |
| | 8kb | 154.2 MB/s | 148.9 MB/s | -5.3 MB/s | -3.4% |
| | 16kb | 154.5 MB/s | 149.1 MB/s | -5.4 MB/s | -3.5% |
| SHA3-512 | 16b | 20.0 MB/s | 19.7 MB/s | -0.3 MB/s | -1.5% |
| | 256b | 90.2 MB/s | 87.8 MB/s | -2.4 MB/s | -2.6% |
| | 1350b | 104.9 MB/s | 100.6 MB/s | -4.3 MB/s | -4.1% |
| | 8kb | 107.4 MB/s | 102.7 MB/s | -4.7 MB/s | -4.3% |
| | 16kb | 107.5 MB/s | 102.9 MB/s | -4.6 MB/s | -4.3% |
| SHAKE-128 | 16b | 16.8 MB/s | 17.7 MB/s | +0.9 MB/s | +5.0% |
| | 256b | 157.2 MB/s | 159.2 MB/s | +2.0 MB/s | +1.3% |
| | 1350b | 211.4 MB/s | 206.0 MB/s | -5.4 MB/s | -2.6% |
| | 8kb | 245.1 MB/s | 236.1 MB/s | -9.0 MB/s | -3.7% |
| | 16kb | 245.9 MB/s | 237.6 MB/s | -8.3 MB/s | -3.4% |
| SHAKE-256 | 16b | 17.6 MB/s | 17.8 MB/s | +0.2 MB/s | +1.3% |
| | 256b | 158.9 MB/s | 158.1 MB/s | -0.8 MB/s | -0.5% |
| | 1350b | 192.5 MB/s | 186.9 MB/s | -5.6 MB/s | -3.0% |
| | 8kb | 198.0 MB/s | 191.1 MB/s | -6.9 MB/s | -3.5% |
| | 16kb | 200.4 MB/s | 193.2 MB/s | -7.2 MB/s | -3.6% |
Signed-off-by: Hanno Becker <[email protected]>

1 parent fd2f34f, commit 1c903da
3 files changed (+119, -490 lines), including
crypto/fipsmodule/CMakeLists.txt (31 additions, 0 deletions).