@klauspost klauspost commented Jan 7, 2024

Use GFNI even if not on AVX512.

Tested.
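
For context, a conceptual sketch of why GFNI helps with Reed-Solomon coding: multiplying a byte by a fixed constant in GF(2^8) is a linear map over GF(2), so it can be written as an 8x8 bit matrix, and GFNI's affine instruction (GF2P8AFFINEQB) applies such a matrix to every byte of a vector register. The Go code below emulates that idea in software for a single byte. The reduction polynomial 0x11D, the column-wise matrix layout, and the names `mulGF`, `matrixFor`, and `apply` are assumptions made for illustration; this is not necessarily how the PR implements it, and the real instruction uses its own operand bit layout.

```go
package main

import "fmt"

// poly is the reduction polynomial x^8+x^4+x^3+x^2+1 (0x11D).
// Assumed here for illustration only.
const poly = 0x11D

// mulGF multiplies a and b in GF(2^8) by shift-and-add (reference version).
func mulGF(a, b byte) byte {
	var p byte
	x := uint16(a)
	for i := 0; i < 8; i++ {
		if b&(1<<i) != 0 {
			p ^= byte(x)
		}
		x <<= 1
		if x&0x100 != 0 {
			x ^= poly
		}
	}
	return p
}

// matrixFor returns the 8x8 bit matrix for "multiply by c".
// Column j holds c * x^j, i.e. the image of the j-th basis bit.
func matrixFor(c byte) [8]byte {
	var cols [8]byte
	for j := range cols {
		cols[j] = mulGF(c, 1<<j)
	}
	return cols
}

// apply computes the matrix-vector product over GF(2) by XORing the
// columns selected by the set bits of v.
func apply(cols [8]byte, v byte) byte {
	var out byte
	for j := 0; j < 8; j++ {
		if v&(1<<j) != 0 {
			out ^= cols[j]
		}
	}
	return out
}

func main() {
	// Check the matrix form against the reference for every constant and input.
	for c := 0; c < 256; c++ {
		m := matrixFor(byte(c))
		for v := 0; v < 256; v++ {
			if apply(m, byte(v)) != mulGF(byte(c), byte(v)) {
				fmt.Printf("mismatch c=%d v=%d\n", c, v)
				return
			}
		}
	}
	fmt.Println("all 65536 products match")
}
```

The point of the matrix form is that one precomputed 64-bit value per coefficient replaces the pair of 16-entry lookup tables that a PSHUFB-based implementation typically needs.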

Repair speed matches AVX512 and encode is about 24% slower; both are well ahead of AVX2 without GFNI:

$ ./benchmark -avx512=false
Benchmarking 1 block(s) of 12 data (K) and 4 parity shards (M), each 873814 bytes using 16 threads. Total 13981024 bytes.

 * Encoded 224374 MiB in 10s. Speed: 22436.27 MiB/s (12+4:873814)
 * Repaired 211160 MiB in 10s. Speed: 21115.21 MiB/s (12+4:873814)
 
$ ./benchmark -avx512=true
Benchmarking 1 block(s) of 12 data (K) and 4 parity shards (M), each 873814 bytes using 16 threads. Total 13981024 bytes.

 * Encoded 296560 MiB in 10s. Speed: 29654.76 MiB/s (12+4:873814)
 * Repaired 212347 MiB in 10s. Speed: 21234.58 MiB/s (12+4:873814)

$ ./benchmark -avx512=false -avx2-gfni=false
Benchmarking 1 block(s) of 12 data (K) and 4 parity shards (M), each 873814 bytes using 16 threads. Total 13981024 bytes.

 * Encoded 140600 MiB in 10s. Speed: 14059.98 MiB/s (12+4:873814)
 * Repaired 147347 MiB in 10.001s. Speed: 14733.59 MiB/s (12+4:873814)
 
$ go build -tags=nopshufb
$ ./benchmark -avx512=false -avx2-gfni=false
Benchmarking 1 block(s) of 12 data (K) and 4 parity shards (M), each 873814 bytes using 16 threads. Total 13981024 bytes.

 * Encoded 41693 MiB in 10.003s. Speed: 4168.21 MiB/s (12+4:873814)
 * Repaired 37600 MiB in 10s. Speed: 3759.95 MiB/s (12+4:873814)
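
To make the comparison between the four runs above easier to read, the relative speeds can be worked out with a few lines of Go. The numbers are copied from the benchmark output; the constant names and the `speedup` helper are only labels chosen for this sketch, and the mapping of each flag combination to a code path (for example `-avx512=false` meaning the AVX+GFNI path) is inferred from this PR's description.

```go
package main

import "fmt"

// Speeds in MiB/s, copied from the benchmark output in this PR.
const (
	encAVX512   = 29654.76 // -avx512=true
	encAVXGFNI  = 22436.27 // -avx512=false
	encAVX2     = 14059.98 // -avx512=false -avx2-gfni=false
	encNoPshufb = 4168.21  // built with -tags=nopshufb

	repAVX512   = 21234.58
	repAVXGFNI  = 21115.21
	repAVX2     = 14733.59
	repNoPshufb = 3759.95
)

// speedup returns how many times faster a is than b.
func speedup(a, b float64) float64 { return a / b }

func main() {
	fmt.Printf("encode: GFNI vs AVX512   %.2fx\n", speedup(encAVXGFNI, encAVX512))
	fmt.Printf("encode: GFNI vs AVX2     %.2fx\n", speedup(encAVXGFNI, encAVX2))
	fmt.Printf("encode: GFNI vs nopshufb %.2fx\n", speedup(encAVXGFNI, encNoPshufb))
	fmt.Printf("repair: GFNI vs AVX512   %.2fx\n", speedup(repAVXGFNI, repAVX512))
	fmt.Printf("repair: GFNI vs AVX2     %.2fx\n", speedup(repAVXGFNI, repAVX2))
	fmt.Printf("repair: GFNI vs nopshufb %.2fx\n", speedup(repAVXGFNI, repNoPshufb))
}
```

These are single 10-second runs on one machine, so the ratios are indicative rather than precise.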
 

@klauspost

CI fails... It doesn't seem like any of the CI machines expose GFNI.
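
One quick way to confirm whether a given Linux machine exposes the required features is to look at the `flags` line in `/proc/cpuinfo`. The sketch below is a stand-alone check using only the standard library; it is not how the library itself detects CPU features, and `hasFlags` is a hypothetical helper written for this example. It assumes an x86 Linux host where the kernel reports the features as `avx` and `gfni`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasFlags reports whether every wanted flag appears on the first
// "flags" line of cpuinfo-formatted text.
func hasFlags(cpuinfo string, want ...string) bool {
	for _, line := range strings.Split(cpuinfo, "\n") {
		key, val, ok := strings.Cut(line, ":")
		if !ok || strings.TrimSpace(key) != "flags" {
			continue
		}
		// Collect the flags present on this line.
		have := map[string]bool{}
		for _, f := range strings.Fields(val) {
			have[f] = true
		}
		for _, w := range want {
			if !have[w] {
				return false
			}
		}
		return true
	}
	// No flags line found at all.
	return false
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	fmt.Println("AVX+GFNI available:", hasFlags(string(data), "avx", "gfni"))
}
```

Running this as a CI step would show up front whether the GFNI code path can be exercised on that runner at all.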

@klauspost

$ go test -no-avx512 -timeout=15m
Using SSE2,AVX2,SSSE3,AVX2+GFNI
PASS
ok      github.com/klauspost/reedsolomon        791.824s

@klauspost

Also, this is AVX, not AVX2.

@klauspost changed the title from "Add pure AVX2 GFNI mode" to "Add pure AVX GFNI mode" on Jan 8, 2024
@klauspost merged commit 5b85c72 into master on Jan 9, 2024
@klauspost deleted the add-avx2-gfni branch on January 9, 2024 17:25