perf(simd): paired-byte SIMD search for memmem #55
Merged
Implement frequency-based rare byte selection and paired-byte AVX2 search for dramatically improved substring matching.

Algorithm:
- Empirical byte frequency table (256 bytes)
- Select two rarest bytes in needle
- MemchrPair: search both bytes at correct offset simultaneously
- Reduces false positives vs single-byte search

Benchmarks (vs stdlib bytes.Index):
- 4KB haystack, 64B needle: 19x faster
- 16KB haystack, 64B needle: 52x faster
- 64KB haystack, 64B needle: 45x faster
- 1MB haystack, 64B needle: 39x faster
- Short needle (7B): 10.5x faster

New files:
- simd/byte_frequencies.go: frequency table + SelectRareBytes
- simd/byte_frequencies_test.go: comprehensive tests

Closes #49
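The algorithm above can be sketched in portable Go. This is a minimal illustration, not the PR's code: `rank` is a made-up stand-in for the empirical 256-entry frequency table (which is not shown here), `selectRareBytes`, `memchrPair`, and `index` are hypothetical names and signatures, and the scalar loop in `memchrPair` only stands in for the AVX2 routine.

```go
package main

import "bytes"

// rank returns a placeholder commonness score for b (higher = more common).
// Assumption: the real table is empirical; these values are invented.
func rank(b byte) int {
	switch {
	case b == ' ' || b == 'e' || b == 't' || b == 'a':
		return 255
	case b >= 'a' && b <= 'z':
		return 200
	case b >= 'A' && b <= 'Z':
		return 120
	case b >= '0' && b <= '9':
		return 100
	default:
		return 50
	}
}

// selectRareBytes returns the offsets of the rarest (i1) and second-rarest
// (i2) bytes of needle under rank. It requires len(needle) >= 2.
func selectRareBytes(needle []byte) (i1, i2 int) {
	i1, i2 = 0, 1
	if rank(needle[i2]) < rank(needle[i1]) {
		i1, i2 = i2, i1
	}
	for i := 2; i < len(needle); i++ {
		r := rank(needle[i])
		switch {
		case r < rank(needle[i1]):
			i1, i2 = i, i1
		case r < rank(needle[i2]):
			i2 = i
		}
	}
	return i1, i2
}

// memchrPair returns the first p with hay[p] == b1 and hay[p+gap] == b2,
// or -1. Scalar stand-in for the vectorized search; gap must be >= 0.
func memchrPair(hay []byte, b1, b2 byte, gap int) int {
	for p := 0; p+gap < len(hay); p++ {
		if hay[p] == b1 && hay[p+gap] == b2 {
			return p
		}
	}
	return -1
}

// index finds the first occurrence of needle in hay by locating the two
// rare bytes at their relative offset, then verifying the full needle.
func index(hay, needle []byte) int {
	if len(needle) < 2 {
		return bytes.Index(hay, needle)
	}
	i1, i2 := selectRareBytes(needle)
	if i1 > i2 {
		i1, i2 = i2, i1 // order by position so the gap is non-negative
	}
	b1, b2, gap := needle[i1], needle[i2], i2-i1
	// A match starting at s places b1 at s+i1, so scanning starts at i1.
	for from := i1; from < len(hay); {
		p := memchrPair(hay[from:], b1, b2, gap)
		if p < 0 {
			return -1
		}
		start := from + p - i1
		if start+len(needle) <= len(hay) && bytes.Equal(hay[start:start+len(needle)], needle) {
			return start
		}
		from += p + 1 // false candidate: resume just past it
	}
	return -1
}
```

With these placeholder ranks, `selectRareBytes([]byte("hello, World"))` picks the comma and the capital `W`, so a candidate position must match both bytes at the right distance before the full comparison runs. That is the sense in which the pair reduces false positives relative to a single-byte search.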
The SWAR zero-detection formula can produce false positives when a byte equal to 0x01 is adjacent to a 0x00 byte, because the borrow propagates during subtraction. This caused test failures on the 386 architecture. Solution: verify each candidate position after SWAR detection before returning, while preserving the SWAR optimization for the common case.
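The borrow effect and the verification step can be illustrated with a short sketch. It assumes the classic zero-byte formula `(v - 0x01..01) &^ v & 0x80..80` on 64-bit little-endian words; `zeroMask` and `indexByteSWAR` are hypothetical names, and the PR's generic implementation may be structured differently.

```go
package main

import (
	"encoding/binary"
	"math/bits"
)

const (
	lo = 0x0101010101010101 // 0x01 in every byte lane
	hi = 0x8080808080808080 // 0x80 in every byte lane
)

// zeroMask sets the high bit of every byte lane of v that is 0x00. A lane
// holding 0x01 directly above a zero lane can also be flagged, because the
// subtraction borrows into it. For example, zeroMask(0x0202020202020100)
// is 0x8080: lane 0 is a real zero, lane 1 (0x01) is a false positive.
func zeroMask(v uint64) uint64 {
	return (v - lo) &^ v & hi
}

// indexByteSWAR returns the index of the first c in hay, or -1. Each lane
// flagged by zeroMask is checked against the actual byte before returning.
func indexByteSWAR(hay []byte, c byte) int {
	bcast := uint64(c) * lo // c repeated in all eight lanes
	i := 0
	for ; i+8 <= len(hay); i += 8 {
		w := binary.LittleEndian.Uint64(hay[i:])
		m := zeroMask(w ^ bcast) // lanes equal to c become zero after XOR
		for m != 0 {
			lane := bits.TrailingZeros64(m) / 8
			if hay[i+lane] == c { // verify: the flag may be a borrow artifact
				return i + lane
			}
			m &= m - 1 // drop this candidate and try the next flagged lane
		}
	}
	for ; i < len(hay); i++ { // tail shorter than one word
		if hay[i] == c {
			return i
		}
	}
	return -1
}
```

Words with no flagged lane still skip eight bytes at a time, so the check only costs anything when a candidate exists.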
Summary
Implement frequency-based rare byte selection and paired-byte AVX2 search for dramatically improved substring matching (Issue #49).
Algorithm
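- SelectRareBytes(): identify two rarest bytes in needle
- MemchrPair: AVX2 SIMD search for both bytes at correct offset simultaneously

Benchmarks (vs stdlib bytes.Index)
- 4KB haystack, 64B needle: 19x faster
- 16KB haystack, 64B needle: 52x faster
- 64KB haystack, 64B needle: 45x faster
- 1MB haystack, 64B needle: 39x faster
- Short needle (7B): 10.5x faster
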
Files Changed
- simd/byte_frequencies.go (new): frequency table + SelectRareBytes
- simd/byte_frequencies_test.go (new): comprehensive tests
- simd/memchr_amd64.s: AVX2 memchrPairAVX2 assembly
- simd/memchr_amd64.go: MemchrPair wrapper
- simd/memchr_fallback.go: non-AMD64 fallback
- simd/memchr_generic_impl.go: SWAR generic implementation
- simd/memmem.go: refactored to use paired-byte search
- simd/memchr_test.go: MemchrPair tests + fuzz

Test Plan
- go test ./simd/... -race passes
- golangci-lint run: 0 issues

Closes #49