
perf(simd): paired-byte SIMD search for memmem #55

Merged
kolkov merged 2 commits into main from feature/paired-byte-simd
Jan 4, 2026
Conversation


@kolkov kolkov commented Jan 4, 2026

Summary

Implement frequency-based rare byte selection and paired-byte AVX2 search for dramatically improved substring matching (Issue #49).

Algorithm

  • Empirical byte frequency table (256 entries) ranking bytes by commonality
  • SelectRareBytes(): identify the two rarest bytes in the needle
  • MemchrPair: AVX2 SIMD search for both bytes at their correct offsets simultaneously
  • Dramatically reduces false positives vs. single-byte search
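The idea can be sketched in scalar Go. This is an illustrative model, not the PR's actual code: the frequency values, function signatures, and the `indexPaired` helper are assumptions; the real implementation does the two byte checks for a whole vector of positions per AVX2 instruction.

```go
package main

import (
	"bytes"
	"fmt"
)

// byteRank is an illustrative stand-in for the empirical table in
// simd/byte_frequencies.go: lower rank means rarer. Here only a handful
// of common English bytes are marked frequent; real values come from
// corpus statistics.
var byteRank [256]int

func init() {
	for _, b := range []byte("etaoinshrdlu ETAOINSHRDLU") {
		byteRank[b] = 200
	}
}

// selectRareBytes returns the offsets of the two rarest bytes in needle
// (a sketch of SelectRareBytes; the exact signature is an assumption).
func selectRareBytes(needle []byte) (r1, r2 int) {
	for i := 1; i < len(needle); i++ {
		if byteRank[needle[i]] < byteRank[needle[r1]] {
			r1 = i
		}
	}
	if r1 == 0 && len(needle) > 1 {
		r2 = 1
	}
	for i := range needle {
		if i != r1 && byteRank[needle[i]] < byteRank[needle[r2]] {
			r2 = i
		}
	}
	return r1, r2
}

// indexPaired is a scalar model of the paired-byte search: a candidate
// position must match the rare byte at offset r1 AND the rare byte at
// offset r2 before the full needle comparison runs, which is what cuts
// down false positives versus a single-byte memchr.
func indexPaired(haystack, needle []byte) int {
	if len(needle) == 0 || len(needle) > len(haystack) {
		return -1
	}
	r1, r2 := selectRareBytes(needle)
	b1, b2 := needle[r1], needle[r2]
	for i := 0; i+len(needle) <= len(haystack); i++ {
		if haystack[i+r1] == b1 && haystack[i+r2] == b2 &&
			bytes.Equal(haystack[i:i+len(needle)], needle) {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(indexPaired([]byte("hello world, quixotic zebra"), []byte("quixotic"))) // → 13
}
```

On a haystack of English text, positions where both rare bytes line up are far scarcer than positions matching any single byte, so the expensive full comparison runs rarely.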

Benchmarks (vs stdlib bytes.Index)

Haystack   Needle   Speedup
4KB        64B      19x
16KB       64B      52x
64KB       64B      45x
1MB        64B      39x
any        7B       10.5x

Files Changed

  • simd/byte_frequencies.go (new): frequency table + SelectRareBytes
  • simd/byte_frequencies_test.go (new): comprehensive tests
  • simd/memchr_amd64.s: AVX2 memchrPairAVX2 assembly
  • simd/memchr_amd64.go: MemchrPair wrapper
  • simd/memchr_fallback.go: non-AMD64 fallback
  • simd/memchr_generic_impl.go: SWAR generic implementation
  • simd/memmem.go: refactored to use paired-byte search
  • simd/memchr_test.go: MemchrPair tests + fuzz

Test Plan

  • go test ./simd/... -race passes
  • golangci-lint run - 0 issues
  • Coverage: 86.7%
  • Pre-release check passed

Closes #49

Implement frequency-based rare byte selection and paired-byte
AVX2 search for dramatically improved substring matching.

Algorithm:
- Empirical byte frequency table (256 bytes)
- Select two rarest bytes in needle
- MemchrPair: search both bytes at correct offset simultaneously
- Reduces false positives vs single-byte search

Benchmarks (vs stdlib bytes.Index):
- 4KB haystack, 64B needle: 19x faster
- 16KB haystack, 64B needle: 52x faster
- 64KB haystack, 64B needle: 45x faster
- 1MB haystack, 64B needle: 39x faster
- Short needle (7B): 10.5x faster

New files:
- simd/byte_frequencies.go: frequency table + SelectRareBytes
- simd/byte_frequencies_test.go: comprehensive tests

Closes #49

github-actions bot commented Jan 4, 2026

Benchmark Comparison

Comparing main → PR #55

Summary: geomean 244.7n → 245.0n (+0.12%)

⚠️ Potential regressions detected:

Accelerate/memchr1-4   109.5n ± ∞ ¹   109.8n ± ∞ ¹   +0.27% (p=0.032 n=5)
Find/hello-4           710.3n ± ∞ ¹   723.3n ± ∞ ¹   +1.83% (p=0.008 n=5)
Find/foo|bar|baz-4     72.29n ± ∞ ¹   75.88n ± ∞ ¹   +4.97% (p=0.008 n=5)
IsMatch/literal-4      50.47n ± ∞ ¹   63.30n ± ∞ ¹  +25.42% (p=0.008 n=5)
geomean                244.7n         245.0n         +0.12%

Full results available in workflow artifacts. CI runners have ~10-20% variance.
For accurate benchmarks, run locally: ./scripts/bench.sh --compare

The SWAR zero-detection formula can produce false positives when a byte
equals 0x01 adjacent to a 0x00 byte, due to borrow propagation during
subtraction. This caused test failures on the 386 architecture.

Solution: verify each candidate position after SWAR detection before
returning, while preserving the SWAR optimization for the common case.
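The borrow-propagation failure mode is easy to reproduce in a few lines of Go. This is a sketch of the bug and the verify-before-return fix, not the project's actual 386 code path; the value used is illustrative:

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	lo uint64 = 0x0101010101010101
	hi uint64 = 0x8080808080808080
)

// zeroMask is the classic SWAR zero-byte test. As a yes/no answer it is
// exact, but when the high bits are used to *locate* zero bytes, a 0x01
// byte sitting directly above a 0x00 byte is also flagged: the subtraction
// borrows out of the zero byte and turns the 0x01 byte into 0xFF.
func zeroMask(v uint64) uint64 {
	return (v - lo) &^ v & hi
}

func main() {
	// Byte 0 is 0x00 (a real zero); byte 1 is 0x01 (not zero).
	v := uint64(0x4242424242420100)
	m := zeroMask(v)
	fmt.Printf("mask = %#x\n", m) // both byte 0 and byte 1 are flagged

	// The fix described above: verify each candidate before reporting it.
	for m != 0 {
		i := bits.TrailingZeros64(m) / 8
		if byte(v>>(8*i)) == 0 {
			fmt.Printf("byte %d: confirmed zero\n", i)
		} else {
			fmt.Printf("byte %d: false positive (0x%02x)\n", i, byte(v>>(8*i)))
		}
		m &= m - 1 // clear lowest set candidate bit
	}
}
```

The verification loop only runs for flagged positions, so the common case (no candidates in a word) keeps the full SWAR speedup.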
@kolkov kolkov merged commit 54f5d8a into main Jan 4, 2026
15 checks passed
@kolkov kolkov deleted the feature/paired-byte-simd branch January 4, 2026 17:33

Development

Successfully merging this pull request may close these issues.

perf: Optimize inner_literal patterns (Rust 2.7x faster)

1 participant