perf: optimize strategy selection and Teddy 2-byte fingerprint by kolkov · Pull Request #61 · coregx/coregex

kolkov · 2026-01-06T12:18:21Z

Summary

Implement Teddy 2-byte fingerprint (reduces false positives by ~90%)
Reorder strategy selection to prioritize DigitPrefilter for digit-lead patterns
Add isDigitLeadPattern() helper for pattern classification

Performance Impact

Pattern	Before	After	Change
literal_alt	31ms	8ms	~4x faster
version	8.2ms	2ms	~4x faster
IP	3.9ms	5.5ms	~43% slower (trade-off)

Trade-off Analysis

The IP pattern regression is an acceptable trade-off:

We remain 2.2x faster than Rust regex on IP patterns (Rust: ~12ms)
The gains on literal_alt and version patterns significantly outweigh the IP regression
See #62 (perf: Research IP pattern optimization strategies) for follow-up research on IP pattern optimization
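The ratios quoted above can be rechecked from the rounded timings in the Performance Impact table and the ~12ms Rust figure. This is a minimal Go sketch; the helper names speedup and slowdownPct are illustrative and not part of coregex. Because the inputs are rounded, the IP regression comes out at about 41% rather than the measured ~43%.

```go
package main

import "fmt"

// speedup returns how many times faster the "after" timing is than the
// "before" timing (values > 1 are improvements, values < 1 are regressions).
func speedup(beforeMs, afterMs float64) float64 {
	return beforeMs / afterMs
}

// slowdownPct returns the percentage increase in run time from before to after.
func slowdownPct(beforeMs, afterMs float64) float64 {
	return (afterMs - beforeMs) / beforeMs * 100
}

func main() {
	// Rounded timings (ms) from the Performance Impact table.
	fmt.Printf("literal_alt: %.1fx faster\n", speedup(31, 8))
	fmt.Printf("version:     %.1fx faster\n", speedup(8.2, 2))
	fmt.Printf("IP:          %.0f%% slower\n", slowdownPct(3.9, 5.5))
	// Rust's ~12ms on the IP pattern versus 5.5ms for coregex after this PR.
	fmt.Printf("IP vs Rust:  %.1fx faster\n", speedup(12, 5.5))
}
```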

Technical Details

Teddy 2-byte Fingerprint

Changed the default fingerprint length from 1 byte to 2 bytes in prefilter/teddy.go:

1-byte: ~25% false positive rate on typical text
2-byte: <0.5% false positive rate
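To see why a longer fingerprint cuts false positives, here is a scalar Go sketch of the Teddy nibble-mask filter. It illustrates the general technique under stated assumptions and is not the SSSE3 code in prefilter/teddy_ssse3_amd64.s. The names buildMasks, candidates and verify are hypothetical, patterns are assigned to 8 buckets round-robin, and verification checks every pattern rather than only the bucket that fired. Each byte is split into low and high nibbles, and a position survives only if the same bucket accepts both nibbles at every fingerprint position, so a 2-byte fingerprint needs two coincidences instead of one.

```go
package main

import (
	"bytes"
	"fmt"
)

// buildMasks builds nibble masks for the first fpLen bytes of each pattern.
// Pattern k goes to bucket k%8 (assumption: round-robin bucketing), so each
// entry is an 8-bit bucket set. lo[j][n] has bit b set if some pattern in
// bucket b has low nibble n at position j; hi[j][n] does the same for the
// high nibble. Every pattern must be at least fpLen bytes long.
func buildMasks(patterns []string, fpLen int) (lo, hi [][16]uint8) {
	lo = make([][16]uint8, fpLen)
	hi = make([][16]uint8, fpLen)
	for k, p := range patterns {
		bucket := uint8(1) << (k % 8)
		for j := 0; j < fpLen; j++ {
			c := p[j]
			lo[j][c&0x0F] |= bucket
			hi[j][c>>4] |= bucket
		}
	}
	return lo, hi
}

// candidates returns every offset whose next fpLen bytes pass the filter.
// Offsets that fail later verification are false positives.
func candidates(patterns []string, haystack []byte, fpLen int) []int {
	lo, hi := buildMasks(patterns, fpLen)
	var out []int
	for i := 0; i+fpLen <= len(haystack); i++ {
		set := uint8(0xFF) // all buckets still possible
		for j := 0; j < fpLen; j++ {
			c := haystack[i+j]
			set &= lo[j][c&0x0F] & hi[j][c>>4]
		}
		if set != 0 {
			out = append(out, i)
		}
	}
	return out
}

// verify counts candidate offsets at which some pattern really occurs.
func verify(patterns []string, haystack []byte, cands []int) int {
	hits := 0
	for _, i := range cands {
		for _, p := range patterns {
			if bytes.HasPrefix(haystack[i:], []byte(p)) {
				hits++
				break
			}
		}
	}
	return hits
}

func main() {
	patterns := []string{"foo", "bar", "baz"}
	haystack := []byte("a quick brown fox saw a bar and a barn near the foo bazaar")
	for _, fp := range []int{1, 2} {
		c := candidates(patterns, haystack, fp)
		hits := verify(patterns, haystack, c)
		fmt.Printf("fingerprint=%d candidates=%d verified=%d false_positives=%d\n",
			fp, len(c), hits, len(c)-hits)
	}
}
```

Because the 2-byte filter ANDs in a second mask lookup, its candidate set is always a subset of the 1-byte set, while every true match still passes.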

Strategy Reorder

Moved the DigitPrefilter check ahead of the tiny-NFA fallback in meta/strategy.go:

Ensures digit-lead patterns use the specialized prefilter
Prevents single-byte inner literals (such as ".") from being selected for digit-lead patterns
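The PR does not show the body of isDigitLeadPattern(). As a minimal sketch under that caveat, a digit-lead check can be built on Go's regexp/syntax parser: walk the parsed tree and report whether every match must begin with an ASCII digit. The name isDigitLeadPattern is borrowed from the PR, but this body and the isDigitLead helper are hypothetical, and the walk is deliberately conservative (any node it does not recognize counts as not digit-lead).

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// isDigitLead reports whether every match of re must start with an ASCII digit.
func isDigitLead(re *syntax.Regexp) bool {
	switch re.Op {
	case syntax.OpLiteral:
		return len(re.Rune) > 0 && re.Rune[0] >= '0' && re.Rune[0] <= '9'
	case syntax.OpCharClass:
		// Rune holds [lo, hi] pairs; every range must lie inside '0'..'9'.
		if len(re.Rune) == 0 {
			return false
		}
		for i := 0; i+1 < len(re.Rune); i += 2 {
			if re.Rune[i] < '0' || re.Rune[i+1] > '9' {
				return false
			}
		}
		return true
	case syntax.OpCapture, syntax.OpPlus:
		return isDigitLead(re.Sub[0])
	case syntax.OpRepeat:
		// x{0,n} may match empty, so only x{m,n} with m >= 1 qualifies.
		return re.Min >= 1 && isDigitLead(re.Sub[0])
	case syntax.OpConcat:
		for _, sub := range re.Sub {
			switch sub.Op {
			case syntax.OpBeginLine, syntax.OpBeginText, syntax.OpEmptyMatch:
				continue // zero-width: the next element decides
			}
			return isDigitLead(sub)
		}
		return false
	case syntax.OpAlternate:
		for _, sub := range re.Sub {
			if !isDigitLead(sub) {
				return false
			}
		}
		return len(re.Sub) > 0
	}
	return false // star, quest, any-char, word boundary, ...: not digit-lead
}

// isDigitLeadPattern is a hypothetical stand-in for the PR's helper.
func isDigitLeadPattern(pattern string) bool {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return false
	}
	return isDigitLead(re)
}

func main() {
	for _, p := range []string{`\d+\.\d+\.\d+`, `foo|bar`, `\d*x`} {
		fmt.Printf("%-16s digit-lead=%v\n", p, isDigitLeadPattern(p))
	}
}
```

A version pattern such as \d+\.\d+\.\d+ is classified as digit-lead, while \d*x is not, because \d* can match the empty string and let the match start at "x".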

Test Plan

All existing tests pass
Pre-release check passes
Benchmarks validated locally
Trade-off documented in #62 (perf: Research IP pattern optimization strategies)

Phase 1: Version pattern optimization
- Move DigitPrefilter check before tiny NFA fallback (line 776)
- Reject single-byte inner literals for digit-lead patterns
- Patterns like \d+\.\d+\.\d+ now use DigitPrefilter instead of ReverseInner
- Expected improvement: version pattern 12x -> 2x slower vs Rust

Phase 2: Teddy 2-byte fingerprint
- Change default fingerprint length from 1 to 2 bytes
- Implement teddySlimSSSE3_2 assembly function (~150 LOC)
- Reduces false positives by ~90% (from ~25% to <0.5%)
- Expected improvement: literal_alt pattern 39x -> 5x slower vs Rust

Files modified:
- meta/strategy.go: reorder DigitPrefilter check
- prefilter/teddy.go: change default to 2-byte fingerprint
- prefilter/teddy_ssse3_amd64.go: add dispatch for case 2
- prefilter/teddy_ssse3_amd64.s: implement teddySlimSSSE3_2
- prefilter/teddy_test.go: update test expectation
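A 2-byte kernel such as teddySlimSSSE3_2 has to handle fingerprints that straddle two 16-byte blocks: the position-0 result of a block's last lane must be carried into the next block before it is ANDed with the position-1 result. SIMD code typically does this with a byte-align shift (PALIGNR). The Go sketch below emulates that carry one lane at a time. It is an assumption about the general technique, not a transcription of the assembly; build, scan and the masks layout are hypothetical, and the tail shorter than 16 bytes is skipped for brevity.

```go
package main

import "fmt"

const lanes = 16 // one SSSE3 register holds 16 bytes

// masks holds nibble tables for fingerprint positions 0 and 1; bit b of an
// entry means "bucket b is still possible". Hypothetical layout.
type masks struct {
	lo, hi [2][16]uint8
}

// build assigns pattern k to bucket k%8; patterns must be >= 2 bytes long.
func build(patterns []string) masks {
	var m masks
	for k, p := range patterns {
		bucket := uint8(1) << (k % 8)
		for j := 0; j < 2; j++ {
			m.lo[j][p[j]&0x0F] |= bucket
			m.hi[j][p[j]>>4] |= bucket
		}
	}
	return m
}

// scan computes res0 (position-0 masks) and res1 (position-1 masks) for
// every lane of a block, shifts res0 right by one lane while pulling in the
// last res0 lane of the previous block, then ANDs with res1. A non-zero lane
// i means a pattern may start at byte base+i-1. Trailing bytes that do not
// fill a whole block are ignored here; a real kernel handles them separately.
func scan(m masks, hay []byte) []int {
	var out []int
	var prevLast uint8 // res0 of the previous block's final lane
	for base := 0; base+lanes <= len(hay); base += lanes {
		var res0, res1 [lanes]uint8
		for i := 0; i < lanes; i++ {
			c := hay[base+i]
			res0[i] = m.lo[0][c&0x0F] & m.hi[0][c>>4]
			res1[i] = m.lo[1][c&0x0F] & m.hi[1][c>>4]
		}
		for i := 0; i < lanes; i++ {
			shifted := prevLast // carry across the block boundary
			if i > 0 {
				shifted = res0[i-1]
			}
			if shifted&res1[i] != 0 {
				out = append(out, base+i-1)
			}
		}
		prevLast = res0[lanes-1]
	}
	return out
}

func main() {
	hay := make([]byte, 32)
	for i := range hay {
		hay[i] = 'x'
	}
	copy(hay[15:], "foo") // straddles the boundary between blocks 0 and 1
	copy(hay[23:], "bar")
	fmt.Println(scan(build([]string{"foo", "bar"}), hay))
}
```

Without the carry, the candidate at offset 15 (where "fo" spans lanes 15 and 16) would be missed.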

github-actions · 2026-01-06T12:22:27Z

Benchmark Comparison

Comparing main → PR #61

Summary: geomean 173.0n 172.7n -0.17%

⚠️ Potential regressions detected:

Benchmark                                               main            PR #61          Change
AhoCorasickVsStdlib/coregex_IsMatch-4                   1.249µ ± ∞ ¹    1.252µ ± ∞ ¹   +0.24% (p=0.032 n=5)
IPRegex_Find/coregex_1KB_sparse-4                       2.307µ ± ∞ ¹    4.298µ ± ∞ ¹  +86.30% (p=0.008 n=5)
IPRegex_Find/stdlib_1MB_sparse-4                        1.208m ± ∞ ¹    2.155m ± ∞ ¹  +78.45% (p=0.008 n=5)
IPRegex_Find/stdlib_1MB_dense-4                         8.084µ ± ∞ ¹   15.050µ ± ∞ ¹  +86.17% (p=0.008 n=5)
Find/hello-4                                            531.2n ± ∞ ¹    535.6n ± ∞ ¹   +0.83% (p=0.032 n=5)
IsMatch/literal-4                                       57.00n ± ∞ ¹    58.54n ± ∞ ¹   +2.70% (p=0.008 n=5)

Full results available in workflow artifacts. CI runners have ~10-20% variance.
For accurate benchmarks, run locally: ./scripts/bench.sh --compare
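Assuming the Change column is the relative difference of the two medians, (new - old) / old, which reproduces the figures above, the arithmetic can be sketched in Go as follows. deltaPct and geomean are illustrative helpers, not part of ./scripts/bench.sh; the geomean summary line aggregates many benchmarks as a geometric mean, computed here in log space.

```go
package main

import (
	"fmt"
	"math"
)

// deltaPct returns the percentage change from oldV to newV.
func deltaPct(oldV, newV float64) float64 {
	return (newV - oldV) / oldV * 100
}

// geomean returns the geometric mean of xs (all values must be > 0).
func geomean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += math.Log(x)
	}
	return math.Exp(sum / float64(len(xs)))
}

func main() {
	// Medians (µs) for IPRegex_Find/coregex_1KB_sparse-4.
	fmt.Printf("IPRegex_Find/coregex_1KB_sparse-4: %+.2f%%\n", deltaPct(2.307, 4.298))
	// Summary geomeans (ns) for main and PR #61.
	fmt.Printf("summary geomean: %+.2f%%\n", deltaPct(173.0, 172.7))
	fmt.Printf("geomean of 2 and 8: %.3f\n", geomean([]float64{2, 8}))
}
```

Set against the ~10-20% CI variance noted above, deltas such as +0.24% (AhoCorasickVsStdlib) or +2.70% (IsMatch/literal) are within noise, while the +78% to +86% IPRegex_Find rows exceed it.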

kolkov mentioned this pull request in perf: Research IP pattern optimization strategies #62 (Closed) on Jan 6, 2026

kolkov merged commit 34f1eae into main Jan 6, 2026
15 checks passed

kolkov deleted the feature/perf-optimization-v0.10.0 branch January 6, 2026 12:40

kolkov merged 1 commit into main from feature/perf-optimization-v0.10.0