Releases: coregx/coregex
v0.9.5: Teddy 8→32 patterns, literal extraction fix
Changes
- Teddy pattern limit expanded from 8 to 32 (#67)
- Slim Teddy now handles up to 32 patterns (was 8)
- Strategy threshold updated: Aho-Corasick triggers at >32 patterns (was >8)
- Follows Rust aho-corasick architecture
Fixed
- Literal extraction for factored prefixes (#67)
  - Problem: `syntax.Parse` factors `(Wanderlust|Weltanschauung)` → `W(anderlust|eltanschauung)`
  - This caused the wrong strategy to be selected: `UseReverseSuffixSet` instead of `UseTeddy`
  - Benchmark fix: 376µs → 1.7µs (220x faster)
Install
go get github.com/coregx/coregex@v0.9.5
v0.9.4: Streaming State Machine for CharClassSearcher
What's Changed
Changed
- Streaming state machine for CharClassSearcher: single-pass FindAll/Count
  - New methods: `FindAllIndices()`, `Count()` with streaming state machine
  - Eliminates per-match function call overhead
  - Based on Rust regex approach: SEARCHING/MATCHING states
  - Integrated into public API: `FindAll()`, `FindAllIndex()` use the streaming path
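A minimal sketch of the two-state idea, assuming a `[\w]+`-style pattern and a 256-entry byte table. `byteSet`, `allRuns` and `countRuns` are hypothetical names, not coregex's internal code. One loop over the input replaces a separate search call per match, and `countRuns` never materializes match positions at all.

```go
package main

import "fmt"

// byteSet is a hypothetical 256-entry membership table, one bool per byte value.
type byteSet [256]bool

// wordSet builds the table for \w: ASCII letters, digits and underscore.
func wordSet() *byteSet {
	var s byteSet
	for b := 0; b < 256; b++ {
		s[b] = b == '_' || (b >= '0' && b <= '9') || (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
	}
	return &s
}

// allRuns returns [start, end) for every maximal run of member bytes in a
// single pass. The boolean `matching` is the two-state machine:
// false = SEARCHING for a run start, true = MATCHING inside a run.
func (s *byteSet) allRuns(hay []byte) [][2]int {
	var out [][2]int
	matching, start := false, 0
	for i, b := range hay {
		switch {
		case !matching && s[b]: // SEARCHING -> MATCHING
			matching, start = true, i
		case matching && !s[b]: // MATCHING -> SEARCHING: emit the finished run
			matching = false
			out = append(out, [2]int{start, i})
		}
	}
	if matching { // a run that reaches the end of the input
		out = append(out, [2]int{start, len(hay)})
	}
	return out
}

// countRuns counts runs without recording them: one increment per
// SEARCHING -> MATCHING transition.
func (s *byteSet) countRuns(hay []byte) int {
	n, matching := 0, false
	for _, b := range hay {
		if s[b] && !matching {
			n++
		}
		matching = s[b]
	}
	return n
}

func main() {
	w := wordSet()
	hay := []byte("foo, bar_1 !baz")
	fmt.Println(w.allRuns(hay), w.countRuns(hay))
}
```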
Performance
- CharClassFindAll: 15-30% faster (1500ns → 1100-1400ns on 1KB)
- char_class gap vs Rust: reduced from 2.6x to ~1.9x
- No regressions on other patterns (+0.05% geomean)
Full Changelog: v0.9.3...v0.9.4
v0.9.3: Teddy 2-byte fingerprint + strategy optimization
Summary
Optimize strategy selection and implement Teddy 2-byte fingerprint for reduced false positives.
Changes
Teddy 2-byte Fingerprint
- Changed default from 1-byte to 2-byte fingerprint
- New SSSE3 assembly: `teddySlimSSSE3_2`
- Reduces false positives from ~25% to <0.5%
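Teddy assigns patterns to a small number of buckets and keeps, for each fingerprint byte, lookup tables indexed by that byte's low and high nibble. A position is a candidate only if some bucket survives the AND of all lookups, so requiring two fingerprint bytes instead of one rejects far more positions before verification. The following is a scalar sketch under stated assumptions (8 buckets assigned round-robin, a full comparison to verify); `teddy2` and its methods are hypothetical, and SIMD implementations perform these lookups with byte shuffles across a whole register of positions at once.

```go
package main

import (
	"bytes"
	"fmt"
)

// teddy2 is a hypothetical scalar model of a 2-byte Teddy fingerprint.
type teddy2 struct {
	patterns [][]byte
	// lo[k][n] / hi[k][n]: bucket bits of patterns whose k-th byte has low / high nibble n.
	lo, hi [2][16]uint8
}

func newTeddy2(pats ...string) *teddy2 {
	t := &teddy2{}
	for i, p := range pats {
		if len(p) < 2 {
			panic("a 2-byte fingerprint needs patterns of at least 2 bytes")
		}
		t.patterns = append(t.patterns, []byte(p))
		bucket := uint8(1) << (i % 8) // 8 buckets; several patterns may share one
		for k := 0; k < 2; k++ {
			t.lo[k][p[k]&0x0F] |= bucket
			t.hi[k][p[k]>>4] |= bucket
		}
	}
	return t
}

// find returns (start, pattern index) for every verified occurrence.
func (t *teddy2) find(hay []byte) [][2]int {
	var out [][2]int
	for i := 0; i+1 < len(hay); i++ {
		// A bucket survives only if both fingerprint bytes agree with it.
		m := uint8(0xFF)
		for k := 0; k < 2; k++ {
			b := hay[i+k]
			m &= t.lo[k][b&0x0F] & t.hi[k][b>>4]
		}
		if m == 0 {
			continue
		}
		for j, p := range t.patterns { // verify candidates from surviving buckets
			if m&(1<<(j%8)) != 0 && bytes.HasPrefix(hay[i:], p) {
				out = append(out, [2]int{i, j})
			}
		}
	}
	return out
}

func main() {
	t := newTeddy2("foo", "bar")
	fmt.Println(t.find([]byte("xx foo bar food")))
}
```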
Strategy Selection Reorder
- DigitPrefilter now checked before tiny NFA fallback
- Added `isDigitLeadPattern()` helper for digit-lead pattern detection
- Prevents high-frequency literals (like `.`) from being used as inner search targets
Performance
| Pattern | v0.9.2 | v0.9.3 | Change |
|---|---|---|---|
| literal_alt | 31ms | 8ms | ~4x faster |
| version | 8.2ms | 2ms | ~4x faster |
| IP | 3.9ms | 5.5ms | 43% slower (trade-off) |
Note: IP pattern is 43% slower but remains 2.2x faster than Rust regex. See #62 for future optimization research.
Full Changelog
https://github.com/coregx/coregex/blob/main/CHANGELOG.md#093---2026-01-06
v0.9.2: Simplified DigitPrefilter (146x IP speedup)
What's Changed
Replaced the adaptive switching approach from v0.9.1 with a simpler and faster solution.
Background
v0.9.1 added runtime adaptive switching to handle dense digit data. Testing revealed that:
- Adaptive tracking itself added overhead (~50ms on 6MB)
- Complex patterns (like IP with 74 NFA states) are better served by pure DFA
New Approach
Instead of runtime adaptation, we now use compile-time strategy selection:
- Simple digit patterns (≤100 NFA states) → DigitPrefilter
- Complex digit patterns (>100 NFA states) → LazyDFA
This eliminates runtime overhead while achieving better performance.
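A compile-time rule like this can be sketched with the standard library alone. The sketch assumes that the instruction count of Go's own compiled `syntax.Prog` is a reasonable stand-in for coregex's NFA state count, which may well be computed differently; `selectDigitStrategy` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// maxNFAStates mirrors the compile-time limit described for v0.9.2.
const maxNFAStates = 100

// selectDigitStrategy is a hypothetical sketch of compile-time selection.
// len(prog.Inst) is only a proxy for the library's NFA state count.
func selectDigitStrategy(pattern string) (string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", err
	}
	prog, err := syntax.Compile(re.Simplify()) // Compile expects a simplified regexp
	if err != nil {
		return "", err
	}
	if len(prog.Inst) <= maxNFAStates {
		return "DigitPrefilter", nil
	}
	return "LazyDFA", nil
}

func main() {
	for _, p := range []string{`\d+`, `\d{200}`} {
		s, err := selectDigitStrategy(p)
		if err != nil {
			panic(err)
		}
		fmt.Println(p, "->", s)
	}
}
```

The decision is made once per pattern, so nothing is tracked while searching.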
Performance Improvements
| Pattern | v0.9.1 | v0.9.2 | Speedup |
|---|---|---|---|
| IP | 731ms | 5ms | 146x |
| char_class | 183ms | 113ms | 1.6x |
| literal_alt | 61ms | 29ms | 2.1x |
Changes
- Remove
digitPrefilterAdaptiveThreshold(runtime tracking) - Add
digitPrefilterMaxNFAStates=100(compile-time limit) - Add
PikeVM.SearchBetweenfor bounded search optimization - Update benchmarks in README
Full Changelog: v0.9.1...v0.9.2
v0.9.1: DigitPrefilter Adaptive Switching
Fixed
DigitPrefilter adaptive switching for high false-positive scenarios
- Problem: DigitPrefilter was slow on dense digit data (many consecutive FPs)
- Solution: Runtime adaptive switching - after 64 consecutive false positives, switch to DFA
- Based on Rust regex insight: "prefilter with high FP rate makes search slower"
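The switching rule can be sketched as a loop that counts false positives and bails out once the count reaches 64. Everything here is a toy under stated assumptions: `verify` accepts a `digits.digits` token instead of a full IP address, and `fallbackSearch` is a naive scan standing in for the lazy DFA. The `abandoned` flag is roughly the event that a counter such as `Stats.PrefilterAbandoned` would record; all function names are hypothetical.

```go
package main

import "fmt"

// fpThreshold mirrors the 64-consecutive-false-positive limit from v0.9.1.
const fpThreshold = 64

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// verify is a toy confirmer (hypothetical): it accepts "digits.digits"
// starting at i and returns the end of the token.
func verify(hay []byte, i int) (int, bool) {
	j := i
	for j < len(hay) && isDigit(hay[j]) {
		j++
	}
	if j == i || j >= len(hay) || hay[j] != '.' {
		return 0, false
	}
	k := j + 1
	for k < len(hay) && isDigit(hay[k]) {
		k++
	}
	if k == j+1 {
		return 0, false
	}
	return k, true
}

// fallbackSearch stands in for the lazy DFA. It is a naive scan; only the
// switching logic in adaptiveSearch is the point of this sketch.
func fallbackSearch(hay []byte, from int) (int, int, bool) {
	for i := from; i < len(hay); i++ {
		if end, ok := verify(hay, i); ok {
			return i, end, true
		}
	}
	return 0, 0, false
}

type result struct {
	start, end       int
	found, abandoned bool
}

// adaptiveSearch uses a digit prefilter until it has produced fpThreshold
// consecutive false positives, then abandons it for the fallback engine.
func adaptiveSearch(hay []byte) result {
	fps := 0 // a true positive returns immediately, so every FP counted here is consecutive
	for i := 0; i < len(hay); i++ {
		if !isDigit(hay[i]) {
			continue // prefilter skip (a SIMD digit scan in the real engine)
		}
		if end, ok := verify(hay, i); ok {
			return result{start: i, end: end, found: true}
		}
		fps++
		if fps >= fpThreshold {
			s, e, ok := fallbackSearch(hay, i+1)
			return result{start: s, end: e, found: ok, abandoned: true}
		}
	}
	return result{}
}

func main() {
	fmt.Printf("%+v\n", adaptiveSearch([]byte("abc 10.2 x")))
}
```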
Performance (IP regex benchmarks)
| Scenario | stdlib | coregex | Speedup |
|---|---|---|---|
| Sparse 64KB | 833 µs | 2.8 µs | 300x |
| Dense 64KB | 8.5 µs | 2.4 µs | 3.5x |
| No IPs 1MB | 60.7 ms | 19.8 µs | 3000x |
Details
- Sparse data: prefilter remains fast (100-3000x speedup via SIMD skip)
- Dense data: adaptively switches to lazy DFA (3-5x speedup vs stdlib)
- New stat: `Stats.PrefilterAbandoned` tracks adaptive switching events
- New constant: `digitPrefilterAdaptiveThreshold = 64`
Full Changelog: v0.9.0...v0.9.1
v0.9.0: UseAhoCorasick, DigitPrefilter, Paired-byte SIMD
Highlights
UseAhoCorasick Strategy
- Large literal alternations (>8 patterns) via github.com/coregx/ahocorasick
- 75-113x faster than stdlib on 15-20 pattern alternations
- O(n) multi-pattern matching with ~1.6 GB/s throughput
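For readers unfamiliar with the algorithm, here is a textbook Aho-Corasick automaton: a trie of all patterns plus failure links computed breadth-first, which lets one left-to-right pass report every occurrence. It is a sketch, not the coregx/ahocorasick implementation, and it reports all overlapping matches, whereas a regex engine additionally needs leftmost-first match semantics.

```go
package main

import "fmt"

// acNode is one trie state (hypothetical sketch type).
type acNode struct {
	next map[byte]int
	fail int   // state for the longest proper suffix that is also a trie prefix
	out  []int // indices of patterns that end in this state
}

type automaton struct {
	nodes []acNode
	pats  []string
}

func build(pats ...string) *automaton {
	a := &automaton{nodes: []acNode{{next: map[byte]int{}}}, pats: pats}
	for i, p := range pats { // 1. insert every pattern into a trie
		cur := 0
		for j := 0; j < len(p); j++ {
			nxt, ok := a.nodes[cur].next[p[j]]
			if !ok {
				a.nodes = append(a.nodes, acNode{next: map[byte]int{}})
				nxt = len(a.nodes) - 1
				a.nodes[cur].next[p[j]] = nxt
			}
			cur = nxt
		}
		a.nodes[cur].out = append(a.nodes[cur].out, i)
	}
	// 2. breadth-first pass computes failure links; depth-1 states fail to the root.
	var queue []int
	for _, child := range a.nodes[0].next {
		queue = append(queue, child)
	}
	for len(queue) > 0 {
		u := queue[0]
		queue = queue[1:]
		for b, v := range a.nodes[u].next {
			f := a.nodes[u].fail
			for f != 0 {
				if _, ok := a.nodes[f].next[b]; ok {
					break
				}
				f = a.nodes[f].fail
			}
			if w, ok := a.nodes[f].next[b]; ok {
				a.nodes[v].fail = w
			}
			// Inherit matches reachable through the failure link.
			a.nodes[v].out = append(a.nodes[v].out, a.nodes[a.nodes[v].fail].out...)
			queue = append(queue, v)
		}
	}
	return a
}

// findAll scans hay once and reports (start, pattern index) for every occurrence.
func (a *automaton) findAll(hay string) [][2]int {
	var out [][2]int
	s := 0
	for i := 0; i < len(hay); i++ {
		b := hay[i]
		for s != 0 {
			if _, ok := a.nodes[s].next[b]; ok {
				break
			}
			s = a.nodes[s].fail
		}
		if nxt, ok := a.nodes[s].next[b]; ok {
			s = nxt
		}
		for _, p := range a.nodes[s].out {
			out = append(out, [2]int{i + 1 - len(a.pats[p]), p})
		}
	}
	return out
}

func main() {
	a := build("he", "she", "his", "hers")
	fmt.Println(a.findAll("ushers"))
}
```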
DigitPrefilter Strategy (#56)
- AVX2 SIMD digit scanner for IP regex patterns
- 2500x faster on no-match scenarios
- 39-152x faster on sparse IP data
Paired-byte SIMD Search (#55)
- Byte frequency analysis for optimal rare byte selection
- AVX2 `MemchrPair()` searches two bytes simultaneously
- Dramatically reduces false positives
Installation
go get github.com/coregx/coregex@v0.9.0
See CHANGELOG.md for full details.
v0.8.24: Longest() mode optimization
Fixed
Longest() mode performance: BoundedBacktracker now supports leftmost-longest matching (#52)
- Root cause: BoundedBacktracker was disabled entirely in Longest() mode, forcing a PikeVM fallback
- Solution: Implemented `backtrackFindLongest()` that explores all branches at splits
- Found by: Ben Hoyt (GoAWK integration testing with `re.Longest()`)
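The difference between the two modes is easy to see on `a|ab` against "ab": Go's standard `regexp` returns "a" by default (leftmost-first) and "ab" after `Longest()`. The toy bounded backtracker below shows why supporting leftmost-longest means exploring both sides of every split instead of stopping at the first match. It is a hand-built sketch with hypothetical types, not coregex's `backtrackFindLongest()`.

```go
package main

import "fmt"

type opcode int

const (
	opChar  opcode = iota // match byte c, then continue at x
	opSplit               // try x, then y
	opMatch
)

type inst struct {
	op   opcode
	c    byte
	x, y int
}

// backtracker is a toy bounded backtracker: visited marks each (pc, pos)
// pair so no state is explored twice.
type backtracker struct {
	prog    []inst
	input   string
	visited []bool
	longest bool
	best    int // end of the chosen match, -1 if none
}

func (b *backtracker) step(pc, pos int) bool {
	idx := pc*(len(b.input)+1) + pos
	if b.visited[idx] {
		return false
	}
	b.visited[idx] = true
	switch in := b.prog[pc]; in.op {
	case opChar:
		return pos < len(b.input) && b.input[pos] == in.c && b.step(in.x, pos+1)
	case opSplit:
		if b.longest {
			// Leftmost-longest: explore BOTH branches; best keeps the longest end.
			l := b.step(in.x, pos)
			r := b.step(in.y, pos)
			return l || r
		}
		// Leftmost-first: the first branch that matches wins.
		return b.step(in.x, pos) || b.step(in.y, pos)
	case opMatch:
		if pos > b.best {
			b.best = pos
		}
		return true
	}
	return false
}

// matchAt runs an anchored search at position 0 and returns the match end.
func matchAt(prog []inst, input string, longest bool) (int, bool) {
	b := &backtracker{prog: prog, input: input, longest: longest, best: -1,
		visited: make([]bool, len(prog)*(len(input)+1))}
	b.step(0, 0)
	return b.best, b.best >= 0
}

func main() {
	// a|ab  ->  0: split 1,2   1: 'a' ->4   2: 'a' ->3   3: 'b' ->4   4: match
	prog := []inst{{op: opSplit, x: 1, y: 2}, {op: opChar, c: 'a', x: 4},
		{op: opChar, c: 'a', x: 3}, {op: opChar, c: 'b', x: 4}, {op: opMatch}}
	fmt.Println(matchAt(prog, "ab", false))
	fmt.Println(matchAt(prog, "ab", true))
}
```

The visited table still bounds the work to one visit per (pc, pos) pair in both modes, which is what keeps the longest-mode overhead small.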
Performance (Longest() mode)
| Metric | Before | After | Improvement |
|---|---|---|---|
| coregex Longest() | 450 ns | 133 ns | 3.4x faster |
| Longest() overhead | +270% | +8% | Target was +10% |
| vs stdlib Longest() | 2.4x slower | 1.37x faster | — |
Install
go get github.com/coregx/coregex@v0.8.24
Full Changelog: v0.8.23...v0.8.24
v0.8.23: Unicode char class fix
Critical Bug Fix
Unicode character classes now work correctly.
The Bug
Character classes with non-ASCII characters (code points 128-255) returned incorrect matches:
// Before v0.8.23:
re := coregex.MustCompile(`[föd]+`)
re.FindString("fööd") // returned "f" (wrong!)
// After v0.8.23:
re.FindString("fööd") // returns "fööd" (correct)
Root Cause
CharClassSearcher uses a 256-byte lookup table for O(1) membership testing. The guard only rejected runes > 255, but characters like ö (U+00F6, code point 246) are encoded as two bytes in UTF-8 (0xC3 0xB6), so a lookup keyed on single input bytes never matches them.
Fix
Changed the check from > 255 to > 127: only true ASCII (0-127) can use the byte lookup table.
Affected Patterns
Any character class containing non-ASCII: [äöü]+, [café]+, [α-ω]+, etc.
Credit
Found by Ben Hoyt during GoAWK integration testing.
Upgrade recommended for all users with internationalized patterns.
v0.8.22: Small string optimization
Small String Optimization (1.4-20x faster)
Addresses performance issues reported by @benhoyt (#29) where coregex was 2-6x slower than stdlib on small inputs (~44 bytes).
Key Optimizations
- Zero-allocation string-to-bytes conversion
  - `stringToBytes()` using `unsafe.Slice` (like Rust's `as_bytes()`)
  - MatchString: 48B/op → 0B/op
- BoundedBacktracker for small NFA patterns
  - O(1) generation-based reset vs PikeVM's thread queues
  - 2-3x faster on small inputs
- Prefilter integration in NFA path
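The first two optimizations can be sketched with standard Go. `stringToBytes` here is a hypothetical zero-copy conversion (it needs Go 1.20+ for `unsafe.StringData`), and `visitedSet` is a hypothetical model of generation-based reset, where bumping a counter invalidates every mark at once instead of clearing the whole table before each search.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes is a hypothetical zero-copy conversion. The result aliases
// the string's memory and must be treated as read-only.
func stringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// visitedSet is a hypothetical sketch of generation-based reset.
type visitedSet struct {
	gen   uint32
	marks []uint32 // marks[i] == gen means slot i was visited in this search
}

func newVisitedSet(n int) *visitedSet {
	return &visitedSet{gen: 1, marks: make([]uint32, n)}
}

// reset is O(1) except on the rare wrap-around of the counter.
func (v *visitedSet) reset() {
	v.gen++
	if v.gen == 0 { // counter wrapped: clear once so stale marks cannot collide
		for i := range v.marks {
			v.marks[i] = 0
		}
		v.gen = 1
	}
}

// visit marks slot i and reports whether it was unvisited before.
func (v *visitedSet) visit(i int) bool {
	if v.marks[i] == v.gen {
		return false
	}
	v.marks[i] = v.gen
	return true
}

func main() {
	b := stringToBytes("hello")
	fmt.Println(len(b), string(b))

	v := newVisitedSet(8)
	fmt.Println(v.visit(3), v.visit(3)) // first visit succeeds, second is rejected
	v.reset()
	fmt.Println(v.visit(3)) // visitable again after the O(1) reset
}
```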
Performance Results
| Pattern | stdlib | coregex | Speedup |
|---|---|---|---|
| `j[a-z]+p` | 357ns | 253ns | 1.4x |
| `\d+` | 1.13µs | 57ns | 20x |
| `\w+` | 1.05µs | 58ns | 18x |
| `[a-z]+` | 1.02µs | 63ns | 16x |
Commits
- perf: optimize small string matching with BoundedBacktracker (#46)
Closes #47
v0.8.21: CharClassSearcher + ByteClasses compression
What's New
Added
- CharClassSearcher: specialized 256-byte lookup table for simple char_class patterns (Fixes #44)
  - Patterns like `[\w]+`, `\d+`, `[a-z]+` now use an O(1) byte membership test
  - 23x faster than stdlib (623ms → 27ms on 6MB input with 1.3M matches)
  - 2x faster than Rust regex! (57ms → 27ms)
  - Zero allocations in hot path
- UseCharClassSearcher strategy
  - Auto-selected for simple char_class patterns without capture groups
  - Patterns WITH captures (`(\w)+`) continue to use BoundedBacktracker
- Zero-allocation Count() method
Fixed
- DFA ByteClasses compression (Rust-style optimization)
  - Compile memory for `hello` pattern: 1195KB → 598KB (2x reduction)
- Removed unused reverseDFA field from Engine
  - Was creating a redundant reverse DFA for ALL patterns (2x memory overhead)
- Reverse NFA ByteClasses registration
  - Matches Rust's approach in `nfa.rs`
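The idea behind byte-class compression: bytes that no range in the pattern separates always lead to the same DFA transition, so they can share one class and one column of the transition table. The sketch below computes such a partition from a list of byte ranges; `byteClasses` is a hypothetical helper, and a real engine would collect the ranges from every NFA transition rather than from a literal.

```go
package main

import "fmt"

// byteClasses is a hypothetical sketch of DFA byte-class compression: it
// partitions 0..255 at every range boundary and maps each byte to its class.
func byteClasses(ranges [][2]byte) ([256]uint8, int) {
	var boundary [256]bool // boundary[b]: a new class starts at byte b
	for _, r := range ranges {
		boundary[r[0]] = true
		if r[1] < 255 {
			boundary[r[1]+1] = true
		}
	}
	var classOf [256]uint8
	class := 0
	for b := 0; b < 256; b++ {
		if b > 0 && boundary[b] {
			class++
		}
		classOf[b] = uint8(class)
	}
	return classOf, class + 1
}

func main() {
	// The literal "hello" only tests the bytes h, e, l and o.
	var ranges [][2]byte
	for _, c := range []byte("hello") {
		ranges = append(ranges, [2]byte{c, c})
	}
	_, n := byteClasses(ranges)
	fmt.Printf("%d classes: each DFA state needs %d transitions instead of 256\n", n, n)
}
```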
Performance Summary
| Pattern | Input | stdlib | coregex | Rust | coregex vs Rust |
|---|---|---|---|---|---|
| `[\w]+` | 6MB, 1.3M matches | 623ms | 27ms | 57ms | 2.1x faster |

| Metric | Before | After | Improvement |
|---|---|---|---|
| `hello` compile memory | 1195KB | 598KB | -50% |
| char_class runtime | 180ms | 109ms | -39% |
Full Changelog: v0.8.20...v0.8.21