Releases · coregx/coregex

06 Jan 19:45

kolkov

v0.9.5

6b02713

v0.9.5: Teddy 8→32 patterns, literal extraction fix

Changes

Teddy pattern limit expanded from 8 to 32 (#67)
- Slim Teddy now handles up to 32 patterns (was 8)
- Strategy threshold updated: Aho-Corasick triggers at >32 patterns (was >8)
- Follows Rust aho-corasick architecture

Fixed

Literal extraction for factored prefixes (#67)
- Problem: syntax.Parse factors common prefixes, turning (Wanderlust|Weltanschauung) into W(anderlust|eltanschauung)
- Effect: wrong strategy selection, UseReverseSuffixSet instead of UseTeddy
- Benchmark after the fix: 376µs → 1.7µs (~220x faster)

Install

go get github.com/coregx/coregex@v0.9.5

06 Jan 14:32

kolkov

v0.9.4

0d0785b

v0.9.4: Streaming State Machine for CharClassSearcher

What's Changed

Changed

Streaming state machine for CharClassSearcher - single-pass FindAll/Count
- New methods: FindAllIndices(), Count() with streaming state machine
- Eliminates per-match function call overhead
- Based on Rust regex approach: SEARCHING/MATCHING states
- Integrated into public API: FindAll(), FindAllIndex() use streaming path

Performance

CharClassFindAll: 15-30% faster (1500ns → 1100-1400ns on 1KB)
char_class gap vs Rust: reduced from 2.6x to ~1.9x
No regressions on other patterns (+0.05% geomean)

Full Changelog: v0.9.3...v0.9.4

06 Jan 12:42

kolkov

v0.9.3

5f8187c

v0.9.3: Teddy 2-byte fingerprint + strategy optimization

Summary

Optimize strategy selection and implement Teddy 2-byte fingerprint for reduced false positives.

Changes

Teddy 2-byte Fingerprint

Changed default from 1-byte to 2-byte fingerprint
New SSSE3 assembly: teddySlimSSSE3_2
Reduces false positives from ~25% to <0.5%

Strategy Selection Reorder

DigitPrefilter now checked before tiny NFA fallback
Added isDigitLeadPattern() helper for digit-lead pattern detection
Prevents high-frequency literals (like .) from being used as inner search targets

Performance

Pattern	v0.9.2	v0.9.3	Change
literal_alt	31ms	8ms	~4x faster
version	8.2ms	2ms	~4x faster
IP	3.9ms	5.5ms	43% slower (trade-off)

Note: IP pattern is 43% slower but remains 2.2x faster than Rust regex. See #62 for future optimization research.

Full Changelog

https://github.com/coregx/coregex/blob/main/CHANGELOG.md#093---2026-01-06

06 Jan 10:59

kolkov

v0.9.2

30dbd01

v0.9.2: Simplified DigitPrefilter (146x IP speedup)

What's Changed

Replaced the adaptive switching approach from v0.9.1 with a simpler and faster solution.

Background

v0.9.1 added runtime adaptive switching to handle dense digit data. Testing revealed that:

Adaptive tracking itself added overhead (~50ms on 6MB)
Complex patterns (like IP with 74 NFA states) are better served by pure DFA

New Approach

Instead of runtime adaptation, we now use compile-time strategy selection:

Simple digit patterns (≤100 NFA states) → DigitPrefilter
Complex digit patterns (>100 NFA states) → LazyDFA

This eliminates runtime overhead while achieving better performance.
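
As a rough sketch of what a compile-time decision looks like, the rule above reduces to a single comparison made once per pattern. Only the constant name digitPrefilterMaxNFAStates and its value come from these notes; the strategy type and chooseDigitStrategy are hypothetical.

```go
package main

import "fmt"

// digitPrefilterMaxNFAStates is the compile-time limit named in these notes.
const digitPrefilterMaxNFAStates = 100

// strategy is a hypothetical enum used only for this illustration.
type strategy int

const (
	useDigitPrefilter strategy = iota
	useLazyDFA
)

func (s strategy) String() string {
	if s == useDigitPrefilter {
		return "DigitPrefilter"
	}
	return "LazyDFA"
}

// chooseDigitStrategy picks a strategy once, when the pattern is compiled,
// so the search loop carries no bookkeeping for adaptive switching.
func chooseDigitStrategy(nfaStates int) strategy {
	if nfaStates <= digitPrefilterMaxNFAStates {
		return useDigitPrefilter
	}
	return useLazyDFA
}

func main() {
	for _, n := range []int{12, 100, 101} {
		fmt.Printf("%d NFA states -> %v\n", n, chooseDigitStrategy(n))
	}
}
```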

Performance Improvements

Pattern	v0.9.1	v0.9.2	Speedup
IP	731ms	5ms	146x
char_class	183ms	113ms	1.6x
literal_alt	61ms	29ms	2.1x

Changes

Remove digitPrefilterAdaptiveThreshold (runtime tracking)
Add digitPrefilterMaxNFAStates=100 (compile-time limit)
Add PikeVM.SearchBetween for bounded search optimization
Update benchmarks in README

Full Changelog: v0.9.1...v0.9.2

05 Jan 01:10

kolkov

v0.9.1

d5c6862

v0.9.1: DigitPrefilter Adaptive Switching

Fixed

DigitPrefilter adaptive switching for high false-positive scenarios

Problem: DigitPrefilter was slow on dense digit data (many consecutive FPs)
Solution: Runtime adaptive switching - after 64 consecutive false positives, switch to DFA
Based on Rust regex insight: "prefilter with high FP rate makes search slower"

Performance (IP regex benchmarks)

Scenario	stdlib	coregex	Speedup
Sparse 64KB	833 µs	2.8 µs	300x
Dense 64KB	8.5 µs	2.4 µs	3.5x
No IPs 1MB	60.7 ms	19.8 µs	3000x

Details

Sparse data: prefilter remains fast (100-3000x speedup via SIMD skip)
Dense data: adaptively switches to lazy DFA (3-5x speedup vs stdlib)
New stat: Stats.PrefilterAbandoned tracks adaptive switching events
New constant: digitPrefilterAdaptiveThreshold = 64

Full Changelog: v0.9.0...v0.9.1

04 Jan 22:58

kolkov

v0.9.0

e09f196

v0.9.0: UseAhoCorasick, DigitPrefilter, Paired-byte SIMD

Highlights

UseAhoCorasick Strategy

Large literal alternations (>8 patterns) via github.com/coregx/ahocorasick
75-113x faster than stdlib on 15-20 pattern alternations
O(n) multi-pattern matching with ~1.6 GB/s throughput

DigitPrefilter Strategy (#56)

AVX2 SIMD digit scanner for IP regex patterns
2500x faster on no-match scenarios
39-152x faster on sparse IP data

Paired-byte SIMD Search (#55)

Byte frequency analysis for optimal rare byte selection
AVX2 MemchrPair() searches two bytes simultaneously
Dramatically reduces false positives

Installation

go get github.com/coregx/coregex@v0.9.0

See CHANGELOG.md for full details.

14 Dec 00:46

kolkov

v0.8.24

5850603

v0.8.24: Longest() mode optimization

Fixed

Longest() mode performance - BoundedBacktracker now supports leftmost-longest matching (#52)

Root cause: BoundedBacktracker was disabled entirely in Longest() mode, forcing PikeVM fallback
Solution: Implemented backtrackFindLongest() that explores all branches at splits
Found by: Ben Hoyt (GoAWK integration testing with re.Longest())

Performance (Longest() mode)

Metric	Before	After	Improvement
coregex Longest()	450 ns	133 ns	3.4x faster
Longest() overhead	+270%	+8%	Target was +10%
vs stdlib Longest()	2.4x slower	1.37x faster	—

Install

go get github.com/coregx/coregex@v0.8.24

Full Changelog: v0.8.23...v0.8.24

13 Dec 20:54

kolkov

v0.8.23

d16020a

v0.8.23: Unicode char class fix

Critical Bug Fix

Unicode character classes now work correctly.

The Bug

Character classes with non-ASCII characters (code points 128-255) returned incorrect matches:

// Before v0.8.23:
re := coregex.MustCompile(`[föd]+`)
re.FindString("fööd") // returned "f" (wrong!)

// After v0.8.23:
re.FindString("fööd") // returns "fööd" (correct)

Root Cause

CharClassSearcher uses a 256-byte lookup table for O(1) membership testing. The guard was rune > 255, but characters like ö (code point 246) are multi-byte in UTF-8 (0xC3 0xB6), so the input never contains the single byte 246 and the byte-based lookup fails.

Fix

Changed the check from > 255 to > 127: only true ASCII (0-127) can use the byte lookup table.

Affected Patterns

Any character class containing non-ASCII: [äöü]+, [café]+, [α-ω]+, etc.

Credit

Found by Ben Hoyt during GoAWK integration testing.

Upgrade recommended for all users with internationalized patterns.

13 Dec 10:30

kolkov

v0.8.22

0837e6a

v0.8.22: Small string optimization

Small String Optimization (1.4-20x faster)

Addresses performance issues reported by @benhoyt (#29) where coregex was 2-6x slower than stdlib on small inputs (~44 bytes).

Key Optimizations

Zero-allocation string-to-bytes conversion
- stringToBytes() using unsafe.Slice (like Rust's as_bytes())
- MatchString: 48B/op → 0B/op
BoundedBacktracker for small NFA patterns
- O(1) generation-based reset vs PikeVM's thread queues
- 2-3x faster on small inputs
Prefilter integration in NFA path

Performance Results

Pattern	stdlib	coregex	Speedup
`j[a-z]+p`	357ns	253ns	1.4x
`\d+`	1.13µs	57ns	20x
`\w+`	1.05µs	58ns	18x
`[a-z]+`	1.02µs	63ns	16x

Commits

perf: optimize small string matching with BoundedBacktracker (#46)

Closes #47

Contributors

benhoyt

12 Dec 23:10

kolkov

v0.8.21

aff7f51

v0.8.21: CharClassSearcher + ByteClasses compression

What's New

Added

CharClassSearcher - Specialized 256-byte lookup table for simple char_class patterns (Fixes #44)
- Patterns like [\w]+, \d+, [a-z]+ now use O(1) byte membership test
- 23x faster than stdlib (623ms → 27ms on 6MB input with 1.3M matches)
- 2x faster than Rust regex! (57ms → 27ms)
- Zero allocations in hot path
UseCharClassSearcher strategy
- Auto-selected for simple char_class patterns without capture groups
- Patterns WITH captures ((\w)+) continue to use BoundedBacktracker
Zero-allocation Count() method

Fixed

DFA ByteClasses compression (Rust-style optimization)
- Compile memory for hello pattern: 1195KB → 598KB (2x reduction)
Removed unused reverseDFA field from Engine
- Was creating redundant reverse DFA for ALL patterns (2x memory overhead)
Reverse NFA ByteClasses registration
- Matches Rust's approach in nfa.rs

Performance Summary

Pattern	Input	stdlib	coregex	Rust	coregex vs Rust
`[\w]+`	6MB, 1.3M matches	623ms	27ms	57ms	2.1x faster

Pattern	Before	After	Improvement
`hello` compile	1195KB	598KB	-50%
char_class runtime	180ms	109ms	-39%

Full Changelog: v0.8.20...v0.8.21
