Releases: coregx/coregex

v0.9.5: Teddy 8→32 patterns, literal extraction fix

06 Jan 19:45

Changes

  • Teddy pattern limit expanded from 8 to 32 (#67)
    • Slim Teddy now handles up to 32 patterns (was 8)
    • Strategy threshold updated: Aho-Corasick triggers at >32 patterns (was >8)
    • Follows Rust aho-corasick architecture

Fixed

  • Literal extraction for factored prefixes (#67)
    • Problem: syntax.Parse factors (Wanderlust|Weltanschauung) → W(anderlust|eltanschauung), so literal extraction saw only the shared prefix W
    • This caused the wrong strategy to be selected: UseReverseSuffixSet instead of UseTeddy
    • Benchmark fix: 376µs → 1.7µs (220x faster)

Install

go get github.com/coregx/coregex@v0.9.5

v0.9.4: Streaming State Machine for CharClassSearcher

06 Jan 14:32

What's Changed

Changed

  • Streaming state machine for CharClassSearcher - single-pass FindAll/Count
    • New methods: FindAllIndices(), Count() with streaming state machine
    • Eliminates per-match function call overhead
    • Based on Rust regex approach: SEARCHING/MATCHING states
    • Integrated into public API: FindAll(), FindAllIndex() use streaming path
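
A minimal sketch of the single-pass idea, assuming a 256-entry membership table and two states (SEARCHING and MATCHING). The function name findAllIndices and its signature are illustrative and do not reflect the actual CharClassSearcher API.

```go
package main

import "fmt"

// findAllIndices walks the haystack once and emits every maximal run of
// bytes that are members of the table. The boolean flag plays the role of
// the two states: false is SEARCHING, true is MATCHING.
func findAllIndices(member *[256]bool, haystack []byte) [][2]int {
	var out [][2]int
	matching := false
	start := 0
	for i, b := range haystack {
		switch {
		case !matching && member[b]:
			// SEARCHING -> MATCHING: a run begins here.
			matching, start = true, i
		case matching && !member[b]:
			// MATCHING -> SEARCHING: the run ended just before i.
			out = append(out, [2]int{start, i})
			matching = false
		}
	}
	if matching {
		// The haystack ended in the middle of a run.
		out = append(out, [2]int{start, len(haystack)})
	}
	return out
}

func main() {
	var digits [256]bool
	for b := '0'; b <= '9'; b++ {
		digits[b] = true
	}
	fmt.Println(findAllIndices(&digits, []byte("a12b345"))) // [[1 3] [4 7]]
}
```

Because the loop never returns to a caller between matches, there is no per-match call overhead; counting instead of collecting would replace the append with an increment.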

Performance

  • CharClassFindAll: 15-30% faster (1500ns → 1100-1400ns on 1KB)
  • char_class gap vs Rust: reduced from 2.6x to ~1.9x
  • No regressions on other patterns (+0.05% geomean)

Full Changelog: v0.9.3...v0.9.4

v0.9.3: Teddy 2-byte fingerprint + strategy optimization

06 Jan 12:42

Summary

Optimize strategy selection and implement Teddy 2-byte fingerprint for reduced false positives.

Changes

Teddy 2-byte Fingerprint

  • Changed default from 1-byte to 2-byte fingerprint
  • New SSSE3 assembly: teddySlimSSSE3_2
  • Reduces false positives from ~25% to <0.5%
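
Why a longer fingerprint cuts false positives can be shown with a scalar model. This is not the SSSE3 algorithm (real Teddy uses nibble masks and buckets); it only counts how many haystack positions survive a check against the first n bytes of any pattern. The function countCandidates and the sample data are invented for illustration.

```go
package main

import "fmt"

// countCandidates returns how many haystack positions pass a filter that
// compares n bytes at that position with the first n bytes of any pattern.
// Every pattern must be at least n bytes long.
func countCandidates(haystack string, patterns []string, n int) int {
	prefixes := make(map[string]bool, len(patterns))
	for _, p := range patterns {
		prefixes[p[:n]] = true
	}
	count := 0
	for i := 0; i+n <= len(haystack); i++ {
		if prefixes[haystack[i:i+n]] {
			count++
		}
	}
	return count
}

func main() {
	haystack := "fab bib fin bar bob foo"
	patterns := []string{"foo", "bar"}

	// Each candidate must be verified against the full patterns, so fewer
	// candidates means less wasted verification work.
	fmt.Println("1-byte candidates:", countCandidates(haystack, patterns, 1))
	fmt.Println("2-byte candidates:", countCandidates(haystack, patterns, 2))
}
```

In this sample only two positions are real matches; the 1-byte filter lets through every f and b, while the 2-byte filter passes only the real ones.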

Strategy Selection Reorder

  • DigitPrefilter now checked before tiny NFA fallback
  • Added isDigitLeadPattern() helper for digit-lead pattern detection
  • Prevents high-frequency literals (like .) from being used as inner search targets

Performance

| Pattern | v0.9.2 | v0.9.3 | Change |
|---|---|---|---|
| literal_alt | 31ms | 8ms | +4x faster |
| version | 8.2ms | 2ms | +4x faster |
| IP | 3.9ms | 5.5ms | -43% (trade-off) |

Note: IP pattern is 43% slower but remains 2.2x faster than Rust regex. See #62 for future optimization research.

Full Changelog

https://github.com/coregx/coregex/blob/main/CHANGELOG.md#093---2026-01-06

v0.9.2: Simplified DigitPrefilter (146x IP speedup)

06 Jan 10:59
30dbd01

What's Changed

Replaced adaptive switching approach from v0.9.1 with a simpler and faster solution.

Background

v0.9.1 added runtime adaptive switching to handle dense digit data. Testing revealed that:

  1. Adaptive tracking itself added overhead (~50ms on 6MB)
  2. Complex patterns (like IP with 74 NFA states) are better served by pure DFA

New Approach

Instead of runtime adaptation, we now use compile-time strategy selection:

  • Simple digit patterns (≤100 NFA states) → DigitPrefilter
  • Complex digit patterns (>100 NFA states) → LazyDFA

This eliminates runtime overhead while achieving better performance.
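
The compile-time rule above amounts to a single comparison made once per pattern. A sketch, using the digitPrefilterMaxNFAStates constant named in this release; the function pickDigitStrategy and its string results are hypothetical.

```go
package main

import "fmt"

// digitPrefilterMaxNFAStates is the compile-time limit named in this release.
const digitPrefilterMaxNFAStates = 100

// pickDigitStrategy is a hypothetical helper that chooses a strategy for a
// digit-lead pattern from its NFA size alone, with no runtime tracking.
func pickDigitStrategy(nfaStates int) string {
	if nfaStates <= digitPrefilterMaxNFAStates {
		return "DigitPrefilter"
	}
	return "LazyDFA"
}

func main() {
	for _, n := range []int{20, 100, 101} {
		fmt.Printf("%d NFA states -> %s\n", n, pickDigitStrategy(n))
	}
}
```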

Performance Improvements

| Pattern | v0.9.1 | v0.9.2 | Speedup |
|---|---|---|---|
| IP | 731ms | 5ms | 146x |
| char_class | 183ms | 113ms | 1.6x |
| literal_alt | 61ms | 29ms | 2.1x |

Changes

  • Remove digitPrefilterAdaptiveThreshold (runtime tracking)
  • Add digitPrefilterMaxNFAStates=100 (compile-time limit)
  • Add PikeVM.SearchBetween for bounded search optimization
  • Update benchmarks in README

Full Changelog: v0.9.1...v0.9.2

v0.9.1: DigitPrefilter Adaptive Switching

05 Jan 01:10
d5c6862

Fixed

DigitPrefilter adaptive switching for high false-positive scenarios

  • Problem: DigitPrefilter was slow on dense digit data (many consecutive FPs)
  • Solution: Runtime adaptive switching - after 64 consecutive false positives, switch to DFA
  • Based on Rust regex insight: "prefilter with high FP rate makes search slower"
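
The switching rule can be sketched as a small counter. The threshold of 64 comes from this release; the adaptiveState type and its observe method are invented for illustration and do not mirror the engine's internals.

```go
package main

import "fmt"

// threshold is the number of consecutive false positives after which the
// prefilter is abandoned (value taken from this release).
const threshold = 64

// adaptiveState is a hypothetical tracker for prefilter quality.
type adaptiveState struct {
	consecutiveFP int
	abandoned     bool
}

// observe records whether a prefilter candidate turned out to be a real
// match and reports whether the prefilter should still be used.
func (s *adaptiveState) observe(matched bool) bool {
	if s.abandoned {
		return false
	}
	if matched {
		s.consecutiveFP = 0 // a real match resets the streak
		return true
	}
	s.consecutiveFP++
	if s.consecutiveFP >= threshold {
		s.abandoned = true // from here on the caller falls back to the DFA
		return false
	}
	return true
}

func main() {
	var s adaptiveState
	for i := 1; i <= 65; i++ {
		if !s.observe(false) {
			fmt.Println("prefilter abandoned after candidate", i)
			break
		}
	}
}
```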

Performance (IP regex benchmarks)

| Scenario | stdlib | coregex | Speedup |
|---|---|---|---|
| Sparse 64KB | 833 µs | 2.8 µs | 300x |
| Dense 64KB | 8.5 µs | 2.4 µs | 3.5x |
| No IPs 1MB | 60.7 ms | 19.8 µs | 3000x |

Details

  • Sparse data: prefilter remains fast (100-3000x speedup via SIMD skip)
  • Dense data: adaptively switches to lazy DFA (3-5x speedup vs stdlib)
  • New stat: Stats.PrefilterAbandoned tracks adaptive switching events
  • New constant: digitPrefilterAdaptiveThreshold = 64

Full Changelog: v0.9.0...v0.9.1

v0.9.0: UseAhoCorasick, DigitPrefilter, Paired-byte SIMD

04 Jan 22:58
e09f196

Highlights

UseAhoCorasick Strategy

  • Large literal alternations (>8 patterns) via github.com/coregx/ahocorasick
  • 75-113x faster than stdlib on 15-20 pattern alternations
  • O(n) multi-pattern matching with ~1.6 GB/s throughput

DigitPrefilter Strategy (#56)

  • AVX2 SIMD digit scanner for IP regex patterns
  • 2500x faster on no-match scenarios
  • 39-152x faster on sparse IP data

Paired-byte SIMD Search (#55)

  • Byte frequency analysis for optimal rare byte selection
  • AVX2 MemchrPair() searches two bytes simultaneously
  • Dramatically reduces false positives

Installation

go get github.com/coregx/coregex@v0.9.0

See CHANGELOG.md for full details.

v0.8.24: Longest() mode optimization

14 Dec 00:46
5850603

Fixed

Longest() mode performance - BoundedBacktracker now supports leftmost-longest matching (#52)

  • Root cause: BoundedBacktracker was disabled entirely in Longest() mode, forcing PikeVM fallback
  • Solution: Implemented backtrackFindLongest() that explores all branches at splits
  • Found by: Ben Hoyt (GoAWK integration testing with re.Longest())
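
For readers unfamiliar with the two match semantics, the difference is easy to see with the standard library's regexp package, which exposes the same Longest() switch. The helper firstMatch is made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch compiles the pattern, optionally switches it to
// leftmost-longest semantics, and returns the first match in input.
func firstMatch(pattern, input string, longest bool) string {
	re := regexp.MustCompile(pattern)
	if longest {
		re.Longest()
	}
	return re.FindString(input)
}

func main() {
	// Leftmost-first: the first alternative that matches wins.
	fmt.Println(firstMatch(`a|ab`, "ab", false)) // a

	// Leftmost-longest: every branch is explored and the longest match wins.
	fmt.Println(firstMatch(`a|ab`, "ab", true)) // ab
}
```

Leftmost-longest cannot stop at the first successful branch, which is why a backtracker needs a dedicated search that explores all branches at each split.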

Performance (Longest() mode)

| Metric | Before | After | Improvement |
|---|---|---|---|
| coregex Longest() | 450 ns | 133 ns | 3.4x faster |
| Longest() overhead | +270% | +8% | Target was +10% |
| vs stdlib Longest() | 2.4x slower | 1.37x faster | |

Install

go get github.com/coregx/coregex@v0.8.24

Full Changelog: v0.8.23...v0.8.24

v0.8.23: Unicode char class fix

13 Dec 20:54

Critical Bug Fix

Unicode character classes now work correctly.

The Bug

Character classes with non-ASCII characters (code points 128-255) returned incorrect matches:

// Before v0.8.23:
re := coregex.MustCompile(`[föd]+`)
re.FindString("fööd") // returned "f" (wrong!)

// After v0.8.23:
re.FindString("fööd") // returns "fööd" (correct)

Root Cause

CharClassSearcher uses a 256-byte lookup table for O(1) membership testing. The guard only rejected rune > 255, but characters such as ö (code point 246) occupy two bytes in UTF-8 (0xC3 0xB6), so a single-byte lookup can never match them.

Fix

Changed the check from > 255 to > 127: only ASCII (0-127) is encoded as a single byte, so only ASCII can use the byte lookup table.

Affected Patterns

Any character class containing non-ASCII: [äöü]+, [café]+, [α-ω]+, etc.

Credit

Found by Ben Hoyt during GoAWK integration testing.

Upgrade recommended for all users with internationalized patterns.

v0.8.22: Small string optimization

13 Dec 10:30
0837e6a

Small String Optimization (1.4-20x faster)

Addresses performance issues reported by @benhoyt (#29) where coregex was 2-6x slower than stdlib on small inputs (~44 bytes).

Key Optimizations

  1. Zero-allocation string-to-bytes conversion
    • stringToBytes() using unsafe.Slice (like Rust's as_bytes())
    • MatchString: 48B/op → 0B/op
  2. BoundedBacktracker for small NFA patterns
    • O(1) generation-based reset vs PikeVM's thread queues
    • 2-3x faster on small inputs
  3. Prefilter integration in NFA path
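
A sketch of the first optimization, assuming Go 1.20 or later for unsafe.StringData. The release names stringToBytes() and unsafe.Slice; the body below is one common way to write such a helper and may differ from the actual implementation.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes returns a []byte that shares memory with s instead of
// copying it. The result must be treated as read-only: writing to it would
// mutate a string, which Go assumes is immutable.
func stringToBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := stringToBytes("hello")
	fmt.Println(len(b), string(b[:2])) // 5 he
}
```

The ordinary []byte(s) conversion allocates and copies when the result escapes, which accounts for a per-call cost like the 48B/op listed above.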

Performance Results

| Pattern | stdlib | coregex | Speedup |
|---|---|---|---|
| j[a-z]+p | 357ns | 253ns | 1.4x |
| \d+ | 1.13µs | 57ns | 20x |
| \w+ | 1.05µs | 58ns | 18x |
| [a-z]+ | 1.02µs | 63ns | 16x |

Commits

  • perf: optimize small string matching with BoundedBacktracker (#46)

Closes #47

v0.8.21: CharClassSearcher + ByteClasses compression

12 Dec 23:10
aff7f51

What's New

Added

  • CharClassSearcher - Specialized 256-byte lookup table for simple char_class patterns (Fixes #44)
    • Patterns like [\w]+, \d+, [a-z]+ now use O(1) byte membership test
    • 23x faster than stdlib (623ms → 27ms on 6MB input with 1.3M matches)
    • 2x faster than Rust regex! (57ms → 27ms)
    • Zero allocations in hot path
  • UseCharClassSearcher strategy
    • Auto-selected for simple char_class patterns without capture groups
    • Patterns WITH captures ((\w)+) continue to use BoundedBacktracker
  • Zero-allocation Count() method
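
A minimal sketch of the lookup-table idea for an ASCII class such as \w. The byteTable type, newByteTable, and findFirst are names invented for this example and are not the CharClassSearcher API.

```go
package main

import "fmt"

// byteTable answers "is this byte in the class?" with one array index.
type byteTable [256]bool

// newByteTable marks every byte inside the given inclusive ranges.
func newByteTable(ranges ...[2]byte) *byteTable {
	var t byteTable
	for _, r := range ranges {
		for b := int(r[0]); b <= int(r[1]); b++ {
			t[b] = true
		}
	}
	return &t
}

// findFirst returns the half-open span [start, end) of the first maximal run
// of member bytes, or (-1, -1) if the haystack contains none. It allocates
// nothing.
func (t *byteTable) findFirst(haystack []byte) (int, int) {
	start := 0
	for start < len(haystack) && !t[haystack[start]] {
		start++
	}
	if start == len(haystack) {
		return -1, -1
	}
	end := start
	for end < len(haystack) && t[haystack[end]] {
		end++
	}
	return start, end
}

func main() {
	// ASCII \w: digits, upper case, lower case, underscore.
	word := newByteTable([2]byte{'0', '9'}, [2]byte{'A', 'Z'}, [2]byte{'a', 'z'}, [2]byte{'_', '_'})
	fmt.Println(word.findFirst([]byte("  foo_1 bar"))) // 2 7
}
```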

Fixed

  • DFA ByteClasses compression (Rust-style optimization)
    • Compile memory for hello pattern: 1195KB → 598KB (2x reduction)
  • Removed unused reverseDFA field from Engine
    • Was creating redundant reverse DFA for ALL patterns (2x memory overhead)
  • Reverse NFA ByteClasses registration
    • Matches Rust's approach in nfa.rs

Performance Summary

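| Pattern | Input | stdlib | coregex | Rust | coregex vs Rust |
|---|---|---|---|---|---|
| [\w]+ | 6MB, 1.3M matches | 623ms | 27ms | 57ms | 2.1x faster |

| Pattern | Before | After | Improvement |
|---|---|---|---|
| hello compile | 1195KB | 598KB | -50% |
| char_class runtime | 180ms | 109ms | -39% |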

Full Changelog: v0.8.20...v0.8.21