Releases: coregx/coregex
v0.12.14: concurrent isMatchDFA safety fix
Fixed
isMatchDFA concurrent safety (#137): the prefilter candidate loop called the shared lazy DFA concurrently from RunParallel. On ARM64 without SIMD prefilters this caused cache corruption, 1.7GB of allocations, and 1s+ per op on M2 Max. Fix: the prefilter is used for fast rejection only, and a pooled PikeVM handles verification.
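The shape of this fix (read-only prefilter for rejection, pooled per-call state for verification) can be sketched with the standard library alone. This is a minimal illustration, not coregex's code: the `matcher` and `scratch` types, the first-byte prefilter, and the lowercase-and-search verifier are all hypothetical stand-ins for the real prefilter and PikeVM.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// scratch is hypothetical per-search mutable state. It stands in for a
// verifier's working memory and must never be shared between goroutines.
type scratch struct{ buf []byte }

// matcher is a hypothetical case-insensitive ASCII literal matcher.
type matcher struct {
	needle []byte    // lowercased literal, read-only after construction
	pool   sync.Pool // hands each caller its own scratch
}

func newMatcher(lit string) *matcher {
	m := &matcher{needle: bytes.ToLower([]byte(lit))}
	m.pool.New = func() any { return new(scratch) }
	return m
}

func (m *matcher) isMatch(h []byte) bool {
	if len(m.needle) == 0 {
		return true
	}
	// Prefilter: touches only immutable data, so it is safe to run
	// concurrently. It is used for fast rejection only.
	lo := m.needle[0]
	up := lo
	if lo >= 'a' && lo <= 'z' {
		up = lo - 'a' + 'A'
	}
	if bytes.IndexByte(h, lo) < 0 && bytes.IndexByte(h, up) < 0 {
		return false
	}
	// Verification: mutable state comes from the pool, one per call.
	s := m.pool.Get().(*scratch)
	defer m.pool.Put(s)
	s.buf = append(s.buf[:0], h...)
	for i, c := range s.buf {
		if c >= 'A' && c <= 'Z' {
			s.buf[i] = c + ('a' - 'A')
		}
	}
	return bytes.Contains(s.buf, m.needle)
}

func main() {
	m := newMatcher("post")
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ { // 8 goroutines × 100 iterations
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				if !m.isMatch([]byte("POST /login")) || m.isMatch([]byte("GET /index")) {
					panic("unexpected result")
				}
			}
		}()
	}
	wg.Wait()
	fmt.Println("ok")
}
```

Running a program like this under `go run -race` is one way to check that no mutable state is shared between goroutines.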
Added
TestConcurrentCaseInsensitivePrefilter: 8 goroutines × 100 iterations, covering match and no-match paths. Catches the race on macOS with the -race flag.
Reported by @tjbrains on Apple M2 Max.
Full Changelog: v0.12.13...v0.12.14
v0.12.13: FatTeddy fix, prefilter acceleration, AC v0.2.1
What's Changed
Performance
- FatTeddy VPTEST hot loop — 8 instructions → 1 for candidate detection (24% faster scan)
- FatTeddy batch FindAllPositions — one ASM call per 64KB chunk, eliminates Go→ASM round trips. FindAll 39ms → 22ms (1.8x)
- Prefilter-accelerated isMatch/FindIndices — candidate loop with anchored DFA for large NFAs (>100 states). #137 match case: 176μs → 27μs
- Cascading prefix trim (Rust-style) — >64 literals trimmed to fit Teddy. auth_attempts 34ms → 7ms
Fixed
- FatTeddy AVX2 ANDL→ORL: lane combining missed single-lane patterns. (?i)get|post|put: 11456 → 34368 matches
- Non-amd64 build: added hasAVX2 and batch stubs for macOS ARM64
Dependencies
- ahocorasick v0.1.0 → v0.2.1 — DFA backend + SIMD prefilter, 11-22x throughput
LangArena LogParser total: 757ms → 144ms (5.3x faster). Gap to Rust: 13x → 2.5x.
Full Changelog: v0.12.12...v0.12.13
v0.12.12: prefix trimming for case-fold literals
What's Changed
Performance
- Prefix trimming for case-fold expanded literals: when (?i) expansion produces >32 incomplete prefix literals, trims to 4-byte prefixes and deduplicates. Fits Teddy SIMD prefilter instead of slower Aho-Corasick. LangArena suspicious: 117ms → 5.7ms (20x faster, 2.6x from Rust, was 54x).
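The trim-and-deduplicate step can be sketched as follows. The function name `trimPrefixes` and its signature are hypothetical, not coregex's API. The sketch only shows why truncation shrinks the literal set: many expanded variants share the same short prefix.

```go
package main

import "fmt"

// trimPrefixes truncates every literal to at most n bytes and drops
// duplicates, preserving first-seen order. Hypothetical helper.
func trimPrefixes(lits []string, n int) []string {
	seen := make(map[string]struct{}, len(lits))
	out := make([]string, 0, len(lits))
	for _, l := range lits {
		if len(l) > n {
			l = l[:n]
		}
		if _, dup := seen[l]; dup {
			continue
		}
		seen[l] = struct{}{}
		out = append(out, l)
	}
	return out
}

func main() {
	lits := []string{"system", "systeM", "systEm", "SYSTEM", "SYSTEm"}
	fmt.Println(trimPrefixes(lits, 4)) // [syst SYST]
}
```

Trimmed literals are no longer complete matches, so every prefilter candidate still needs verification by a full engine.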
Full Changelog: v0.12.11...v0.12.12
v0.12.11: ReverseSuffix multi-wildcard + COREGEX_DEBUG
What's Changed
Performance
- ReverseSuffix for multi-wildcard patterns: patterns like \d+\.\d+\.\d+\.35 now use memmem suffix search instead of DigitPrefilter. LangArena ips: 57ms → 0.37ms (154x faster, 1.5x from Rust). Verified with RUST_LOG=debug: identical strategy.
Added
COREGEX_DEBUG env var: compile-time strategy logging comparable to Rust's RUST_LOG=debug. Level 1: strategy, NFA states, prefilter type, engines built/skipped. Level 2: adds prefix/suffix literal contents. Zero cost when disabled.
Fixed
Find() leftmost semantics: ReverseSuffixSearcher.Find() used bytes.LastIndex (rightmost) for non-.* patterns, returning the last match instead of the first. Now uses bytes.Index for the leftmost match.
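The difference between the two standard-library calls is easy to see on an input with two suffix occurrences. The wrapper names `leftmost` and `rightmost` below are illustrative only.

```go
package main

import (
	"bytes"
	"fmt"
)

// leftmost returns the first occurrence of lit in h, which is what
// Find() semantics require for the suffix candidate.
func leftmost(h, lit []byte) int { return bytes.Index(h, lit) }

// rightmost returns the last occurrence of lit in h, the behavior the
// fix removed for non-.* patterns.
func rightmost(h, lit []byte) int { return bytes.LastIndex(h, lit) }

func main() {
	h := []byte("a.35 b.35")
	fmt.Println(leftmost(h, []byte(".35")))  // 1
	fmt.Println(rightmost(h, []byte(".35"))) // 6
}
```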
Full Changelog: v0.12.10...v0.12.11
v0.12.10: case-insensitive literals + DigitPrefilter fix
What's Changed
Performance
- Case-insensitive literal extraction (Issue #137): literal extractor now expands (?i) patterns into all case-folding variants and feeds them to Teddy/Aho-Corasick prefilter. Pattern (?iU)\b(eval|system|exec|...)\b: 88,000x slower than stdlib → 24x faster.
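As a rough illustration of the expansion, the sketch below enumerates the case variants of one literal. It is an ASCII-only simplification under a hypothetical name (`caseVariants`). Real Unicode case folding has extra equivalents that it ignores, for example U+212A KELVIN SIGN folding to k.

```go
package main

import "fmt"

// caseVariants returns every upper/lower combination of the ASCII
// letters in lit. A literal with k letters yields 2^k variants, which
// is why long alternations can exceed a prefilter's literal budget.
func caseVariants(lit string) []string {
	out := []string{""}
	for i := 0; i < len(lit); i++ {
		c := lit[i]
		alts := []byte{c}
		if lower := c | 0x20; lower >= 'a' && lower <= 'z' {
			alts = []byte{lower, lower &^ 0x20} // lowercase, then uppercase
		}
		next := make([]string, 0, len(out)*len(alts))
		for _, p := range out {
			for _, a := range alts {
				next = append(next, p+string([]byte{a}))
			}
		}
		out = next
	}
	return out
}

func main() {
	fmt.Println(caseVariants("get")) // [get geT gEt gET Get GeT GEt GET]
}
```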
Fixed
- FatTeddy FindMatch false negatives: FatTeddy AVX2 at non-zero positions broke FindAll for >32-pattern alternations. Replaced with Aho-Corasick prefilter.
- isMatchDigitPrefilter O(n²): used unanchored dfa.FindAt scanning to end of input per candidate. 7 min → 2.1ms on 6MB (200,000x faster).
- Large NFA fallback: >100 NFA states without prefilter now falls back to PikeVM instead of DFA cache thrashing.
Full Changelog: v0.12.9...v0.12.10
v0.12.9: bidirectional DFA, Teddy/reverse NFA fixes
What's Changed
Performance
- Bidirectional DFA for UseDFA strategy — eliminates PikeVM second pass. Three-phase search: forward DFA (SearchFirstAt) → match end, reverse DFA → match start, anchored forward DFA → correct greedy end. All phases O(n) vs O(n×states) PikeVM.
Fixed
- Teddy prefilter IsComplete flag: newTeddyFromSeq() hardcoded complete=true, causing false positives in IsMatch for prefix-only literals. Now passes seq.AllComplete().
- Reverse NFA drops epsilon edges on mixed states: fillReverseState() silently dropped epsilon edges when byte range edges were present, breaking SearchReverse for quantifier patterns ([a-z]+, \w+, etc.).
Tests
- LangArena LogParser (13 patterns) and Template::Regex benchmarks added.
Full Changelog: v0.12.8...v0.12.9
v0.12.8: streaming ReplaceAll + DFA-first FindSubmatchAt
Performance
- Streaming ReplaceAll: ReplaceAllStringFunc, ReplaceAllFunc, ReplaceAllLiteral, and ReplaceAllLiteralString converted from two-pass (collect all match indices → iterate) to single-pass streaming. Eliminates [][]int allocation for high-match-count inputs. Returns original string when no matches (Cow-like optimization). (#135)
- DFA-first FindSubmatchAt: Rust-style two-phase search for capture extraction. Phase 1 finds match boundaries via DFA/strategy, Phase 2 runs PikeVM only within the match span. Reduces PikeVM work from O(remaining_haystack) to O(match_len) per match. (#135)
- FindAllSubmatch state reuse: acquires SearchState once for the entire iteration loop, eliminating per-match sync.Pool overhead.
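The single-pass streaming idea behind the ReplaceAll change can be sketched independently of any regex engine. Here `findFunc`, `replaceAllFunc`, and `literalFinder` are hypothetical names. The finder receives the full haystack plus an offset rather than a sliced haystack, so a real engine could keep lookbehind context.

```go
package main

import (
	"fmt"
	"strings"
)

// findFunc reports the next match at or after position at, or (-1, -1).
type findFunc func(s string, at int) (start, end int)

// replaceAllFunc streams output while searching: no list of match
// indices is collected, and the input is returned unchanged (no copy)
// when nothing matches.
func replaceAllFunc(s string, find findFunc, repl func(string) string) string {
	var b strings.Builder
	last, at, matched := 0, 0, false
	for at <= len(s) {
		start, end := find(s, at)
		if start < 0 {
			break
		}
		matched = true
		b.WriteString(s[last:start]) // text between matches
		b.WriteString(repl(s[start:end]))
		last = end
		if end > start {
			at = end
		} else {
			at = end + 1 // empty match: step forward to guarantee progress
		}
	}
	if !matched {
		return s
	}
	b.WriteString(s[last:])
	return b.String()
}

// literalFinder is a stand-in matcher built on strings.Index.
func literalFinder(lit string) findFunc {
	return func(s string, at int) (int, int) {
		i := strings.Index(s[at:], lit)
		if i < 0 {
			return -1, -1
		}
		return at + i, at + i + len(lit)
	}
}

func main() {
	fmt.Println(replaceAllFunc("id=7 id=42", literalFinder("id"), strings.ToUpper)) // ID=7 ID=42
}
```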
Fixed
- BoundedBacktracker stack overflow on 386/macOS: recursive backtracking overflowed 250MB stack on large inputs with deep UTF-8 NFA chains. Strategies using BoundedBacktracker now bypass two-phase search.
- \B false positive at end of input: SearchWithCapturesAt lost lookbehind context at end-of-input positions, causing incorrect \B matches.
- Data race in concurrent FindSubmatch: strategies UseDFA, UseBoth, and UseDigitPrefilter accessed shared mutable state (e.dfa, e.pikevm). Concurrent FindSubmatch calls now use pooled thread-safe state.
- FindAllSubmatch lookbehind context loss: previously sliced haystack, losing \b word boundary context at match boundaries.
v0.12.7: PikeVM sparse-dispatch for dot patterns
Performance
- PikeVM sparse-dispatch for dot patterns (PR #134): NFA compiler now emits a single StateSparse with ~12 transitions for . (AnyCharNotNL) instead of ~9 chained StateSplit states. PikeVM dispatches in O(1) instead of O(branches) split-chain traversal. Same approach as Rust regex's State::Sparse.
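A sparse state can be pictured as one sorted table of byte ranges, each with its own target, so a single lookup replaces a walk through a chain of split states. The sketch below is a simplified assumption, not the coregex layout: the type names are hypothetical, and the table uses only five ranges for the leading byte of any character except newline, ignoring the finer ranges a real UTF-8 automaton needs to reject overlong and surrogate encodings.

```go
package main

import "fmt"

// transition maps an inclusive byte range to a target state ID.
type transition struct {
	lo, hi byte
	next   int
}

// sparseState holds sorted, non-overlapping transitions.
type sparseState struct{ trans []transition }

// step finds the transition for b with one bounded scan of the table.
func (s *sparseState) step(b byte) (next int, ok bool) {
	for _, t := range s.trans {
		if b < t.lo {
			break // table is sorted, so no later range can match
		}
		if b <= t.hi {
			return t.next, true
		}
	}
	return 0, false
}

// Hypothetical target states: a completed code point, or 1-3
// continuation bytes still expected.
const (
	stMatch = iota + 1
	stCont1
	stCont2
	stCont3
)

var dotNotNL = sparseState{trans: []transition{
	{0x00, 0x09, stMatch},
	{0x0B, 0x7F, stMatch}, // skips 0x0A ('\n')
	{0xC2, 0xDF, stCont1},
	{0xE0, 0xEF, stCont2},
	{0xF0, 0xF4, stCont3},
}}

func main() {
	for _, b := range []byte{'a', '\n', 0xC3, 0xE2} {
		next, ok := dotNotNL.step(b)
		fmt.Printf("byte 0x%02X -> state %d, ok=%v\n", b, next, ok)
	}
}
```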
Benchmarks
PikeVM speedup on dot-heavy patterns: 2.8-4.8x
| Pattern | Before | After | Speedup |
|---|---|---|---|
| .*? (non-greedy dot) | 1.00x | 0.35x | 2.8x |
| .+ (greedy dot-plus) | 1.00x | 0.21x | 4.8x |
| .* (greedy dot-star) | 1.00x | 0.24x | 4.2x |
v0.12.6: BoundedBacktracker span fix + DFA FindAll optimization
What's Changed
Bug Fixes
- BoundedBacktracker rejected valid searches on large inputs (#127): SearchAtWithState(haystack, at, state) checked CanHandle(len(haystack)) against the full haystack length, rejecting inputs >2.4MB even when the remaining search span [at, len(haystack)] easily fit. LogParser on 7MB log files returned 22004 matches instead of the correct 33089. Fix: span-based visited table sizing matching Rust regex's Input span model. Reported by @kostya.
- ReplaceAllStringFunc O(n²) performance: used result += string concatenation in a loop. 150K replacements on 6MB: 2m19s → 1.3s with strings.Builder.
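For the BoundedBacktracker fix, the difference between a full-length check and a span-based check can be sketched as below. The bit budget, the `states × (length + 1)` sizing formula, and all function names are assumptions made for illustration. They show the kind of check described, not coregex's actual limits.

```go
package main

import "fmt"

// maxVisitedBits is a hypothetical budget for the visited table
// (here 256 KiB worth of bits).
const maxVisitedBits = 256 * 1024 * 8

// fits reports whether a visited table with one bit per
// (state, position) pair stays within the budget.
func fits(numStates, length int) bool {
	return numStates*(length+1) <= maxVisitedBits
}

// canSearchFullLen mirrors the old behavior: it sizes the table from
// the whole haystack, regardless of where the search starts.
func canSearchFullLen(numStates int, haystack []byte, at int) bool {
	return fits(numStates, len(haystack))
}

// canSearchSpan sizes the table from the remaining span only.
func canSearchSpan(numStates int, haystack []byte, at int) bool {
	return fits(numStates, len(haystack)-at)
}

func main() {
	haystack := make([]byte, 1_000_000)
	at := 900_000
	fmt.Println(canSearchFullLen(10, haystack, at)) // false
	fmt.Println(canSearchSpan(10, haystack, at))    // true
}
```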
Performance
- DFA FindAll O(n²) → O(n) for dense-match inputs: added DFA.IsMatchAt() with early termination (O(k) vs O(n)), and prefilter skip that jumps PikeVM to candidate positions. Template \{\{(.*?)\}\} FindAll improved ~37%.
Full Changelog
v0.12.5: Non-greedy quantifier fix, ReverseSuffix correctness
Fixed
- Non-greedy quantifiers behaved greedily (#124): patterns like \{\{.*?\s*\}\} on {{ a }} {{ b }} returned the entire string instead of {{ a }}. Root cause: PikeVM's tookLeft priority flag leaked from internal UTF-8 alternation chains into quantifier resets. Fix: replaced the tookLeft/priority system with Rust's DFS-ordering approach. Thread struct reduced from 40 to 24 bytes.
- ReverseSuffix missed matches for multi-group patterns (#124): patterns like \d+\.\d+\.\d+\.35 failed to match 192.168.1.35. Fix: added guard in isSafeForReverseSuffix() to reject patterns with 2+ variable-length groups, plus forward DFA verification for correct greedy boundaries. Reported by @kostya.
Benchmark Results (AMD EPYC, 6MB input)
All 16 benchmark patterns verified — no regressions. inner_literal 0.23ms (coregex) vs 0.30ms (Rust) — 1.3x faster than Rust.
Full changelog: https://github.com/coregx/coregex/blob/main/CHANGELOG.md#0125---2026-03-08