Releases: coregx/coregex

v0.12.14: concurrent isMatchDFA safety fix

19 Mar 07:27
ee6b351

Fixed

  • isMatchDFA concurrent safety (#137) — prefilter candidate loop called shared lazy DFA concurrently from RunParallel. On ARM64 without SIMD prefilters: cache corruption, 1.7GB allocs, 1s+ per op on M2 Max. Fix: prefilter for fast rejection only, pooled PikeVM for verification.

Added

  • TestConcurrentCaseInsensitivePrefilter — 8 goroutines × 100 iterations, match + no-match paths. Catches the race on macOS with -race flag.

Reported by @tjbrains on Apple M2 Max.

Full Changelog: v0.12.13...v0.12.14

v0.12.13: FatTeddy fix, prefilter acceleration, AC v0.2.1

18 Mar 12:54
7a29fab

What's Changed

Performance

  • FatTeddy VPTEST hot loop — 8 instructions → 1 for candidate detection (24% faster scan)
  • FatTeddy batch FindAllPositions — one ASM call per 64KB chunk, eliminates Go→ASM round trips. FindAll 39ms → 22ms (1.8x)
  • Prefilter-accelerated isMatch/FindIndices — candidate loop with anchored DFA for large NFAs (>100 states). #137 match case: 176μs → 27μs
  • Cascading prefix trim (Rust-style) — >64 literals trimmed to fit Teddy. auth_attempts 34ms → 7ms

Fixed

  • FatTeddy AVX2 ANDL→ORL — lane combining missed single-lane patterns. (?i)get|post|put: 11456 → 34368 matches
  • Non-amd64 build — added hasAVX2 and batch stubs for macOS ARM64
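
The effect of the ANDL→ORL change can be illustrated with plain bitmasks. The sketch below assumes, for illustration, that each lane yields one candidate bit per input position; the mask values and function names are hypothetical and do not reflect the actual assembly.

```go
package main

import "fmt"

// combineAND flags a position only when both lanes report a hit,
// so a pattern that lives in a single lane is never reported.
func combineAND(lo, hi uint32) uint32 { return lo & hi }

// combineOR flags a position when either lane reports a hit.
func combineOR(lo, hi uint32) uint32 { return lo | hi }

func main() {
	// Hypothetical candidate bits, one bit per input position.
	lo := uint32(0b0101) // first lane: hits at positions 0 and 2
	hi := uint32(0b0100) // second lane: hit at position 2 only

	fmt.Printf("AND: %04b\n", combineAND(lo, hi)) // position 0 is lost
	fmt.Printf("OR:  %04b\n", combineOR(lo, hi))  // both positions kept
}
```

With AND, the candidate at position 0 disappears because only one lane saw it, which matches the kind of undercount described in the fix.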

Dependencies

  • ahocorasick v0.1.0 → v0.2.1 — DFA backend + SIMD prefilter, 11-22x throughput

LangArena LogParser total: 757ms → 144ms (5.3x faster). Gap to Rust: 13x → 2.5x.

Full Changelog: v0.12.12...v0.12.13

v0.12.12: prefix trimming for case-fold literals

17 Mar 20:55
ee2d823

What's Changed

Performance

  • Prefix trimming for case-fold expanded literals — when (?i) expansion produces >32 incomplete prefix literals, trims to 4-byte prefixes and deduplicates. Fits Teddy SIMD prefilter instead of slower Aho-Corasick. LangArena suspicious: 117ms → 5.7ms (20x faster, 2.6x from Rust, was 54x).
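
A rough sketch of why trimming helps: expanding an ASCII word into all case variants doubles the literal count per letter, while cutting every variant to a 4-byte prefix and deduplicating caps it at 16. The function names below are hypothetical, and the sketch handles ASCII letters only (real Unicode case folding can add further variants).

```go
package main

import (
	"fmt"
	"strings"
)

// caseFoldVariants returns every upper/lower spelling of an ASCII word.
func caseFoldVariants(word string) []string {
	variants := []string{""}
	for _, r := range strings.ToLower(word) {
		lower, upper := string(r), strings.ToUpper(string(r))
		next := make([]string, 0, len(variants)*2)
		for _, v := range variants {
			next = append(next, v+lower)
			if upper != lower { // digits and punctuation have one form
				next = append(next, v+upper)
			}
		}
		variants = next
	}
	return variants
}

// trimAndDedup cuts each literal to at most n bytes and drops duplicates,
// keeping first-seen order.
func trimAndDedup(lits []string, n int) []string {
	seen := make(map[string]bool)
	var out []string
	for _, l := range lits {
		if len(l) > n {
			l = l[:n]
		}
		if !seen[l] {
			seen[l] = true
			out = append(out, l)
		}
	}
	return out
}

func main() {
	full := caseFoldVariants("system")
	trimmed := trimAndDedup(full, 4)
	fmt.Println(len(full), "->", len(trimmed)) // 64 -> 16
}
```

The trimmed literals are prefixes rather than complete matches, so every prefilter hit still has to be verified by the regex engine.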

Full Changelog: v0.12.11...v0.12.12

v0.12.11: ReverseSuffix multi-wildcard + COREGEX_DEBUG

17 Mar 20:20
7e7a099

What's Changed

Performance

  • ReverseSuffix for multi-wildcard patterns — patterns like \d+\.\d+\.\d+\.35 now use memmem suffix search instead of DigitPrefilter. LangArena ips: 57ms → 0.37ms (154x faster, 1.5x from Rust). Verified with RUST_LOG=debug — identical strategy.

Added

  • COREGEX_DEBUG env var — compile-time strategy logging comparable to Rust's RUST_LOG=debug. Level 1: strategy, NFA states, prefilter type, engines built/skipped. Level 2: + prefix/suffix literal contents. Zero cost when disabled.

Fixed

  • Find() leftmost semantics — ReverseSuffixSearcher.Find() used bytes.LastIndex (rightmost) for non-.* patterns, returning the last match instead of the first. Now uses bytes.Index for the leftmost match.
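
The difference between the two standard-library calls is easy to see on an input with two suffix occurrences. The helper names and the sample haystack below are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// firstSuffix returns the offset of the leftmost occurrence of suffix.
func firstSuffix(haystack, suffix []byte) int {
	return bytes.Index(haystack, suffix)
}

// lastSuffix returns the offset of the rightmost occurrence of suffix.
func lastSuffix(haystack, suffix []byte) int {
	return bytes.LastIndex(haystack, suffix)
}

func main() {
	haystack := []byte("10.0.0.35 then 192.168.1.35")
	suffix := []byte(".35")
	// Starting from the rightmost suffix would report the second address,
	// while Find() is expected to return the first match in the input.
	fmt.Println(firstSuffix(haystack, suffix), lastSuffix(haystack, suffix)) // 6 24
}
```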

Full Changelog: v0.12.10...v0.12.11

v0.12.10: case-insensitive literals + DigitPrefilter fix

17 Mar 18:49
82008c9

What's Changed

Performance

  • Case-insensitive literal extraction (Issue #137) — literal extractor now expands (?i) patterns into all case-folding variants and feeds them to Teddy/Aho-Corasick prefilter. Pattern (?iU)\b(eval|system|exec|...)\b: 88,000x slower than stdlib → 24x faster.

Fixed

  • FatTeddy FindMatch false negatives — FatTeddy AVX2 at non-zero positions broke FindAll for >32-pattern alternations. Replaced with Aho-Corasick prefilter.
  • isMatchDigitPrefilter O(n²) — used unanchored dfa.FindAt scanning to end of input per candidate. 7 min → 2.1ms on 6MB (200,000x faster).
  • Large NFA fallback — >100 NFA states without prefilter now falls back to PikeVM instead of DFA cache thrashing.

Full Changelog: v0.12.9...v0.12.10

v0.12.9: bidirectional DFA, Teddy/reverse NFA fixes

17 Mar 14:50
1a70fd4

What's Changed

Performance

  • Bidirectional DFA for UseDFA strategy — eliminates PikeVM second pass. Three-phase search: forward DFA (SearchFirstAt) → match end, reverse DFA → match start, anchored forward DFA → correct greedy end. All phases O(n) vs O(n×states) PikeVM.
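
The three phases can be sketched on a toy pattern. In the code below, hand-written byte loops for [0-9]+ stand in for the forward, reverse, and anchored DFAs; all function names are hypothetical and none of this is coregex's actual code.

```go
package main

import "fmt"

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// Phase 1: forward scan, stop at the earliest offset where a match of
// [0-9]+ is known to end.
func forwardFirstEnd(s string) int {
	for i := 0; i < len(s); i++ {
		if isDigit(s[i]) {
			return i + 1
		}
	}
	return -1
}

// Phase 2: reverse scan from that end to locate the match start.
func reverseStart(s string, end int) int {
	start := end
	for start > 0 && isDigit(s[start-1]) {
		start--
	}
	return start
}

// Phase 3: anchored forward scan from the start to get the greedy end.
func anchoredGreedyEnd(s string, start int) int {
	end := start
	for end < len(s) && isDigit(s[end]) {
		end++
	}
	return end
}

// find chains the three linear passes and returns the match span.
func find(s string) (start, end int, ok bool) {
	firstEnd := forwardFirstEnd(s)
	if firstEnd < 0 {
		return 0, 0, false
	}
	start = reverseStart(s, firstEnd)
	return start, anchoredGreedyEnd(s, start), true
}

func main() {
	fmt.Println(find("id=12345;")) // 3 8 true
}
```

Each pass touches every byte at most once, which is where the O(n) bound for the combined search comes from.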

Fixed

  • Teddy prefilter IsComplete flag — newTeddyFromSeq() hardcoded complete=true, causing false positives in IsMatch for prefix-only literals. Now passes seq.AllComplete().
  • Reverse NFA drops epsilon edges on mixed states — fillReverseState() silently dropped epsilon edges when byte range edges were present, breaking SearchReverse for quantifier patterns ([a-z]+, \w+, etc.).

Tests

  • LangArena LogParser (13 patterns) and Template::Regex benchmarks added.

Full Changelog: v0.12.8...v0.12.9

v0.12.8: streaming ReplaceAll + DFA-first FindSubmatchAt

10 Mar 19:37
f5da9d3

Performance

  • Streaming ReplaceAll — ReplaceAllStringFunc, ReplaceAllFunc, ReplaceAllLiteral, and ReplaceAllLiteralString converted from two-pass (collect all match indices → iterate) to single-pass streaming. Eliminates [][]int allocation for high-match-count inputs. Returns original string when no matches (Cow-like optimization). (#135)

  • DFA-first FindSubmatchAt — Rust-style two-phase search for capture extraction: Phase 1 finds match boundaries via DFA/strategy, Phase 2 runs PikeVM only within the match span. Reduces PikeVM work from O(remaining_haystack) to O(match_len) per match. (#135)

  • FindAllSubmatch state reuse — acquires SearchState once for entire iteration loop, eliminating per-match sync.Pool overhead.
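
The single-pass idea behind the streaming ReplaceAll change can be sketched as follows. To stay self-contained, the sketch uses a literal search (strings.Index) in place of a regex engine; the function name is hypothetical and this is not the library's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceLiteralStreaming replaces every non-overlapping occurrence of old
// with repl(old) in one pass: each match is written to the output as soon
// as it is found, so no slice of match indices is collected first.
func replaceLiteralStreaming(s, old string, repl func(string) string) string {
	if old == "" {
		return s // empty needle is out of scope for this sketch
	}
	var b strings.Builder
	pos, matched := 0, false
	for {
		i := strings.Index(s[pos:], old)
		if i < 0 {
			break
		}
		if !matched {
			matched = true
			b.Grow(len(s)) // allocate only once a match exists
		}
		start := pos + i
		b.WriteString(s[pos:start]) // text between matches
		b.WriteString(repl(old))    // replacement
		pos = start + len(old)
	}
	if !matched {
		return s // no matches: return the input itself, no copy
	}
	b.WriteString(s[pos:]) // tail after the last match
	return b.String()
}

func main() {
	plus := func(string) string { return "+" }
	fmt.Println(replaceLiteralStreaming("a-b-c", "-", plus)) // a+b+c
	fmt.Println(replaceLiteralStreaming("abc", "-", plus))   // abc
}
```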

Fixed

  • BoundedBacktracker stack overflow on 386/macOS — recursive backtracking overflowed 250MB stack on large inputs with deep UTF-8 NFA chains. Strategies using BoundedBacktracker now bypass two-phase search.

  • \B false positive at end of input — SearchWithCapturesAt lost lookbehind context at end-of-input positions, causing incorrect \B matches.

  • Data race in concurrent FindSubmatch — strategies UseDFA, UseBoth, and UseDigitPrefilter accessed shared mutable state (e.dfa, e.pikevm). Concurrent FindSubmatch calls now use pooled thread-safe state.

  • FindAllSubmatch lookbehind context loss — previously sliced haystack, losing \b word boundary context at match boundaries.
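
The lookbehind context problem is easy to reproduce with any engine that is handed a sliced haystack. The sketch below uses the standard library's regexp package as a stand-in; the helper names and the pattern are chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var wordCat = regexp.MustCompile(`\bcat`)

// matchInContext searches the full input, so \b can see the byte
// that precedes each candidate position.
func matchInContext(s string) bool { return wordCat.MatchString(s) }

// matchSliced searches s[from:], which hides everything before `from`
// and makes offset 0 of the slice look like a word boundary.
func matchSliced(s string, from int) bool { return wordCat.MatchString(s[from:]) }

func main() {
	s := "concat"
	fmt.Println(matchInContext(s)) // false: "cat" follows the word char 'n'
	fmt.Println(matchSliced(s, 3)) // true: the slice "cat" starts at a fake boundary
}
```

Passing the full haystack together with a start offset, rather than a slice, keeps the preceding byte visible to \b and \B.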

v0.12.7: PikeVM sparse-dispatch for dot patterns

10 Mar 16:10
5d43429

Performance

  • PikeVM sparse-dispatch for dot patterns (PR #134) — NFA compiler now emits a single StateSparse with ~12 transitions for . (AnyCharNotNL) instead of ~9 chained StateSplit states. PikeVM dispatches in O(1) instead of O(branches) split-chain traversal. Same approach as Rust regex's State::Sparse.
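
A sparse state can be pictured as one node holding a short, sorted list of byte ranges, each with its own target. The sketch below is a simplified illustration for the first byte of . (any character except newline): the type names, state IDs, and the coarse UTF-8 lead-byte ranges are assumptions, not coregex's actual tables.

```go
package main

import "fmt"

// Hypothetical state IDs for the illustration.
const (
	stateDead  = -1 // no transition
	stateMatch = 1  // single-byte character consumed
	stateCont1 = 2  // expect 1 continuation byte
	stateCont2 = 3  // expect 2 continuation bytes
	stateCont3 = 4  // expect 3 continuation bytes
)

// byteRange maps an inclusive byte range to a next-state ID.
type byteRange struct {
	lo, hi byte
	next   int
}

// sparseState holds sorted, non-overlapping ranges in a single state,
// replacing a chain of two-way split states.
type sparseState struct{ ranges []byteRange }

// step returns the target for b, or stateDead if no range covers it.
func (s sparseState) step(b byte) int {
	for _, r := range s.ranges {
		if b < r.lo {
			return stateDead // ranges are sorted, nothing later can match
		}
		if b <= r.hi {
			return r.next
		}
	}
	return stateDead
}

// dotFirstByte builds a simplified first-byte table for "any char but \n".
func dotFirstByte() sparseState {
	return sparseState{ranges: []byteRange{
		{0x00, 0x09, stateMatch},
		{0x0B, 0x7F, stateMatch},
		{0xC2, 0xDF, stateCont1},
		{0xE0, 0xEF, stateCont2},
		{0xF0, 0xF4, stateCont3},
	}}
}

func main() {
	s := dotFirstByte()
	fmt.Println(s.step('a'), s.step('\n'), s.step(0xC3)) // 1 -1 2
}
```

The lookup here scans a small, fixed number of ranges inside one state, so the cost per input byte no longer depends on how many split states a chain would need.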

Benchmarks

PikeVM speedup on dot-heavy patterns: 2.8-4.8x

| Pattern | Before | After | Speedup |
|---|---|---|---|
| .*? (non-greedy dot) | 1.00x | 0.35x | 2.8x |
| .+ (greedy dot-plus) | 1.00x | 0.21x | 4.8x |
| .* (greedy dot-star) | 1.00x | 0.24x | 4.2x |

Closes #132. Reported by @kostya via LangArena benchmarks.

v0.12.6: BoundedBacktracker span fix + DFA FindAll optimization

08 Mar 21:39
90c3f64

What's Changed

Bug Fixes

  • BoundedBacktracker rejected valid searches on large inputs (#127) — SearchAtWithState(haystack, at, state) checked CanHandle(len(haystack)) against the full haystack length, rejecting inputs >2.4MB even when the remaining search span [at, len(haystack)] easily fit. LogParser on 7MB log files returned 22004 matches instead of the correct 33089. Fix: span-based visited table sizing matching Rust regex's Input span model. Reported by @kostya.

  • ReplaceAllStringFunc O(n²) performance — Used result += string concatenation in a loop. 150K replacements on 6MB: 2m19s → 1.3s with strings.Builder.
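
The span-based check from the BoundedBacktracker fix can be sketched with a simple capacity test. The sizing formula (one visited bit per state per position), the budget, and the numbers below are assumptions for illustration, not the library's actual limits.

```go
package main

import "fmt"

// canHandle reports whether a visited table with one bit per
// (state, position) pair fits in the given bit budget.
// (Hypothetical formula for the illustration.)
func canHandle(numStates, spanLen, maxVisitedBits int) bool {
	return numStates*(spanLen+1) <= maxVisitedBits
}

func main() {
	const numStates = 100
	const maxVisitedBits = 256 * 1024 * 8 // hypothetical 256 KiB budget

	haystackLen := 7_000_000
	at := 6_999_000 // search resumes close to the end of the input

	// Before: sized against the whole haystack, so the search is rejected.
	fmt.Println(canHandle(numStates, haystackLen, maxVisitedBits)) // false
	// After: sized against the remaining span only.
	fmt.Println(canHandle(numStates, haystackLen-at, maxVisitedBits)) // true
}
```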

Performance

  • DFA FindAll O(n²) → O(n) for dense-match inputs — Added DFA.IsMatchAt() with early termination (O(k) vs O(n)), and prefilter skip that jumps PikeVM to candidate positions. Template \{\{(.*?)\}\} FindAll improved ~37%.

Full Changelog: v0.12.5...v0.12.6

v0.12.5: Non-greedy quantifier fix, ReverseSuffix correctness

08 Mar 19:26
2775d87

Fixed

  • Non-greedy quantifiers behaved greedily (#124) — Patterns like \{\{.*?\s*\}\} on {{ a }} {{ b }} returned the entire string instead of {{ a }}. Root cause: PikeVM's tookLeft priority flag leaked from internal UTF-8 alternation chains into quantifier resets. Fix: replaced tookLeft/priority system with Rust's DFS-ordering approach. Thread struct reduced from 40 to 24 bytes.

  • ReverseSuffix missed matches for multi-group patterns (#124) — Patterns like \d+\.\d+\.\d+\.35 failed to match 192.168.1.35. Fix: added guard in isSafeForReverseSuffix() to reject patterns with 2+ variable-length groups, plus forward DFA verification for correct greedy boundaries. Reported by @kostya.
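
The expected results for both cases can be checked against the standard library's regexp package, which is used here only as a reference engine. The helper name firstMatch is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in input, or "".
func firstMatch(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

func main() {
	// Non-greedy: .*? should stop at the first closing braces.
	fmt.Printf("%q\n", firstMatch(`\{\{.*?\s*\}\}`, "{{ a }} {{ b }}")) // "{{ a }}"
	// Multi-group pattern with a literal suffix: the whole address matches.
	fmt.Printf("%q\n", firstMatch(`\d+\.\d+\.\d+\.35`, "192.168.1.35")) // "192.168.1.35"
}
```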

Benchmark Results (AMD EPYC, 6MB input)

All 16 benchmark patterns verified — no regressions. inner_literal 0.23ms (coregex) vs 0.30ms (Rust) — 1.3x faster than Rust.

Full changelog: https://github.com/coregx/coregex/blob/main/CHANGELOG.md#0125---2026-03-08