https://github.com/coregx/coregex
Pure Go production-grade regex engine with SIMD optimizations. Up to 3-3000x+ faster than stdlib.
- Host: GitHub
- URL: https://github.com/coregx/coregex
- Owner: coregx
- License: mit
- Created: 2025-11-27T08:49:38.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2026-03-25T08:15:06.000Z (about 1 month ago)
- Last Synced: 2026-03-25T11:29:13.634Z (about 1 month ago)
- Topics: avx2, dfa, go, golang, multi-engine, nfa, performance, pikevm, regex, regex-engine, regexp, simd, ssse3
- Language: Go
- Homepage: https://pkg.go.dev/github.com/coregx/coregex
- Size: 1.46 MB
- Stars: 158
- Watchers: 5
- Forks: 5
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
- Roadmap: ROADMAP.md
Awesome Lists containing this project
- awesome-go-with-stars - coregex - Production regex engine with Rust regex-crate architecture: multi-engine DFA/NFA, SIMD prefilters, drop-in stdlib replacement. | 2026-03-17 | (Template Engines / Regular Expressions)
- awesome-go - coregex - Production regex engine with Rust regex-crate architecture: multi-engine DFA/NFA, SIMD prefilters, drop-in stdlib replacement. (Text Processing / Regular Expressions)
- fucking-awesome-go - coregex - Production regex engine with Rust regex-crate architecture: multi-engine DFA/NFA, SIMD prefilters, drop-in stdlib replacement. (Text Processing / Regular Expressions)
README
# coregex
High-performance regex engine for Go. Drop-in replacement for `regexp` with **3-3000x speedup**.\*
\* Typical speedup 15-240x on real-world patterns. 1000x+ achieved on [specific edge cases](https://github.com/kolkov/regex-bench#extreme-speedups-1000-3000x) where prefilters skip entire input (e.g., IP pattern on text with no digits).
## Why coregex?
Go's stdlib `regexp` is intentionally simple: it relies on NFA-based matching, with no DFA engine and few literal optimizations. This guarantees O(n) time but leaves performance on the table.
coregex brings Rust regex-crate architecture to Go:
- **Multi-engine**: 17 strategies — Lazy DFA, PikeVM, OnePass, BoundedBacktracker, and more
- **SIMD prefilters**: AVX2/SSSE3 for fast candidate rejection
- **Reverse search**: Suffix/inner literal patterns run 1000x+ faster
- **O(n) guarantee**: No backtracking, no ReDoS vulnerabilities
## Installation
```bash
go get github.com/coregx/coregex
```
Requires Go 1.25+. Minimal dependencies (`golang.org/x/sys`, `github.com/coregx/ahocorasick`).
## Quick Start
```go
package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\w+@\w+\.\w+`)
	text := []byte("Contact support@example.com for help")

	// Find first match
	fmt.Printf("Found: %s\n", re.Find(text))

	// Check if matches (zero allocation)
	if re.MatchString("test@email.com") {
		fmt.Println("Valid email format")
	}
}
```
## Performance
Cross-language benchmarks on 6MB input, AMD EPYC ([source](https://github.com/kolkov/regex-bench)):
| Pattern | Go stdlib | coregex | Rust regex | vs stdlib | vs Rust |
|---------|-----------|---------|------------|-----------|---------|
| Literal alternation | 554 ms | 4.5 ms | 0.72 ms | **122x** | 6.2x slower |
| Multi-literal | 1572 ms | 12.4 ms | 5.5 ms | **126x** | 2.2x slower |
| Inner `.*keyword.*` | 238 ms | 0.27 ms | 0.33 ms | **881x** | **1.2x faster** |
| Suffix `.*\.txt` | 239 ms | 1.9 ms | 1.2 ms | **125x** | 1.5x slower |
| Multiline `(?m)^/.*\.php` | 102 ms | 0.34 ms | 0.75 ms | **299x** | **2.2x faster** |
| Email validation | 257 ms | 0.46 ms | 0.31 ms | **557x** | 1.4x slower |
| URL extraction | 256 ms | 0.62 ms | 0.37 ms | **413x** | 1.6x slower |
| IP address | 494 ms | 0.72 ms | 13.5 ms | **685x** | **18.8x faster** |
| Version `\d+.\d+.\d+` | 164 ms | 0.62 ms | 0.79 ms | **263x** | **1.2x faster** |
| Char class `[\w]+` | 478 ms | 42.1 ms | 56.4 ms | **11x** | **1.3x faster** |
| Word repeat `(\w{2,8})+` | 690 ms | 180 ms | 54.7 ms | **3x** | 3.2x slower |
**Where coregex excels:**
- Multiline patterns (`(?m)^/.*\.php`) — **2.2x faster than Rust**, 299x vs stdlib
- IP/phone patterns (`\d+\.\d+\.\d+\.\d+`) — SIMD digit prefilter skips non-digit regions
- Suffix patterns (`.*\.log`, `.*\.txt`) — reverse search optimization (1000x+)
- Inner literals (`.*error.*`, `.*@example\.com`) — bidirectional DFA (900x+)
- Multi-pattern (`foo|bar|baz|...`) — Slim Teddy (≤32), Fat Teddy (33-64), or Aho-Corasick (>64)
- Anchored alternations (`^(\d+|UUID|hex32)`) — O(1) branch dispatch (5-20x)
- Concatenated char classes (`[a-zA-Z]+[0-9]+`) — DFA with byte classes (5-7x)
- **Zero-alloc iterators** (`AllIndex`, `AppendAllIndex`) — 0 heap allocs, up to **30% faster** than FindAll. Email pattern **faster than Rust** with `AppendAllIndex`.
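The reverse-search idea behind the suffix bullet can be sketched with plain byte searches. The example below is a simplified illustration, not coregex's implementation: for a pattern of the form `.*<suffix>` it finds the literal suffix first, then walks backward to the start of the line, because `.` does not match a newline by default. The function name `findDotStarSuffix` is hypothetical, and the sketch assumes a non-empty suffix that contains no newline.

```go
package main

import (
	"bytes"
	"fmt"
)

// findDotStarSuffix emulates a leftmost, greedy search for `.*<suffix>`.
// It locates the literal first and only then determines the match bounds,
// instead of attempting a match at every input position.
func findDotStarSuffix(haystack, suffix []byte) (start, end int, ok bool) {
	first := bytes.Index(haystack, suffix)
	if first < 0 {
		return 0, 0, false // literal absent: no match is possible
	}
	// The match starts at the beginning of the line holding the literal.
	start = bytes.LastIndexByte(haystack[:first], '\n') + 1
	// Greedy `.*` extends to the last occurrence of the suffix on that line.
	lineEnd := len(haystack)
	if nl := bytes.IndexByte(haystack[first:], '\n'); nl >= 0 {
		lineEnd = first + nl
	}
	end = start + bytes.LastIndex(haystack[start:lineEnd], suffix) + len(suffix)
	return start, end, true
}

func main() {
	text := []byte("foo\nbar a.txt b.txt c\nz.txt")
	fmt.Println(findDotStarSuffix(text, []byte(".txt"))) // 4 19 true
}
```

When the literal is absent, the whole input is rejected by a single substring search, which is the kind of case where the largest speedups are reported.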
## Features
### Engine Selection
coregex automatically selects the optimal engine:
| Strategy | Pattern Type | Speedup |
|----------|--------------|---------|
| **AnchoredLiteral** | `^prefix.*suffix$` | **32-133x** |
| **MultilineReverseSuffix** | `(?m)^/.*\.php` | **100-552x** ⚡ |
| ReverseInner | `.*keyword.*` | 100-900x |
| ReverseSuffix | `.*\.txt` | 100-1100x |
| BranchDispatch | `^(\d+\|UUID\|hex32)` | 5-20x |
| CompositeSequenceDFA | `[a-zA-Z]+[0-9]+` | 5-7x |
| LazyDFA | IP, complex patterns | 10-150x |
| AhoCorasick | `a\|b\|c\|...\|z` (>64 patterns) | 75-113x |
| CharClassSearcher | `[\w]+`, `\d+` | 4-25x |
| Slim Teddy | `foo\|bar\|baz` (2-32 patterns) | 15-240x |
| Fat Teddy | 33-64 patterns | 60-73x |
| OnePass | Anchored captures | 10x |
| BoundedBacktracker | Small patterns | 2-5x |
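As a rough illustration of how such a dispatch could work for literal alternations, the sketch below maps the number of literals to the strategy names and thresholds given in the table. The function `pickLiteralStrategy` and the `SingleLiteral` fallback are hypothetical; the actual selection logic in coregex considers more than the literal count.

```go
package main

import "fmt"

// pickLiteralStrategy maps a count of literal alternatives to a strategy.
// Thresholds follow the table: 2-32 Slim Teddy, 33-64 Fat Teddy,
// more than 64 Aho-Corasick.
func pickLiteralStrategy(numLiterals int) string {
	switch {
	case numLiterals > 64:
		return "AhoCorasick"
	case numLiterals >= 33:
		return "FatTeddy"
	case numLiterals >= 2:
		return "SlimTeddy"
	default:
		// Assumption: not a strategy name taken from the README.
		return "SingleLiteral"
	}
}

func main() {
	for _, n := range []int{1, 3, 32, 33, 64, 65} {
		fmt.Printf("%d literals -> %s\n", n, pickLiteralStrategy(n))
	}
}
```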
### API Compatibility
Drop-in replacement for `regexp.Regexp`:
```go
// stdlib
re := regexp.MustCompile(pattern)
// coregex — same API
re := coregex.MustCompile(pattern)
```
Supported methods:
- `Match`, `MatchString`, `MatchReader`
- `Find`, `FindString`, `FindAll`, `FindAllString`
- `FindIndex`, `FindStringIndex`, `FindAllIndex`
- `FindSubmatch`, `FindStringSubmatch`, `FindAllSubmatch`
- `ReplaceAll`, `ReplaceAllString`, `ReplaceAllFunc`
- `Split`, `SubexpNames`, `NumSubexp`
- `Longest`, `Copy`, `String`
### Zero-Allocation APIs
```go
// Zero allocations: boolean match
matched := re.IsMatch(text)

// Zero allocations: single match indices
start, end, found := re.FindIndices(text)

// Zero allocations: iterator over all matches (Go 1.23+)
for m := range re.AllIndex(data) {
	fmt.Printf("match at [%d, %d]\n", m[0], m[1])
}

// Zero allocations: match content iterator
for s := range re.AllString(text) {
	fmt.Println(s)
}

// Buffer reuse: append to caller's slice (strconv.Append* pattern)
var buf [][2]int
for _, chunk := range chunks {
	buf = re.AppendAllIndex(buf[:0], chunk, -1)
	process(buf)
}
```
### Configuration
```go
config := coregex.DefaultConfig()
config.DFAMaxStates = 10000 // Limit DFA cache
config.EnablePrefilter = true // SIMD acceleration
re, err := coregex.CompileWithConfig(pattern, config)
```
### Thread Safety
A compiled `*Regexp` is safe for concurrent use by multiple goroutines:
```go
re := coregex.MustCompile(`\d+`)

// Safe: multiple goroutines sharing one compiled pattern
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		re.FindString("test 123 data") // thread-safe
	}()
}
wg.Wait()
```
Internally uses `sync.Pool` (same pattern as Go stdlib `regexp`) for per-search state management.
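The general shape of that pattern can be sketched as follows: an immutable matcher shared by all goroutines, plus a pool of mutable scratch states borrowed for the duration of one search. All type and function names here (`byteMatcher`, `searchState`, `countConcurrently`) are hypothetical, and the trivial byte search stands in for a real engine's per-search state.

```go
package main

import (
	"fmt"
	"sync"
)

// searchState is mutable scratch space used by one search at a time.
type searchState struct {
	positions []int
}

// byteMatcher holds immutable pattern data plus a pool of scratch states.
type byteMatcher struct {
	needle byte
	pool   sync.Pool
}

func newByteMatcher(needle byte) *byteMatcher {
	m := &byteMatcher{needle: needle}
	m.pool.New = func() any { return &searchState{} }
	return m
}

// Count returns how many times the needle occurs in b.
func (m *byteMatcher) Count(b []byte) int {
	st := m.pool.Get().(*searchState)
	st.positions = st.positions[:0] // reset length, keep capacity
	for i, c := range b {
		if c == m.needle {
			st.positions = append(st.positions, i)
		}
	}
	n := len(st.positions)
	m.pool.Put(st) // return the state for reuse by other goroutines
	return n
}

// countConcurrently calls Count from many goroutines sharing one matcher.
func countConcurrently(m *byteMatcher, input []byte, goroutines int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n := m.Count(input)
			mu.Lock()
			total += n
			mu.Unlock()
		}()
	}
	wg.Wait()
	return total
}

func main() {
	m := newByteMatcher('1')
	fmt.Println(countConcurrently(m, []byte("test 121 data"), 100)) // 200
}
```

Because each goroutine works on its own borrowed state, the shared matcher needs no lock on the search path.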
## Syntax Support
Uses Go's `regexp/syntax` parser:
| Feature | Support |
|---------|---------|
| Character classes | `[a-z]`, `\d`, `\w`, `\s` |
| Quantifiers | `*`, `+`, `?`, `{n,m}` |
| Anchors | `^`, `$`, `\b`, `\B` |
| Groups | `(...)`, `(?:...)`, `(?P<name>...)` |
| Unicode | `\p{L}`, `\P{N}` |
| Flags | `(?i)`, `(?m)`, `(?s)` |
| Backreferences | Not supported (O(n) guarantee) |
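Since the parser is the stdlib `regexp/syntax` package, a pattern can be checked against the table above before it is compiled. The sketch below calls `syntax.Parse` with the `syntax.Perl` flags, which is what stdlib `regexp.Compile` uses; whether coregex passes exactly the same flags is an assumption. The helper name `parses` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// parses reports whether pattern is accepted by Go's regexp/syntax parser
// under Perl-style flags.
func parses(pattern string) bool {
	_, err := syntax.Parse(pattern, syntax.Perl)
	return err == nil
}

func main() {
	patterns := []string{
		`(?P<year>\d{4})-\d{2}`, // named group
		`\p{L}+\b`,              // Unicode class and word boundary
		`(?i)go(?:lang)?`,       // flag and non-capturing group
		`(\w+) \1`,              // backreference: rejected by the parser
	}
	for _, p := range patterns {
		fmt.Println(p, parses(p))
	}
}
```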
## Architecture
```
Pattern → Parse → NFA → Literal Extract → Strategy Select
↓
┌────────────────────────────────────────────┐
│ Engines (17 strategies): │
│ LazyDFA, PikeVM, OnePass, │
│ BoundedBacktracker, ReverseAnchored, │
│ ReverseInner, ReverseSuffix, │
│ ReverseSuffixSet, MultilineReverseSuffix, │
│ AnchoredLiteral, CharClassSearcher, │
│ Teddy, DigitPrefilter, AhoCorasick, │
│ CompositeSearcher, BranchDispatch, Both │
└────────────────────────────────────────────┘
↓
Input → Prefilter (SIMD) → Engine → Match Result
```
> For detailed architecture documentation, see [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).
> For optimization details, see [docs/OPTIMIZATIONS.md](docs/OPTIMIZATIONS.md).
**SIMD Primitives** (AMD64):
- `memchr` — single byte search (AVX2)
- `memmem` — substring search (SSSE3)
- `Slim Teddy` — multi-pattern search, 2-32 patterns (SSSE3, 9+ GB/s)
- `Fat Teddy` — multi-pattern search, 33-64 patterns (AVX2, 9+ GB/s)
Pure Go fallback on other architectures.
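The role of a `memchr`-style prefilter can be sketched with the stdlib alone. In the example below, `bytes.IndexByte` plays the part of `memchr`: an email match requires an `@`, so input without that byte is rejected before the regex engine runs. This is an illustration of the prefilter idea using stdlib `regexp`, not coregex's internal code, and the helper name `findEmail` is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// emailRe is the pattern from the Quick Start example.
var emailRe = regexp.MustCompile(`\w+@\w+\.\w+`)

// findEmail returns the first email-like match in b, or nil if there is none.
func findEmail(b []byte) []byte {
	// Prefilter: every match must contain '@', so a single byte search
	// can rule out the whole input without running the regex.
	if bytes.IndexByte(b, '@') < 0 {
		return nil
	}
	return emailRe.Find(b)
}

func main() {
	fmt.Printf("%q\n", findEmail([]byte("no at sign in this text")))
	fmt.Printf("%q\n", findEmail([]byte("Contact support@example.com for help")))
}
```

The prefilter never changes the result; it only skips work when a required byte is missing.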
## Battle-Tested
coregex was [tested in GoAWK](https://github.com/benhoyt/goawk/pull/264). This real-world testing uncovered 15+ edge cases that synthetic benchmarks missed.
### Powered by coregex: uawk
[uawk](https://github.com/kolkov/uawk) is a modern AWK interpreter built on coregex:
| Benchmark (10MB) | GoAWK | uawk | Speedup |
|------------------|-------|------|---------|
| Regex alternation | 1.85s | 97ms | **19x** |
| IP matching | 290ms | 99ms | **2.9x** |
| General regex | 320ms | 100ms | **3.2x** |
```bash
go install github.com/kolkov/uawk/cmd/uawk@latest
uawk '/error/ { print $0 }' server.log
```
**We need more testers!** If you have a project using `regexp`, try coregex and [report issues](https://github.com/coregx/coregex/issues).
## Documentation
- [API Reference](https://pkg.go.dev/github.com/coregx/coregex)
- [CHANGELOG](CHANGELOG.md)
- [Contributing](CONTRIBUTING.md)
- [Security Policy](SECURITY.md)
## Comparison
| | coregex | stdlib | regexp2 |
|---|---------|--------|---------|
| Performance | 3-3000x faster | Baseline | Slower |
| SIMD | AVX2/SSSE3 | No | No |
| O(n) guarantee | Yes | Yes | No |
| Backreferences | No | No | Yes |
| API | Drop-in | — | Different |
**Use coregex** for performance-critical code with O(n) guarantee.
**Use stdlib** for simple cases where performance doesn't matter.
**Use regexp2** if you need backreferences (accept exponential worst-case).
## Related
- [uawk](https://github.com/kolkov/uawk) — Ultra AWK interpreter powered by coregex
- [kolkov/regex-bench](https://github.com/kolkov/regex-bench) — Cross-language benchmarks
- [golang/go#26623](https://github.com/golang/go/issues/26623) — Go regexp performance discussion
- [golang/go#76818](https://github.com/golang/go/issues/76818) — Upstream path proposal
**Inspired by:**
- [Rust regex](https://github.com/rust-lang/regex) — Architecture
- [RE2](https://github.com/google/re2) — O(n) guarantees
- [Hyperscan](https://github.com/intel/hyperscan) — SIMD algorithms
## License
MIT — see [LICENSE](LICENSE).
---
**Status:** Pre-1.0 (API may change). Ready for testing and feedback.
[Releases](https://github.com/coregx/coregex/releases) · [Issues](https://github.com/coregx/coregex/issues) · [Discussions](https://github.com/coregx/coregex/discussions)