{"id":35595880,"url":"https://github.com/coregx/coregex","last_synced_at":"2026-04-01T18:31:03.090Z","repository":{"id":326752760,"uuid":"1105180758","full_name":"coregx/coregex","owner":"coregx","description":"Pure Go production-grade regex engine with SIMD optimizations. Up to 3-3000x+ faster than stdlib.","archived":false,"fork":false,"pushed_at":"2026-03-25T08:15:06.000Z","size":1529,"stargazers_count":158,"open_issues_count":8,"forks_count":5,"subscribers_count":5,"default_branch":"main","last_synced_at":"2026-03-25T11:29:13.634Z","etag":null,"topics":["avx2","dfa","go","golang","multi-engine","nfa","performance","pikevm","regex","regex-engine","regexp","simd","ssse3"],"latest_commit_sha":null,"homepage":"https://pkg.go.dev/github.com/coregx/coregex","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coregx.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-27T08:49:38.000Z","updated_at":"2026-03-25T08:19:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"43583034-5e80-4548-b631-ca90a8a96f08","html_url":"https://github.com/coregx/coregex","commit_stats":null,"previous_names":["coregx/coregex"],"tags_count":85,"template":false,"template_full_name":null,"purl":"pkg:github/coregx/coregex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coregx%2Fcoregex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coregx%2Fcoregex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coregx%2Fcoregex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coregx%2Fcoregex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coregx","download_url":"https://codeload.github.com/coregx/coregex/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coregx%2Fcoregex/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31290868,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T13:12:26.723Z","status":"ssl_error","status_checked_at":"2026-04-01T13:12:25.102Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["avx2","dfa","go","golang","multi-engine","nfa","performance","pikevm","regex","regex-engine","regexp","simd","ssse3"],"created_at":"2026-01-05T00:24:36.751Z","updated_at":"2026-04-01T18:31:03.084Z","avatar_url":"https://github.com/coregx.png","language":"Go","readme":"# coregex\n\n[![GitHub Release](https://img.shields.io/github/v/release/coregx/coregex?style=flat-square\u0026logo=github\u0026color=blue)](https://github.com/coregx/coregex/releases/latest)\n[![Go Version](https://img.shields.io/github/go-mod/go-version/coregx/coregex?style=flat-square\u0026logo=go)](https://go.dev/dl/)\n[![Go Reference](https://pkg.go.dev/badge/github.com/coregx/coregex.svg)](https://pkg.go.dev/github.com/coregx/coregex)\n[![CI](https://img.shields.io/github/actions/workflow/status/coregx/coregex/test.yml?branch=main\u0026style=flat-square\u0026logo=github-actions\u0026label=CI)](https://github.com/coregx/coregex/actions)\n[![Go Report Card](https://goreportcard.com/badge/github.com/coregx/coregex?style=flat-square)](https://goreportcard.com/report/github.com/coregx/coregex)\n[![codecov](https://codecov.io/gh/coregx/coregex/branch/main/graph/badge.svg)](https://codecov.io/gh/coregx/coregex)\n[![License](https://img.shields.io/badge/license-MIT-blue.svg?style=flat-square)](LICENSE)\n[![GitHub Stars](https://img.shields.io/github/stars/coregx/coregex?style=flat-square\u0026logo=github)](https://github.com/coregx/coregex/stargazers)\n[![GitHub Issues](https://img.shields.io/github/issues/coregx/coregex?style=flat-square\u0026logo=github)](https://github.com/coregx/coregex/issues)\n[![GitHub Discussions](https://img.shields.io/github/discussions/coregx/coregex?style=flat-square\u0026logo=github)](https://github.com/coregx/coregex/discussions)\n\nHigh-performance regex engine for Go. Drop-in replacement for `regexp` with **3-3000x speedup**.\\*\n\n\u003csub\u003e\\* Typical speedup 15-240x on real-world patterns. 1000x+ achieved on [specific edge cases](https://github.com/kolkov/regex-bench#extreme-speedups-1000-3000x) where prefilters skip entire input (e.g., IP pattern on text with no digits).\u003c/sub\u003e\n\n## Why coregex?\n\nGo's stdlib `regexp` is intentionally simple — single NFA engine, no optimizations. This guarantees O(n) time but leaves performance on the table.\n\ncoregex brings Rust regex-crate architecture to Go:\n- **Multi-engine**: 17 strategies — Lazy DFA, PikeVM, OnePass, BoundedBacktracker, and more\n- **SIMD prefilters**: AVX2/SSSE3 for fast candidate rejection\n- **Reverse search**: Suffix/inner literal patterns run 1000x+ faster\n- **O(n) guarantee**: No backtracking, no ReDoS vulnerabilities\n\n## Installation\n\n```bash\ngo get github.com/coregx/coregex\n```\n\nRequires Go 1.25+. Minimal dependencies (`golang.org/x/sys`, `github.com/coregx/ahocorasick`).\n\n## Quick Start\n\n```go\npackage main\n\nimport (\n    \"fmt\"\n    \"github.com/coregx/coregex\"\n)\n\nfunc main() {\n    re := coregex.MustCompile(`\\w+@\\w+\\.\\w+`)\n\n    text := []byte(\"Contact support@example.com for help\")\n\n    // Find first match\n    fmt.Printf(\"Found: %s\\n\", re.Find(text))\n\n    // Check if matches (zero allocation)\n    if re.MatchString(\"test@email.com\") {\n        fmt.Println(\"Valid email format\")\n    }\n}\n```\n\n## Performance\n\nCross-language benchmarks on 6MB input, AMD EPYC ([source](https://github.com/kolkov/regex-bench)):\n\n| Pattern | Go stdlib | coregex | Rust regex | vs stdlib | vs Rust |\n|---------|-----------|---------|------------|-----------|---------|\n| Literal alternation | 554 ms | 4.5 ms | 0.72 ms | **122x** | 6.2x slower |\n| Multi-literal | 1572 ms | 12.4 ms | 5.5 ms | **126x** | 2.2x slower |\n| Inner `.*keyword.*` | 238 ms | 0.27 ms | 0.33 ms | **881x** | **1.2x faster** |\n| Suffix `.*\\.txt` | 239 ms | 1.9 ms | 1.2 ms | **125x** | 1.5x slower |\n| Multiline `(?m)^/.*\\.php` | 102 ms | 0.34 ms | 0.75 ms | **299x** | **2.2x faster** |\n| Email validation | 257 ms | 0.46 ms | 0.31 ms | **557x** | 1.4x slower |\n| URL extraction | 256 ms | 0.62 ms | 0.37 ms | **413x** | 1.6x slower |\n| IP address | 494 ms | 0.72 ms | 13.5 ms | **685x** | **18.8x faster** |\n| Version `\\d+.\\d+.\\d+` | 164 ms | 0.62 ms | 0.79 ms | **263x** | **1.2x faster** |\n| Char class `[\\w]+` | 478 ms | 42.1 ms | 56.4 ms | **11x** | **1.3x faster** |\n| Word repeat `(\\w{2,8})+` | 690 ms | 180 ms | 54.7 ms | **3x** | 3.2x slower |\n\n**Where coregex excels:**\n- Multiline patterns (`(?m)^/.*\\.php`) — **2.2x faster than Rust**, 299x vs stdlib\n- IP/phone patterns (`\\d+\\.\\d+\\.\\d+\\.\\d+`) — SIMD digit prefilter skips non-digit regions\n- Suffix patterns (`.*\\.log`, `.*\\.txt`) — reverse search optimization (1000x+)\n- Inner literals (`.*error.*`, `.*@example\\.com`) — bidirectional DFA (900x+)\n- Multi-pattern (`foo|bar|baz|...`) — Slim Teddy (≤32), Fat Teddy (33-64), or Aho-Corasick (\u003e64)\n- Anchored alternations (`^(\\d+|UUID|hex32)`) — O(1) branch dispatch (5-20x)\n- Concatenated char classes (`[a-zA-Z]+[0-9]+`) — DFA with byte classes (5-7x)\n- **Zero-alloc iterators** (`AllIndex`, `AppendAllIndex`) — 0 heap allocs, up to **30% faster** than FindAll. Email pattern **faster than Rust** with `AppendAllIndex`.\n\n## Features\n\n### Engine Selection\n\ncoregex automatically selects the optimal engine:\n\n| Strategy | Pattern Type | Speedup |\n|----------|--------------|---------|\n| **AnchoredLiteral** | `^prefix.*suffix$` | **32-133x** |\n| **MultilineReverseSuffix** | `(?m)^/.*\\.php` | **100-552x** ⚡ |\n| ReverseInner | `.*keyword.*` | 100-900x |\n| ReverseSuffix | `.*\\.txt` | 100-1100x |\n| BranchDispatch | `^(\\d+\\|UUID\\|hex32)` | 5-20x |\n| CompositeSequenceDFA | `[a-zA-Z]+[0-9]+` | 5-7x |\n| LazyDFA | IP, complex patterns | 10-150x |\n| AhoCorasick | `a\\|b\\|c\\|...\\|z` (\u003e64 patterns) | 75-113x |\n| CharClassSearcher | `[\\w]+`, `\\d+` | 4-25x |\n| Slim Teddy | `foo\\|bar\\|baz` (2-32 patterns) | 15-240x |\n| Fat Teddy | 33-64 patterns | 60-73x |\n| OnePass | Anchored captures | 10x |\n| BoundedBacktracker | Small patterns | 2-5x |\n\n### API Compatibility\n\nDrop-in replacement for `regexp.Regexp`:\n\n```go\n// stdlib\nre := regexp.MustCompile(pattern)\n\n// coregex — same API\nre := coregex.MustCompile(pattern)\n```\n\nSupported methods:\n- `Match`, `MatchString`, `MatchReader`\n- `Find`, `FindString`, `FindAll`, `FindAllString`\n- `FindIndex`, `FindStringIndex`, `FindAllIndex`\n- `FindSubmatch`, `FindStringSubmatch`, `FindAllSubmatch`\n- `ReplaceAll`, `ReplaceAllString`, `ReplaceAllFunc`\n- `Split`, `SubexpNames`, `NumSubexp`\n- `Longest`, `Copy`, `String`\n\n### Zero-Allocation APIs\n\n```go\n// Zero allocations — boolean match\nmatched := re.IsMatch(text)\n\n// Zero allocations — single match indices\nstart, end, found := re.FindIndices(text)\n\n// Zero allocations — iterator over all matches (Go 1.23+)\nfor m := range re.AllIndex(data) {\n    fmt.Printf(\"match at [%d, %d]\\n\", m[0], m[1])\n}\n\n// Zero allocations — match content iterator\nfor s := range re.AllString(text) {\n    fmt.Println(s)\n}\n\n// Buffer-reuse — append to caller's slice (strconv.Append* pattern)\nvar buf [][2]int\nfor _, chunk := range chunks {\n    buf = re.AppendAllIndex(buf[:0], chunk, -1)\n    process(buf)\n}\n```\n\n### Configuration\n\n```go\nconfig := coregex.DefaultConfig()\nconfig.DFAMaxStates = 10000      // Limit DFA cache\nconfig.EnablePrefilter = true    // SIMD acceleration\n\nre, err := coregex.CompileWithConfig(pattern, config)\n```\n\n### Thread Safety\n\nA compiled `*Regexp` is safe for concurrent use by multiple goroutines:\n\n```go\nre := coregex.MustCompile(`\\d+`)\n\n// Safe: multiple goroutines sharing one compiled pattern\nvar wg sync.WaitGroup\nfor i := 0; i \u003c 100; i++ {\n    wg.Add(1)\n    go func() {\n        defer wg.Done()\n        re.FindString(\"test 123 data\")  // thread-safe\n    }()\n}\nwg.Wait()\n```\n\nInternally uses `sync.Pool` (same pattern as Go stdlib `regexp`) for per-search state management.\n\n## Syntax Support\n\nUses Go's `regexp/syntax` parser:\n\n| Feature | Support |\n|---------|---------|\n| Character classes | `[a-z]`, `\\d`, `\\w`, `\\s` |\n| Quantifiers | `*`, `+`, `?`, `{n,m}` |\n| Anchors | `^`, `$`, `\\b`, `\\B` |\n| Groups | `(...)`, `(?:...)`, `(?P\u003cname\u003e...)` |\n| Unicode | `\\p{L}`, `\\P{N}` |\n| Flags | `(?i)`, `(?m)`, `(?s)` |\n| Backreferences | Not supported (O(n) guarantee) |\n\n## Architecture\n\n```\nPattern → Parse → NFA → Literal Extract → Strategy Select\n                                               ↓\n                  ┌────────────────────────────────────────────┐\n                  │ Engines (17 strategies):                   │\n                  │  LazyDFA, PikeVM, OnePass,                 │\n                  │  BoundedBacktracker, ReverseAnchored,      │\n                  │  ReverseInner, ReverseSuffix,              │\n                  │  ReverseSuffixSet, MultilineReverseSuffix, │\n                  │  AnchoredLiteral, CharClassSearcher,       │\n                  │  Teddy, DigitPrefilter, AhoCorasick,       │\n                  │  CompositeSearcher, BranchDispatch, Both   │\n                  └────────────────────────────────────────────┘\n                                               ↓\nInput → Prefilter (SIMD) → Engine → Match Result\n```\n\n\u003e For detailed architecture documentation, see [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).\n\u003e For optimization details, see [docs/OPTIMIZATIONS.md](docs/OPTIMIZATIONS.md).\n\n**SIMD Primitives** (AMD64):\n- `memchr` — single byte search (AVX2)\n- `memmem` — substring search (SSSE3)\n- `Slim Teddy` — multi-pattern search, 2-32 patterns (SSSE3, 9+ GB/s)\n- `Fat Teddy` — multi-pattern search, 33-64 patterns (AVX2, 9+ GB/s)\n\nPure Go fallback on other architectures.\n\n## Battle-Tested\n\ncoregex was [tested in GoAWK](https://github.com/benhoyt/goawk/pull/264). This real-world testing uncovered 15+ edge cases that synthetic benchmarks missed.\n\n### Powered by coregex: uawk\n\n[uawk](https://github.com/kolkov/uawk) is a modern AWK interpreter built on coregex:\n\n| Benchmark (10MB) | GoAWK | uawk | Speedup |\n|------------------|-------|------|---------|\n| Regex alternation | 1.85s | 97ms | **19x** |\n| IP matching | 290ms | 99ms | **2.9x** |\n| General regex | 320ms | 100ms | **3.2x** |\n\n```bash\ngo install github.com/kolkov/uawk/cmd/uawk@latest\nuawk '/error/ { print $0 }' server.log\n```\n\n**We need more testers!** If you have a project using `regexp`, try coregex and [report issues](https://github.com/coregx/coregex/issues).\n\n## Documentation\n\n- [API Reference](https://pkg.go.dev/github.com/coregx/coregex)\n- [CHANGELOG](CHANGELOG.md)\n- [Contributing](CONTRIBUTING.md)\n- [Security Policy](SECURITY.md)\n\n## Comparison\n\n| | coregex | stdlib | regexp2 |\n|---|---------|--------|---------|\n| Performance | 3-3000x faster | Baseline | Slower |\n| SIMD | AVX2/SSSE3 | No | No |\n| O(n) guarantee | Yes | Yes | No |\n| Backreferences | No | No | Yes |\n| API | Drop-in | — | Different |\n\n**Use coregex** for performance-critical code with O(n) guarantee.\n**Use stdlib** for simple cases where performance doesn't matter.\n**Use regexp2** if you need backreferences (accept exponential worst-case).\n\n## Related\n\n- [uawk](https://github.com/kolkov/uawk) — Ultra AWK interpreter powered by coregex\n- [kolkov/regex-bench](https://github.com/kolkov/regex-bench) — Cross-language benchmarks\n- [golang/go#26623](https://github.com/golang/go/issues/26623) — Go regexp performance discussion\n- [golang/go#76818](https://github.com/golang/go/issues/76818) — Upstream path proposal\n\n**Inspired by:**\n- [Rust regex](https://github.com/rust-lang/regex) — Architecture\n- [RE2](https://github.com/google/re2) — O(n) guarantees\n- [Hyperscan](https://github.com/intel/hyperscan) — SIMD algorithms\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\n---\n\n**Status:** Pre-1.0 (API may change). Ready for testing and feedback.\n\n[Releases](https://github.com/coregx/coregex/releases) · [Issues](https://github.com/coregx/coregex/issues) · [Discussions](https://github.com/coregx/coregex/discussions)\n\n## Star History\n\n\u003ca href=\"https://star-history.com/#coregx/coregex\u0026Date\"\u003e\n \u003cpicture\u003e\n   \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://api.star-history.com/svg?repos=coregx/coregex\u0026type=Date\u0026theme=dark\" /\u003e\n   \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://api.star-history.com/svg?repos=coregx/coregex\u0026type=Date\" /\u003e\n   \u003cimg alt=\"Star History Chart\" src=\"https://api.star-history.com/svg?repos=coregx/coregex\u0026type=Date\" /\u003e\n \u003c/picture\u003e\n\u003c/a\u003e\n","funding_links":[],"categories":["Template Engines","Text Processing"],"sub_categories":["Regular Expressions"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoregx%2Fcoregex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoregx%2Fcoregex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoregx%2Fcoregex/lists"}