{"id":13596070,"url":"https://github.com/fancy-regex/fancy-regex","last_synced_at":"2026-04-05T17:37:17.308Z","repository":{"id":34829918,"uuid":"151298737","full_name":"fancy-regex/fancy-regex","owner":"fancy-regex","description":"Rust library for regular expressions using \"fancy\" features like look-around and backreferences","archived":false,"fork":false,"pushed_at":"2025-03-17T18:35:09.000Z","size":686,"stargazers_count":463,"open_issues_count":14,"forks_count":37,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-03-20T22:05:45.120Z","etag":null,"topics":["regex","regular-expressions"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fancy-regex.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["raphlinus","robinst"]}},"created_at":"2018-10-02T17:55:03.000Z","updated_at":"2025-03-19T08:05:21.000Z","dependencies_parsed_at":"2023-11-11T01:33:23.616Z","dependency_job_id":"af72b6ca-139c-457f-bb9e-6e1af4ec0cc6","html_url":"https://github.com/fancy-regex/fancy-regex","commit_stats":{"total_commits":323,"total_committers":28,"mean_commits":"11.535714285714286","dds":0.5944272445820433,"last_synced_commit":"c1b8a315239f07a2dc55dd1c5f0f22f64a5db03b"},"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fancy-regex%2Ffancy-regex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fancy-regex%2F
fancy-regex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fancy-regex%2Ffancy-regex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fancy-regex%2Ffancy-regex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fancy-regex","download_url":"https://codeload.github.com/fancy-regex/fancy-regex/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248049854,"owners_count":21039282,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["regex","regular-expressions"],"created_at":"2024-08-01T16:02:07.332Z","updated_at":"2025-12-12T16:55:44.066Z","avatar_url":"https://github.com/fancy-regex.png","language":"Rust","funding_links":["https://github.com/sponsors/raphlinus","https://github.com/sponsors/robinst"],"categories":["Rust","Libraries","库 Libraries","Text Processing"],"sub_categories":["Text processing","文本处理 Text processing"],"readme":"# fancy-regex\n\nA Rust library for compiling and matching regular expressions. 
It uses a hybrid\nregex implementation designed to support a relatively rich set of features.\nIn particular, it uses backtracking to implement \"fancy\" features such as\nlook-around and backreferences, which are not supported in purely\nNFA-based implementations (exemplified by\n[RE2](https://github.com/google/re2), and implemented in Rust in the\n[regex](https://crates.io/crates/regex) crate).\n\nTry it online in the **[fancy-regex playground](https://fancy-regex.github.io/fancy-regex/)** - test and explore regular expressions with advanced features in your browser.\n\n[![docs](https://docs.rs/fancy-regex/badge.svg)](https://docs.rs/fancy-regex)\n[![crate](https://img.shields.io/crates/v/fancy-regex.svg)](https://crates.io/crates/fancy-regex)\n[![ci](https://github.com/fancy-regex/fancy-regex/workflows/ci/badge.svg)](https://github.com/fancy-regex/fancy-regex/actions?query=workflow%3Aci)\n[![codecov](https://codecov.io/gh/fancy-regex/fancy-regex/branch/main/graph/badge.svg)](https://codecov.io/gh/fancy-regex/fancy-regex)\n\nA goal is to be as efficient as possible. For a given regex, the NFA\nimplementation has asymptotic running time linear in the length of the\ninput, while in the general case a backtracking implementation has\nexponential blowup. An example given in [Static Analysis for Regular\nExpression Exponential Runtime via Substructural\nLogics](https://arxiv.org/pdf/1405.7058.pdf) is:\n\n```python\nimport re\nre.compile('(a|b|ab)*bc').match('ab' * 28 + 'ac')\n```\n\nIn Python (tested on both 2.7 and 3.5), this match takes 91s, and\ndoubles for each additional repeat of 'ab'.\n\nThus, many proponents\n[advocate](https://swtch.com/~rsc/regexp/regexp1.html) a purely NFA\n(nondeterministic finite automaton) based approach. Even so,\nbackreferences and look-around do add richness to regexes, and they\nare commonly used in applications such as syntax highlighting for text\neditors. 
In particular, TextMate's [syntax\ndefinitions](https://manual.macromates.com/en/language_grammars),\nbased on the [Oniguruma](https://github.com/kkos/oniguruma)\nbacktracking engine, are now used in a number of other popular\neditors, including Sublime Text and Atom. These syntax definitions\nroutinely use backreferences and look-around. For example, the\nfollowing regex captures a single-line Rust raw string:\n\n```\nr(#*)\".*?\"\\1\n```\n\nThere is no NFA that can express this simple and useful pattern. Yet,\na backtracking implementation handles it efficiently.\n\nThis package is one of the first that handles both cases well. The\nexponential blowup case above runs in 258ns. Thus, it should be a\nvery appealing alternative for applications that require both richness\nand performance.\n\n## A warning about worst-case performance\n\nNFA-based approaches give strong guarantees about worst-case\nperformance. For regexes that contain \"fancy\" features such as\nbackreferences and look-around, this module gives no corresponding\nguarantee. If an attacker can control the regular expressions that\nwill be matched against, they will be able to successfully mount a\ndenial-of-service attack. Be warned.\n\nSee [PERFORMANCE.md](PERFORMANCE.md) for some examples.\n\n## A hybrid approach\n\nOne workable approach is to detect the presence of \"fancy\" features,\nand choose either an NFA implementation or a backtracker depending on\nwhether they are used.\n\nHowever, this module attempts to be more fine-grained: it\nimplements a true hybrid approach. In essence, it is a backtracking VM\n(as well explained in [Regular Expression Matching: the Virtual\nMachine Approach](https://swtch.com/~rsc/regexp/regexp2.html)) in\nwhich one of the \"instructions\" in the VM delegates to an inner NFA\nimplementation (in Rust, the regex crate, though a similar approach\nwould certainly be possible using RE2 or the Go\n[regexp](https://golang.org/pkg/regexp/) package). 
Then there's an\nanalysis which decides for each subexpression whether it is \"hard\", or\ncan be delegated to the NFA matcher. At the moment, it is eager, and\ndelegates as much as possible to the NFA engine.\n\n## Theory\n\nThe core concept behind this library is to implement a backtracking virtual machine (VM) for regular expression matching, similar to PCRE.\nHowever, whenever possible, this VM delegates work to an underlying regular expression engine, the Rust regex crate. That crate does not support \"fancy\" features like look-around and backreferences, but it has other desirable properties: in particular, its running time is linear in the input length.\n\nFor regular expressions that do not use \"fancy\" features, the library acts primarily as a lightweight wrapper around the underlying engine.\nWhen such features are present, the library performs an analysis to determine which parts of the expression must be handled by the backtracking engine and which can be safely delegated.\n\nThis analysis operates in two phases:\n\n### Phase 1 - Bottom-Up Analysis\n\nEach subexpression is analyzed to determine three key properties:\n\n- *hard*: Whether the subexpression requires backtracking features (backreferences, look-around, atomic groups, conditionals)\n- *minimum size*: The minimum number of characters this subexpression will match\n- *constant size*: Whether the subexpression always matches the same number of characters\n\n### Phase 2 - Top-Down Compilation\n\nThe compilation phase proceeds from the root of the expression, passing a \"hard context\" that flows from parent to child expressions. This context indicates whether match length variations will affect backtracking decisions.\n\n*Delegation Strategy*: If both the subexpression and context are \"easy\", the compiler generates a `Delegate` instruction to offload work to the high-performance NFA engine. 
Otherwise, it generates explicit VM instructions.\n\n*Concatenation Optimization*: For sequences of subexpressions, the compiler employs a sophisticated strategy:\n\n1. Identify a prefix of constant-size, easy subexpressions that can be safely delegated (because they won't affect backtracking)\n2. If the context is easy, identify a suffix of easy subexpressions for delegation\n3. Compile the remaining \"hard\" middle section with explicit backtracking instructions\n4. The hard context flows from right to left - only the rightmost hard subexpression gets an easy context\n\nThis ensures maximum delegation while preserving correct backtracking semantics.\n\n### Summary\n\nIn summary, the system efficiently combines backtracking and automaton-based matching by delegating as much work as possible to the underlying high-performance NFA engine, only resorting to backtracking where strictly necessary. This hybrid approach provides both expressive power and performance for advanced regular expression features.\n\n## Current status\n\nStill in development, though the basic ideas are in place. Currently,\nthe following features are missing:\n\n* Procedure calls and recursive expressions\n\n## Acknowledgements\n\nMany thanks to [Andrew Gallant](http://blog.burntsushi.net/about/) for\nstimulating conversations that inspired this approach, as well as for\ncreating the excellent regex crate.\n\n## Authors\n\nThe main author is Raph Levien, with many contributions from Robin Stocker\nand Keith Hall.\n\n## Contributions\n\nWe gladly accept contributions via GitHub pull requests. 
Please see\n[CONTRIBUTING.md](CONTRIBUTING.md) for more details.\n\nThis project started out as a Google 20% project, but none of the authors currently\nwork at Google so it has been forked to be community-maintained.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffancy-regex%2Ffancy-regex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffancy-regex%2Ffancy-regex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffancy-regex%2Ffancy-regex/lists"}