{"id":31045912,"url":"https://github.com/auvred/regonaut","last_synced_at":"2026-04-14T18:34:09.058Z","repository":{"id":314563114,"uuid":"1042151307","full_name":"auvred/regonaut","owner":"auvred","description":"ES2025-compatible ECMAScript RegExp engine implemented in Go","archived":false,"fork":false,"pushed_at":"2025-09-13T07:17:56.000Z","size":213,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-13T09:33:28.361Z","etag":null,"topics":["ecmascript","go","javascript","js","regex","regex-engine","regexp","regular-expression","regular-expression-engine"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/auvred.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-21T15:02:19.000Z","updated_at":"2025-09-13T07:14:23.000Z","dependencies_parsed_at":"2025-09-13T09:33:32.355Z","dependency_job_id":"5cff982a-2005-4fd3-a59a-b0676f27cce8","html_url":"https://github.com/auvred/regonaut","commit_stats":null,"previous_names":["auvred/regonaut"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/auvred/regonaut","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/auvred%2Fregonaut","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/auvred%2Fregonaut/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/auvred%2Frego
naut/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/auvred%2Fregonaut/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/auvred","download_url":"https://codeload.github.com/auvred/regonaut/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/auvred%2Fregonaut/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275143585,"owners_count":25413091,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-14T02:00:10.474Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ecmascript","go","javascript","js","regex","regex-engine","regexp","regular-expression","regular-expression-engine"],"created_at":"2025-09-14T17:47:33.451Z","updated_at":"2026-04-14T18:34:09.030Z","avatar_url":"https://github.com/auvred.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# regonaut\n\n**regonaut** is a Go implementation of [ECMAScript Regular Expressions](https://tc39.es/ecma262/2025/multipage/text-processing.html#sec-regexp-regular-expression-objects).\n\nIt aims to be _fully compatible with JavaScript's RegExp_, including all ES2025 features and the [Annex B legacy 
extensions](https://tc39.es/ecma262/2025/multipage/additional-ecmascript-features-for-web-browsers.html#sec-additional-ecmascript-features-for-web-browsers).\n\nCompatibility is verified against all [test262](https://github.com/tc39/test262) tests related to regular expressions.\n\nThat means a pattern that works in modern browsers or Node.js will behave the same way in Go.\n\nInternally, the engine uses a backtracking approach.\nSee Russ Cox's [blog post](https://swtch.com/~rsc/regexp/regexp1.html) for background on backtracking vs. other regexp implementations.\n\n## Installation\n\n```shell\ngo get github.com/auvred/regonaut\n```\n\n## Usage\n\n### TL;DR\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com/auvred/regonaut\"\n)\n\nfunc main() {\n\tre := regonaut.MustCompile(\".+(?\u003cfoo\u003ebAr)\", regonaut.FlagIgnoreCase)\n\tm := re.FindMatch([]byte(\"_Bar_\"))\n\tfmt.Printf(\"Groups[0] - %q\\n\", m.Groups[0].Data())\n\tfmt.Printf(\"Groups[1] - %q\\n\", m.Groups[1].Data())\n\tfmt.Printf(\"NamedGroups[\\\"foo\\\"] - %q\\n\", m.NamedGroups[\"foo\"].Data())\n}\n```\n\n### Unicode handling\n\nECMAScript and Go have different models for representing strings, and that difference is central to how this library works.\n\nIn ECMAScript, strings are defined as sequences of UTF-16 code units, and they can be ill-formed.\nFor example, a string may contain a lone surrogate such as `\"\\uD800\"`, which is not a valid Unicode character on its own but is still considered a valid ECMAScript string.\nYou can read more about it [here](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters).\n\nRegular expressions in ECMAScript operate in two modes:\n\n- **Non-Unicode mode:** both the pattern and the input string are treated as raw sequences of [code units](https://en.wikipedia.org/wiki/Character_encoding#Code_unit).\n\n- **Unicode mode:** both the pattern and the input string are treated as 
sequences of [code points](https://en.wikipedia.org/wiki/Character_encoding#Code_point).\n\nUnicode mode is enabled when the `u` or `v` flag is provided.\n\nGo, on the other hand, uses UTF-8 encoded strings.\nBecause of this mismatch, the library provides two execution modes:\n\n#### UTF-8 mode (recommended)\n\n- Works with regular Go `string` values\n- Unicode awareness is always implied (the `u` flag is always enabled)\n- If you want features specific to the `v` flag, you must still explicitly enable it\n- Both the pattern and the input must be valid UTF-8 strings\n- They are processed as runes (each rune corresponds to a code point)\n- Capturing group indices are reported as byte offsets within the original UTF-8 string\n\n#### UTF-16 mode\n\n- Works with `[]uint16` slices\n- By default, each element of the slice is treated as a single code unit\n- When the `u` or `v` flag is used, valid surrogate pairs are combined into single code points, while lone surrogates remain as they are\n- **Use this mode only if you specifically need ECMAScript-style UTF-16 handling (e.g., when implementing or testing against a JavaScript engine)**\n\n#### Example\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com/auvred/regonaut\"\n)\n\nfunc main() {\n\tvar pattern = \"c(.)(.)\"\n\tvar patternUtf16 = []uint16{'c', '(', '.', ')', '(', '.', ')'}\n\n\tvar source = []byte(\"c🐱at\")\n\tvar sourceUtf16 = []uint16{'c', 0xD83D, 0xDC31, 'a', 't'}\n\n\treUtf8 := regonaut.MustCompile(pattern, 0)\n\tm1 := reUtf8.FindMatch(source)\n\tfmt.Printf(\"UTF-8:                   %q, %q\\n\", m1.Groups[1].Data(), m1.Groups[2].Data())\n\n\treUtf8Unicode := regonaut.MustCompile(pattern, regonaut.FlagUnicode)\n\tm2 := reUtf8Unicode.FindMatch(source)\n\tfmt.Printf(\"UTF-8 (with 'u' flag):   %q, %q\\n\", m2.Groups[1].Data(), m2.Groups[2].Data())\n\n\treUtf16 := regonaut.MustCompileUtf16(patternUtf16, 0)\n\tm3 := reUtf16.FindMatch(sourceUtf16)\n\tfmt.Printf(\"UTF-16:                  %#v, %#v\\n\", 
m3.Groups[1].Data(), m3.Groups[2].Data())\n\n\treUtf16Unicode := regonaut.MustCompileUtf16(patternUtf16, regonaut.FlagUnicode)\n\tm4 := reUtf16Unicode.FindMatch(sourceUtf16)\n\tfmt.Printf(\"UTF-16 (with 'u' flag):  %#v, %#v\\n\", m4.Groups[1].Data(), m4.Groups[2].Data())\n}\n```\n\nOutputs:\n\n```plaintext\nUTF-8:                   \"🐱\", \"a\"\nUTF-8 (with 'u' flag):   \"🐱\", \"a\"\nUTF-16:                  []uint16{0xd83d}, []uint16{0xdc31}\nUTF-16 (with 'u' flag):  []uint16{0xd83d, 0xdc31}, []uint16{0x61}\n```\n\n| Mode   | Flags | Matching semantics                   | Group 1 (`m.Groups[1].Data()`) | Group 2 (`m.Groups[2].Data()`) |\n| ------ | ----- | ------------------------------------ | ------------------------------ | ------------------------------ |\n| UTF-8  | —     | Code points (UTF-8 mode implies `u`) | `\"🐱\"`                         | `\"a\"`                          |\n| UTF-8  | `u`   | Code points                          | `\"🐱\"`                         | `\"a\"`                          |\n| UTF-16 | —     | Code units (surrogates not paired)   | `[]uint16{0xd83d}`             | `[]uint16{0xdc31}`             |\n| UTF-16 | `u`   | Code points (surrogates paired)      | `[]uint16{0xd83d, 0xdc31}`     | `[]uint16{0x61}`               |\n\n\u003e [!NOTE]\n\u003e The example uses [U+1F431 CAT FACE](https://codepoints.net/U+1F431) (🐱).\n\u003e In UTF-16 without `u`, it appears as two separate surrogate code units (`0xD83D`, `0xDC31`).\n\u003e With `u`, those are paired into one code point.\n\n## Local Development\n\n### Prerequisites\n\n- Go\n- Node.js with Type Stripping support (version 22.18.0+, 23.6.0+, or 24+)\n- pnpm\n\n### Setup\n\nMake sure the test262 submodule is initialized:\n\n```shell\ngit submodule update --init\n```\n\nGenerate the `test262` tests:\n\n```shell\ncd tools\npnpm i\npnpm run gen-test262-tests\ncd ..\n```\n\n### Running tests\n\n```shell\n# Run all tests, including test262\ngo test\n\n# Run all tests, except test262\ngo test -skip 262\n\n# 
Run all tests, excluding generated property-escapes tests (they are slow)\ngo test -skip 262/built-ins/RegExp/property-escapes/generated\n```\n\n## License\n\n[MIT](./LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fauvred%2Fregonaut","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fauvred%2Fregonaut","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fauvred%2Fregonaut/lists"}