{"id":13583231,"url":"https://github.com/0no-co/reghex","last_synced_at":"2025-04-06T18:32:03.649Z","repository":{"id":55110200,"uuid":"262370823","full_name":"0no-co/reghex","owner":"0no-co","description":"The magical sticky regex-based parser generator 🧙","archived":false,"fork":false,"pushed_at":"2021-08-28T16:38:58.000Z","size":354,"stargazers_count":293,"open_issues_count":4,"forks_count":5,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-30T07:02:36.214Z","etag":null,"topics":["javascript","parser-generator","regex"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/0no-co.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-08T16:17:56.000Z","updated_at":"2024-09-22T14:44:09.000Z","dependencies_parsed_at":"2022-08-14T12:20:21.055Z","dependency_job_id":null,"html_url":"https://github.com/0no-co/reghex","commit_stats":null,"previous_names":["kitten/reghex"],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0no-co%2Freghex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0no-co%2Freghex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0no-co%2Freghex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0no-co%2Freghex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/0no-co","download_url":"https://codeload.github.com/0no-co/reghex/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","reposit
ories_count":247353724,"owners_count":20925328,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["javascript","parser-generator","regex"],"created_at":"2024-08-01T15:03:20.464Z","updated_at":"2025-04-06T18:32:02.894Z","avatar_url":"https://github.com/0no-co.png","language":"JavaScript","readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg alt=\"reghex\" width=\"250\" src=\"docs/reghex-logo.png\" /\u003e\n  \u003cbr /\u003e\n  \u003cbr /\u003e\n  \u003cstrong\u003e\n    The magical sticky regex-based parser generator\n  \u003c/strong\u003e\n  \u003cbr /\u003e\n  \u003cbr /\u003e\n  \u003cbr /\u003e\n\u003c/div\u003e\n\nLeveraging the power of sticky regexes and JS code generation, `reghex` allows\nyou to code parsers quickly, by surrounding regular expressions with a regex-like\n[DSL](https://en.wikipedia.org/wiki/Domain-specific_language).\n\nWith `reghex` you can generate a parser from a tagged template literal, which is\nquick to prototype and generates reasonably compact and performant code.\n\n_This project is still in its early stages and is experimental. Its API may still\nchange and some issues may need to be ironed out._\n\n## Quick Start\n\n##### 1. Install with yarn or npm\n\n```sh\nyarn add reghex\n# or\nnpm install --save reghex\n```\n\n##### 2. 
Add the plugin to your Babel configuration _(optional)_\n\nIn your `.babelrc`, `babel.config.js`, or `package.json:babel` add:\n\n```json\n{\n  \"plugins\": [\"reghex/babel\"]\n}\n```\n\nAlternatively, you can set up [`babel-plugin-macros`](https://github.com/kentcdodds/babel-plugin-macros) and\nimport `reghex` from `\"reghex/macro\"` instead.\n\nThis step is **optional**. `reghex` can also generate its optimised JS code during runtime.\nThis will only incur a tiny parsing cost on initialisation, but due to the JIT of modern\nJS engines there won't be any difference in performance between pre-compiled and\nruntime-compiled versions otherwise.\n\nSince the `reghex` runtime is rather small, for larger grammars it may even make sense not\nto precompile the matchers at all. For this case you may pass the `{ \"codegen\": false }`\noption to the Babel plugin, which will minify the `reghex` matcher templates without\nprecompiling them.\n\n##### 3. Have fun writing parsers!\n\n```js\nimport { match, parse } from 'reghex';\n\nconst name = match('name')`\n  ${/\\w+/}\n`;\n\nparse(name)('hello');\n// [ \"hello\", .tag = \"name\" ]\n```\n\n## Concepts\n\nThe fundamental building blocks of `reghex` are regexes, specifically\n[sticky regexes](https://www.loganfranken.com/blog/831/es6-everyday-sticky-regex-matches/)!\nThese are regular expressions that don't search a target string, but instead match only at the\nposition given by their `lastIndex`. The flag for sticky regexes is `y` and hence\nthey can be created using `/phrase/y` or `new RegExp('phrase', 'y')`.\n\n**Sticky Regexes** are the perfect foundation for a parsing framework in JavaScript!\nBecause they only match at a single position they can be used to match patterns\ncontinuously, as a parser would. 
Like global regexes, we can then manipulate where\nthey should be matched by setting `regex.lastIndex = index;` and after matching\nread back their updated `regex.lastIndex`.\n\n\u003e **Note:** Sticky Regexes aren't natively\n\u003e [supported in any version of Internet Explorer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky#Browser_compatibility). `reghex` works around this by imitating its behaviour, which may decrease performance on IE11.\n\nThis primitive allows us to build up a parser from regexes that you pass when\nauthoring a parser function, also called a \"matcher\" in `reghex`. When `reghex` compiles\nto parser code, this code is just a sequence and combination of sticky regexes that\nare executed in order!\n\n```js\nlet input = 'phrases should be parsed...';\nlet lastIndex = 0;\n\nconst regex = /phrase/y;\nfunction matcher() {\n  let match;\n  // Before matching we set the current index on the RegExp\n  regex.lastIndex = lastIndex;\n  // Then we match and store the result\n  if ((match = regex.exec(input))) {\n    // If the RegExp matches successfully, we update our lastIndex\n    lastIndex = regex.lastIndex;\n  }\n}\n```\n\nThis mechanism is used in all matcher functions that `reghex` generates.\nInternally `reghex` keeps track of the input string and the current index on\nthat string, and the matcher functions execute regexes against this state.\n\n## Authoring Guide\n\nYou can write \"matchers\" by importing `match` from `reghex` and\nusing it to write a matcher expression.\n\n```js\nimport { match } from 'reghex';\n\nconst name = match('name')`\n  ${/\\w+/}\n`;\n```\n\nAs can be seen above, the `match` function is called with a \"node name\" and\nis then called as a tagged template. 
This template is our **parsing definition**.\n\nWhen the optional Babel plugin is used, it will detect `match('name')`\nand replace the entire tag with a parsing function, which may then look like\nthe following in your transpiled code:\n\n```js\nimport { _pattern /* ... */ } from 'reghex';\n\nvar _name_expression = _pattern(/\\w+/);\nvar name = function name() {\n  /* ... */\n};\n```\n\nWe've now successfully created a matcher, which matches a single regex, which\nis a pattern of one or more word characters (letters, digits, or underscores).\nWe can execute this matcher by calling it with the curried `parse` utility:\n\n```js\nimport { parse } from 'reghex';\n\nconst result = parse(name)('Tim');\n\nconsole.log(result); // [ \"Tim\", .tag = \"name\" ]\nconsole.log(result.tag); // \"name\"\n```\n\nIf the string (here: \"Tim\") was parsed successfully by the matcher, it will\nreturn an array that contains the result of the regex. The array is special\nin that it will also have a `tag` property set to the matcher's name, here\n`\"name\"`, which we determined when we defined the matcher as `match('name')`.\n\n```js\nimport { parse } from 'reghex';\n// \"!!!\" contains no word characters, so /\\w+/ cannot match\nparse(name)('!!!'); // undefined\n```\n\nSimilarly, if the matcher does not parse an input string successfully, it will\nreturn `undefined` instead.\n\n### Nested matchers\n\nThis on its own is nice, but a parser must be able to traverse a string and\nturn it into an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).\nTo introduce nesting to `reghex` matchers, we can refer to one matcher in another!\nLet's extend our original example:\n\n```js\nimport { match } from 'reghex';\n\nconst name = match('name')`\n  ${/\\w+/}\n`;\n\nconst hello = match('hello')`\n  ${/hello /} ${name}\n`;\n```\n\nThe new `hello` matcher is set to match `/hello /` and then attempts to match\nthe `name` matcher afterwards. If either of these matchers fails, it will return\n`undefined` as well and roll back its changes. 
Using this matcher will give us\n**nested abstract output**.\n\nWe can also see in this example that _outside_ of the regex interpolations,\nwhitespace and newlines don't matter.\n\n```js\nimport { parse } from 'reghex';\n\nparse(hello)('hello tim');\n/*\n  [\n    \"hello\",\n    [\"tim\", .tag = \"name\"],\n    .tag = \"hello\"\n  ]\n*/\n```\n\nFurthermore, interpolations don't have to just be RegHex matchers. They can\nalso be functions returning matchers or completely custom matching functions.\nThis is useful when your DSL becomes _self-referential_, i.e. when matchers\nstart referencing each other, forming a loop. To fix this we can create a\nfunction that returns our root matcher:\n\n```js\nimport { match } from 'reghex';\n\nconst value = match('value')`\n  (${/\\w+/} | ${() =\u003e root})+\n`;\n\nconst root = match('root')`\n  ${/root/}+ ${value}\n`;\n```\n\n### Regex-like DSL\n\nWe've seen in the previous examples that matchers are authored using tagged\ntemplate literals, where interpolations can either be filled using regexes,\n`${/pattern/}`, or with other matchers `${name}`.\n\nThe tagged template syntax supports more ways to match these interpolations,\nusing a regex-like Domain Specific Language. Unlike in regexes, whitespace\nand newlines don't matter, which makes it easier to format and read matchers.\n\nWe can create **sequences** of matchers by adding multiple expressions in\na row. A matcher using `${/1/} ${/2/}` will attempt to match `1` and then `2`\nin the parsed string. This is just one feature of the regex-like DSL. 
The\navailable operators are the following:\n\n| Operator | Example            | Description                                                                                                                                                                              |\n| -------- | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| `?`      | `${/1/}?`          | An **optional** may be used to make an interpolation optional. This means that the interpolation may or may not match.                                                                   |\n| `*`      | `${/1/}*`          | A **star** can be used to match an arbitrary amount of interpolation or none at all. This means that the interpolation may repeat itself or may not be matched at all.                   |\n| `+`      | `${/1/}+`          | A **plus** is used like `*` and must match one or more times. When the matcher doesn't match, that's considered a failing case, since the match isn't optional.                          |\n| `\\|`     | `${/1/} \\| ${/2/}` | An **alternation** can be used to match either one thing or another, falling back when the first interpolation fails.                                                                    |\n| `()`     | `(${/1/} ${/2/})+` | A **group** can be used to apply one of the other operators to an entire group of interpolations.                                                                                        |\n| `(?: )`  | `(?: ${/1/})`      | A **non-capturing group** is like a regular group, but the interpolations matched inside it don't appear in the parser's output.                                                         |\n| `(?= )`  | `(?= ${/1/})`      | A **positive lookahead** checks whether interpolations match, and if so continues the matcher without changing the input. 
If it matches, it's essentially ignored.                       |\n| `(?! )`  | `(?! ${/1/})`      | A **negative lookahead** checks whether interpolations _don't_ match, and if so continues the matcher without changing the input. If the interpolations do match the matcher is aborted. |\n\nA couple of operators also support \"short hands\" that allow you to write\nlookaheads or non-capturing groups a little quicker.\n\n| Shorthand | Example   | Description                                                                                                                                                                              |\n| --------- | --------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| `:`       | `:${/1/}` | A **non-capturing group** is like a regular group, but the interpolations matched inside it don't appear in the parser's output.                                                         |\n| `=`       | `=${/1/}` | A **positive lookahead** checks whether interpolations match, and if so continues the matcher without changing the input. If it matches, it's essentially ignored.                       |\n| `!`       | `!${/1/}` | A **negative lookahead** checks whether interpolations _don't_ match, and if so continues the matcher without changing the input. If the interpolations do match the matcher is aborted. |\n\nWe can combine and compose these operators to create more complex matchers.\nFor instance, we can extend the original example to only allow a specific set\nof names by using the `|` operator:\n\n```js\nconst name = match('name')`\n  ${/tim/} | ${/tom/} | ${/tam/}\n`;\n\nparse(name)('tim'); // [ \"tim\", .tag = \"name\" ]\nparse(name)('tom'); // [ \"tom\", .tag = \"name\" ]\nparse(name)('patrick'); // undefined\n```\n\nThe above will now only match specific name strings. 
When one pattern in this\nchain of **alternations** does not match, it will try the next one.\n\nWe can also use **groups** to add more matchers around the alternations themselves,\nby surrounding the alternations with `(` and `)`\n\n```js\nconst name = match('name')`\n  (${/tim/} | ${/tom/}) ${/!/}\n`;\n\nparse(name)('tim!'); // [ \"tim\", \"!\", .tag = \"name\" ]\nparse(name)('tom!'); // [ \"tom\", \"!\", .tag = \"name\" ]\nparse(name)('tim'); // undefined\n```\n\nMaybe we're also not that interested in the `\"!\"` showing up in the output node.\nIf we want to get rid of it, we can use a **non-capturing group** to hide it,\nwhile still requiring it.\n\n```js\nconst name = match('name')`\n  (${/tim/} | ${/tom/}) (?: ${/!/})\n`;\n\nparse(name)('tim!'); // [ \"tim\", .tag = \"name\" ]\nparse(name)('tim'); // undefined\n```\n\nLastly, like with regexes, `?`, `*`, and `+` may be used as \"quantifiers\". The first two\nmay also be optional and _not_ match their patterns without the matcher failing.\nThe `+` operator is used to match an interpolation _one or more_ times, while the\n`*` operators may match _zero or more_ times. 
Let's use this to allow the `\"!\"`\nto repeat.\n\n```js\nconst name = match('name')`\n  (${/tim/} | ${/tom/})+ (?: ${/!/})*\n`;\n\nparse(name)('tim!'); // [ \"tim\", .tag = \"name\" ]\nparse(name)('tim!!!!'); // [ \"tim\", .tag = \"name\" ]\nparse(name)('tim'); // [ \"tim\", .tag = \"name\" ]\nparse(name)('timtim'); // [ \"tim\", \"tim\", .tag = \"name\" ]\n```\n\nAs we can see from the above, like in regexes, quantifiers can be combined with groups,\nnon-capturing groups, or other groups.\n\n### Transforming as we match\n\nIn the previous sections, we've seen that the **nodes** that `reghex` outputs are arrays containing\nmatch strings or other nodes and have a special `tag` property with the node's type.\nWe can **change this output** while we're parsing by passing a function to our matcher definition.\n\n```js\nconst name = match('name', (x) =\u003e x[0])`\n  (${/tim/} | ${/tom/}) ${/!/}\n`;\n\nparse(name)('tim!'); // \"tim\"\n```\n\nIn the above example, we're passing a small function, `x =\u003e x[0]`, to the matcher as a\nsecond argument. This will change the matcher's output, which causes the parser to\nnow return a new output for this matcher.\n\nWe can use this function creatively by outputting full AST nodes, maybe even\nones that resemble Babel's output:\n\n```js\nconst identifier = match('identifier', (x) =\u003e ({\n  type: 'Identifier',\n  name: x[0],\n}))`\n  ${/[\\w_][\\w\\d_]+/}\n`;\n\nparse(identifier)('var_name'); // { type: \"Identifier\", name: \"var_name\" }\n```\n\nWe've now entirely changed the output of the parser for this matcher. Given that each\nmatcher can change its output, we're free to change the parser's output entirely.\nBy returning `null` or `undefined` in this matcher, we can also change the matcher\nto not have matched, which would cause other matchers to treat it like a mismatch!\n\n```js\nimport { match, parse } from 'reghex';\n\nconst name = match('name', (x) =\u003e {\n  return x[0] !== 'tim' ? 
x : undefined;\n})`\n  ${/\\w+/}\n`;\n\nconst hello = match('hello')`\n  ${/hello /} ${name}\n`;\n\nparse(hello)('hello tom'); // [\"hello\", [\"tom\", .tag = \"name\"], .tag = \"hello\"]\nparse(hello)('hello tim'); // undefined\n```\n\nLastly, if we need to create these special array nodes ourselves, we can use `reghex`'s\n`tag` export for this purpose.\n\n```js\nimport { tag } from 'reghex';\n\ntag(['test'], 'node_name');\n// [\"test\", .tag = \"node_name\"]\n```\n\n### Tagged Template Parsing\n\nAny grammar in RegHex can also be used to parse a tagged template literal.\nA tagged template literal consists of a list of literals alternating with\na list of \"interpolations\".\n\nIn RegHex we can add an `interpolation` matcher to our grammars to allow them\nto parse interpolations in a template literal.\n\n```js\nimport { match, parse, interpolation } from 'reghex';\n\nconst anyNumber = interpolation((x) =\u003e typeof x === 'number');\n\nconst num = match('num')`\n  ${/[+-]?/} ${anyNumber}\n`;\n\nparse(num)`+${42}`;\n// [\"+\", 42, .tag = \"num\"]\n```\n\nThis grammar now allows us to match arbitrary values if they're input into the\nparser. We can now call our grammar as a tagged template literal itself\nto parse this.\n\n**That's it! May the RegExp be ever in your favor.**\n","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0no-co%2Freghex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F0no-co%2Freghex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0no-co%2Freghex/lists"}