{"id":13735584,"url":"https://github.com/zevv/npeg","last_synced_at":"2025-04-04T15:14:27.066Z","repository":{"id":50261413,"uuid":"174614366","full_name":"zevv/npeg","owner":"zevv","description":"PEGs for Nim, another take","archived":false,"fork":false,"pushed_at":"2024-08-22T14:51:49.000Z","size":10178,"stargazers_count":334,"open_issues_count":4,"forks_count":22,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-04-03T13:49:38.969Z","etag":null,"topics":["gerexp","grammar","nim","parser","parser-generator","peg","regex","regular-expressions"],"latest_commit_sha":null,"homepage":"","language":"Nim","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zevv.png","metadata":{"files":{"readme":"README.md","changelog":"Changelog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-08T21:45:14.000Z","updated_at":"2025-03-11T11:51:46.000Z","dependencies_parsed_at":"2023-02-13T22:45:28.448Z","dependency_job_id":"e5dafeca-2726-4c48-9e92-08cce1d63be3","html_url":"https://github.com/zevv/npeg","commit_stats":null,"previous_names":[],"tags_count":47,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zevv%2Fnpeg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zevv%2Fnpeg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zevv%2Fnpeg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zevv%2Fnpeg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zevv","download_url":"https://codeload.github.com/zevv/np
eg/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247198466,"owners_count":20900081,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gerexp","grammar","nim","parser","parser-generator","peg","regex","regular-expressions"],"created_at":"2024-08-03T03:01:08.508Z","updated_at":"2025-04-04T15:14:27.047Z","avatar_url":"https://github.com/zevv.png","language":"Nim","funding_links":[],"categories":["Language Features"],"sub_categories":["Pattern Matching"],"readme":"[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n![Stability: experimental](https://img.shields.io/badge/stability-stable-green.svg)\n\n\u003cimg src=\"https://raw.githubusercontent.com/zevv/npeg/master/doc/npeg.png\" alt=\"NPeg logo\" align=\"left\"\u003e\n\n\u003e \"_Because friends don't let friends write parsers by hand_\"\n\nNPeg is a pure Nim pattern matching library. It provides macros to compile\npatterns and grammars (PEGs) to Nim procedures which will parse a string and\ncollect selected parts of the input. PEGs are not unlike regular expressions,\nbut offer more power and flexibility, and have less ambiguities. 
(More about \nPEGs on [Wikipedia](https://en.wikipedia.org/wiki/Parsing_expression_grammar))\n\n![Graph](/doc/syntax-diagram.png)\n\nSome use cases where NPeg is useful are configuration or data file parsers,\nrobust protocol implementations, input validation, lexing of programming\nlanguages or domain specific languages.\n\nSome NPeg highlights:\n\n- Grammar definitions and Nim code can be freely mixed. Nim code is embedded\n  using the normal Nim code block syntax, and does not disrupt the grammar\n  definition.\n\n- NPeg-generated parsers can be used both at run and at compile time.\n\n- NPeg offers various methods for tracing, optimizing and debugging\n  your parsers.\n\n- NPeg can parse sequences of any data types, also making it suitable as a\n  stage-two parser for lexed tokens.\n\n- NPeg can draw [cool diagrams](/doc/example-railroad.png)\n\n## Contents\n\n\u003c!-- AutoContentStart --\u003e\n- [Quickstart](#quickstart)\n- [Usage](#usage)\n    * [Simple patterns](#simple-patterns)\n    * [Grammars](#grammars)\n- [Syntax](#syntax)\n    * [Atoms](#atoms)\n    * [Operators](#operators)\n- [Precedence operators](#precedence-operators)\n- [Captures](#captures)\n    * [String captures](#string-captures)\n    * [Code block captures](#code-block-captures)\n        - [Custom match validations](#custom-match-validations)\n        - [Passing state](#passing-state)\n    * [Backreferences](#backreferences)\n- [More about grammars](#more-about-grammars)\n    * [Ordering of rules in a grammar](#ordering-of-rules-in-a-grammar)\n    * [Templates, or parameterized rules](#templates-or-parameterized-rules)\n    * [Composing grammars with libraries](#composing-grammars-with-libraries)\n    * [Library rule overriding/shadowing](#library-rule-overridingshadowing)\n- [Error handling](#error-handling)\n    * [MatchResult](#matchresult)\n    * [NpegParseError exceptions](#npegparseerror-exceptions)\n    * [Other exceptions](#other-exceptions)\n    * [Parser stack 
trace](#parser-stack-trace)\n- [Advanced topics](#advanced-topics)\n    * [Parsing other types then strings](#parsing-other-types-then-strings)\n- [Some notes on using PEGs](#some-notes-on-using-pegs)\n    * [Anchoring and searching](#anchoring-and-searching)\n    * [Complexity and performance](#complexity-and-performance)\n    * [End of string](#end-of-string)\n    * [Non-consuming atoms and captures](#non-consuming-atoms-and-captures)\n    * [Left recursion](#left-recursion)\n    * [UTF-8 / Unicode](#utf-8--unicode)\n- [Tracing and debugging](#tracing-and-debugging)\n    * [Syntax diagrams](#syntax-diagrams)\n    * [Grammar graphs](#grammar-graphs)\n    * [Tracing](#tracing)\n- [Compile-time configuration](#compile-time-configuration)\n- [Tracing and debugging](#tracing-and-debugging-1)\n- [Random stuff and frequently asked questions](#random-stuff-and-frequently-asked-questions)\n    * [Why does NPeg not support regular PEG syntax?](#why-does-npeg-not-support-regular-peg-syntax)\n    * [Can NPeg be used to parse EBNF grammars?](#can-npeg-be-used-to-parse-ebnf-grammars)\n    * [NPeg and generic functions](#npeg-and-generic-functions)\n- [Examples](#examples)\n    * [Parsing arithmetic expressions](#parsing-arithmetic-expressions)\n    * [A complete JSON parser](#a-complete-json-parser)\n    * [Captures](#captures-1)\n    * [More examples](#more-examples)\n- [Future directions / Todos / Roadmap / The long run](#future-directions--todos--roadmap--the-long-run)\n\n\u003c!-- AutoContentEnd --\u003e\n\n## Quickstart\n\nHere is a simple example showing the power of NPeg: The macro `peg` compiles a\ngrammar definition into a `parser` object, which is used to match a string and\nplace the key-value pairs into the Nim table `words`:\n\n```nim\nimport npeg, strutils, tables\n\ntype Dict = Table[string, int]\n\nlet parser = peg(\"pairs\", d: Dict):\n  pairs \u003c- pair * *(',' * pair) * !1\n  word \u003c- +Alpha\n  number \u003c- +Digit\n  pair \u003c- \u003eword * '=' * 
\u003enumber:\n    d[$1] = parseInt($2)\n\nvar words: Dict\ndoAssert parser.match(\"one=1,two=2,three=3,four=4\", words).ok\necho words\n```\n\nOutput:\n\n```nim\n{\"two\": 2, \"three\": 3, \"one\": 1, \"four\": 4}\n```\n\nA brief explanation of the above code:\n\n* The macro `peg` is used to create a parser object, which uses `pairs` as the\n  initial grammar rule to match. The variable `d` of type `Dict` will be available\n  inside the code blocks of the parser for storing the parsed data.\n\n* The rule `pairs` matches one `pair`, followed by zero or more (`*`)\n  repetitions of a comma followed by a `pair`.\n\n* The rules `word` and `number` match a sequence of one or more (`+`)\n  alphabetic characters or digits, respectively. The `Alpha` and `Digit` rules\n  are pre-defined rules matching the character classes `{'A'..'Z','a'..'z'}` \n  and `{'0'..'9'}`.\n\n* The rule `pair` matches a `word`, followed by an equals sign (`=`), followed\n  by a `number`.\n\n* The `word` and `number` in the `pair` rule are captured with the `\u003e`\n  operator. The Nim code fragment below this rule is executed for every match,\n  and stores the captured word and number in the `words` Nim table.\n\n\n## Usage\n\nThe `patt()` and `peg()` macros can be used to compile parser functions:\n\n- `patt()` creates a parser from a single anonymous pattern.\n\n- `peg()` allows the definition of a set of (potentially recursive) rules \n          making up a complete grammar.\n\nThe result of these macros is an object of the type `Parser` which can be used\nto parse a subject:\n\n```nim\nproc match(p: Parser, s: string): MatchResult\nproc matchFile(p: Parser, fname: string): MatchResult\n```\n\nThe above `match` functions return an object of the type `MatchResult`:\n\n```nim\nMatchResult = object\n  ok: bool\n  matchLen: int\n  matchMax: int\n  ...\n```\n\n* `ok`: A boolean indicating if the matching succeeded without error. 
Note that\n  a successful match does not imply that *all of the subject* was matched,\n  unless the pattern explicitly matches the end-of-string.\n\n* `matchLen`: The number of input bytes of the subject that successfully\n  matched.\n\n* `matchMax`: The highest index into the subject that was reached during\n  parsing, *even if matching was backtracked or did not succeed*. This offset\n  is usually a good indication of the location where the matching error\n  occurred.\n\nThe string captures made during the parsing can be accessed with:\n\n```nim\nproc captures(m: MatchResult): seq[string]\n```\n\n\n### Simple patterns\n\nA simple pattern can be compiled with the `patt` macro.\n\nFor example, the pattern below splits a string by white space:\n\n```nim\nlet parser = patt *(*' ' * \u003e +(1-' '))\necho parser.match(\"   one two three \").captures\n```\n\nOutput:\n\n```\n@[\"one\", \"two\", \"three\"]\n```\n\nThe `patt` macro can take an optional code block which is used as code block\ncapture for the pattern:\n\n```nim\nvar key, val: string\nlet p = patt \u003e+Digit * \"=\" * \u003e+Alpha:\n  (key, val) = ($1, $2)\n\nassert p.match(\"15=fifteen\").ok\necho key, \" = \", val\n```\n\n### Grammars\n\nThe `peg` macro provides a method to define (recursive) grammars. The first\nargument is the name of the initial pattern, followed by a list of named patterns.\nPatterns can now refer to other patterns by name, allowing for recursion:\n\n```nim\nlet parser = peg \"ident\":\n  lower \u003c- {'a'..'z'}\n  ident \u003c- *lower\ndoAssert parser.match(\"lowercaseword\").ok\n```\n\nThe order in which the grammar patterns are defined affects the generated\nparser.\nAlthough NPeg could always reorder, this is a design choice to give the user\nmore control over the generated parser:\n\n* when a pattern `P1` refers to pattern `P2` which is defined *before* `P1`,\n  `P2` will be inlined in `P1`.  
This increases the generated code size, but\n  generally improves performance.\n\n* when a pattern `P1` refers to pattern `P2` which is defined *after* `P1`,\n  `P2` will be generated as a subroutine which gets called from `P1`. This will\n  reduce code size, but might also result in a slower parser.\n\n\n## Syntax\n\nThe NPeg syntax is similar to normal PEG notation, but some changes were made\nto allow the grammar to be properly parsed by the Nim compiler:\n\n- NPeg uses prefixes instead of suffixes for `*`, `+`, `-` and `?`.\n- Ordered choice uses `|` instead of `/` because of operator precedence.\n- The explicit `*` infix operator is used for sequences.\n\nNPeg patterns and grammars can be composed from the following parts:\n\n```nim\n\nAtoms:\n\n   0              # matches always and consumes nothing\n   1              # matches any character\n   n              # matches exactly n characters\n  'x'             # matches literal character 'x'\n  \"xyz\"           # matches literal string \"xyz\"\n i\"xyz\"           # matches literal string, case insensitive\n  {'x'..'y'}      # matches any character in the range from 'x'..'y'\n  {'x','y','z'}   # matches any character from the set\n\nOperators:\n\n   P1 * P2        # concatenation\n   P1 | P2        # ordered choice\n   P1 - P2        # matches P1 if P2 does not match\n  (P)             # grouping\n  !P              # matches everything but P\n  \u0026P              # matches P without consuming input\n  ?P              # matches P zero or one times\n  *P              # matches P zero or more times\n  +P              # matches P one or more times\n  @P              # search for P\n   P[n]           # matches P n times\n   P[m..n]        # matches P m to n times\n\nPrecedence operators:\n\n  P ^ N           # P is left associative with precedence N\n  P ^^ N          # P is right associative with precedence N\n\nString captures:  \n\n  \u003eP              # Captures the string matching  P \n\nBack 
references:\n\n  R(\"tag\", P)     # Create a named reference for pattern P\n  R(\"tag\")        # Matches the given named reference\n\nError handling:\n\n  E\"msg\"          # Raise an `NPegParseError` exception\n```\n\nIn addition to the above, NPeg provides the following built-in shortcuts for\ncommon atoms, corresponding to POSIX character classes:\n\n```nim\n  Alnum  \u003c- {'A'..'Z','a'..'z','0'..'9'}, # Alphanumeric characters\n  Alpha  \u003c- {'A'..'Z','a'..'z'},          # Alphabetic characters\n  Blank  \u003c- {' ','\\t'},                   # Space and tab\n  Cntrl  \u003c- {'\\x00'..'\\x1f','\\x7f'},      # Control characters\n  Digit  \u003c- {'0'..'9'},                   # Digits\n  Graph  \u003c- {'\\x21'..'\\x7e'},             # Visible characters\n  Lower  \u003c- {'a'..'z'},                   # Lowercase characters\n  Print  \u003c- {'\\x21'..'\\x7e',' '},         # Visible characters and spaces\n  Space  \u003c- {'\\9'..'\\13',' '},            # Whitespace characters\n  Upper  \u003c- {'A'..'Z'},                   # Uppercase characters\n  Xdigit \u003c- {'A'..'F','a'..'f','0'..'9'}, # Hexadecimal digits\n```\n\n\n### Atoms\n\nAtoms are the basic building blocks for a grammar, describing the parts of the\nsubject that should be matched.\n\n- Integer literal: `0` / `1` / `n`\n\n  The int literal atom `n` matches exactly `n` bytes. `0` always\n  matches, but does not consume any data.\n\n\n- Character and string literals: `'x'` / `\"xyz\"` / `i\"xyz\"`\n\n  Characters and strings are literally matched. If a string is prefixed with\n  `i`, it will be matched case-insensitively.\n\n\n- Character sets: `{'x','y'}`\n\n  Character set notation is similar to native Nim. 
A set consists of zero or\n  more comma separated characters or character ranges.\n\n  ```nim\n   {'x'..'y'}    # matches any character in the range from 'x'..'y'\n   {'x','y','z'} # matches any character from the set 'x', 'y', and 'z'\n  ```\n\n  The set syntax `{}` is flexible and can take multiple ranges and characters\n  in one expression, for example `{'0'..'9','a'..'f','A'..'F'}`.\n\n\n### Operators\n\nNPeg provides various prefix and infix operators. These operators combine or\ntransform one or more patterns into expressions, building larger patterns.\n\n- Concatenation: `P1 * P2`\n\n  ```\n  o──[P1]───[P2]──o\n  ```\n\n  The pattern `P1 * P2` returns a new pattern that matches only if first `P1`\n  matches, followed by `P2`.\n\n  For example, `\"foo\" * \"bar\"` would only match the string `\"foobar\"`.\n\n  Note: As an alternative for the `*` asterisk, the unicode glyph `∙` (\"bullet\n  operator\", 0x2219) can also be used for concatenation.\n\n\n- Ordered choice: `P1 | P2`\n\n  ```\n  o─┬─[P1]─┬─o\n    ╰─[P2]─╯\n  ```\n\n  The pattern `P1 | P2` tries to first match pattern `P1`. If this succeeds,\n  matching will proceed without trying `P2`. Only if `P1` cannot be matched,\n  NPeg will backtrack and try to match `P2` instead. Once either `P1` or `P2` has\n  matched, the choice will be final (\"committed\"), and no more backtracking will\n  be possible for this choice.\n\n  For example `(\"foo\" | \"bar\") * \"fizz\"` would match both `\"foofizz\"` and\n  `\"barfizz\"`.\n\n  NPeg optimizes the `|` operator for characters and character sets: The\n  pattern `'a' | 'b' | 'c'` will be rewritten to a character set\n  `{'a','b','c'}`.\n\n\n- Difference: `P1 - P2`\n\n  The pattern `P1 - P2` matches `P1` *only* if `P2` does not match. 
This is\n  equivalent to `!P2 * P1`:\n  \n  ```\n     ━━━━\n  o──[P2]─»─[P1]──o\n  ```\n\n  NPeg optimizes the `-` operator for characters and character sets: The\n  pattern `{'a','b','c'} - 'b'` will be rewritten to the character set\n  `{'a','c'}`.\n\n\n- Grouping: `(P)`\n\n  Brackets are used to group patterns similar to normal arithmetic expressions.\n\n\n- Not-predicate: `!P`\n\n  ```\n     ━━━\n  o──[P]──o\n  ```\n\n  The pattern `!P` returns a pattern that matches only if the input does not\n  match `P`.\n  In contrast to most other patterns, this pattern does not consume any input.\n\n  A common usage for this operator is the pattern `!1`, meaning \"only succeed\n  if there is not a single character left to match\" - which is only true for\n  the end of the string.\n\n\n- And-predicate: `\u0026P`\n\n  ```\n     ━━━\n     ━━━\n  o──[P]──o\n  ```\n\n  The pattern `\u0026P` matches only if the input matches `P`, but will *not*\n  consume any input. This is equivalent to `!!P`. This is denoted by a double\n  negation in the railroad diagram, which is not very pretty unfortunately.\n\n- Optional: `?P`\n\n  ```\n    ╭──»──╮\n  o─┴─[P]─┴─o\n  ```\n\n  The pattern `?P` matches if `P` can be matched zero or one times, so\n  essentially succeeds if `P` either matches or not.\n\n  For example, `?\"foo\" * \"bar\"` matches both `\"foobar\"` and `\"bar\"`.\n\n\n- Match zero or more times: `*P`\n\n  ```\n    ╭───»───╮\n  o─┴┬─[P]─┬┴─o\n     ╰──«──╯\n  ```\n\n  The pattern `*P` tries to match as many occurrences of pattern `P` as\n  possible - this operator always behaves *greedily*.\n\n  For example, `*\"foo\" * \"bar\"` matches `\"bar\"`, `\"foobar\"`, `\"foofoobar\"`,\n  etc.\n\n\n- Match one or more times: `+P`\n\n  ```\n  o─┬─[P]─┬─o\n    ╰──«──╯\n  ```\n\n  The pattern `+P` matches `P` at least once, but also more times.\n  It is equivalent to `P * *P` - this operator always behaves *greedily*.\n\n\n- Search: `@P`\n\n  This operator searches for pattern `P` using 
an optimized implementation. It\n  is equivalent to `s \u003c- *(1 - P) * P`, which can be read as \"try to match as\n  many characters as possible not matching `P`, and then match `P`\":\n\n  ```\n    ╭─────»─────╮\n    │  ━━━      │\n  o─┴┬─[P]─»─1─┬┴»─[P]──o\n     ╰────«────╯\n  ```\n\n  Note that this operator does not allow capturing the skipped data up to the\n  match; if this is required you can manually construct a grammar to do this.\n\n\n- Match exactly `n` times: `P[n]`\n\n  The pattern `P[n]` matches `P` exactly `n` times.\n\n  For example, `\"foo\"[3]` only matches the string `\"foofoofoo\"`:\n\n  ```\n  o──[P]─»─[P]─»─[P]──o\n  ```\n\n\n- Match `m` to `n` times: `P[m..n]`\n\n  The pattern `P[m..n]` matches `P` at least `m` and at most `n` times.\n\n  For example, `\"foo\"[1..3]` matches `\"foo\"`, `\"foofoo\"` and `\"foofoofoo\"`:\n\n  ```\n          ╭──»──╮ ╭──»──╮\n  o──[P]─»┴─[P]─┴»┴─[P]─┴─o\n  ```\n\n\n## Precedence operators\n\nNote: This is an experimental feature, the implementation or API might change\nin the future.\n\nPrecedence operators allow for the construction of \"precedence climbing\" or\n\"Pratt parsers\" with NPeg. The main use for this feature is building parsers\nfor programming languages that follow the usual precedence and associativity\nrules of arithmetic expressions.\n\n- Left associative precedence of `N`: `P ^ N`\n\n```\n   \u003c1\u003c   \no──[P]──o\n```\n\n- Right associative precedence of `N`: `P ^^ N`\n\n```\n   \u003e1\u003e \no──[P]──o\n```\n\nDuring parsing NPeg keeps track of the current precedence level of the parsed\nexpression - the default is `0` if no precedence has been assigned yet. 
When\nthe `^` operator is matched, one of the following three cases applies:\n\n- `P ^ N` where `N \u003e 0` and `N` is lower than the current precedence: in this\n  case the current precedence is set to `N` and parsing of pattern `P`\n  continues.\n\n- `P ^ N` where `N \u003e 0` and `N` is higher than or equal to the current precedence:\n  parsing will fail and backtrack.\n\n- `P ^ 0`: resets the current precedence to 0 and continues parsing. The main\n  use case for this is parsing sub-expressions in parentheses.\n\nThe heart of a Pratt parser in NPeg would look something like this:\n\n```nim\nexp \u003c- prefix * *infix\n\nparenExp \u003c- ( \"(\" * exp * \")\" ) ^ 0\n\nprefix \u003c- number | parenExp\n\ninfix \u003c- {'+','-'}    * exp ^  1 |\n         {'*','/'}    * exp ^  2 |\n         {'^'}        * exp ^^ 3:\n```\n\nMore extensive documentation will be added later, for now take a look at the\nexample in `tests/precedence.nim`.\n\n\n## Captures\n\n```\n     ╭╶╶╶╶╶╮\ns o────[P]────o\n     ╰╶╶╶╶╶╯\n```\n\nNPeg supports a number of ways to capture data when parsing a string.\nThe various capture methods are described here, including a concise example.\n\nThe capture examples below build on the following small PEG, which parses\na comma separated list of key-value pairs:\n\n```nim\nconst data = \"one=1,two=2,three=3,four=4\"\n\nlet parser = peg \"pairs\":\n  pairs \u003c- pair * *(',' * pair) * !1\n  word \u003c- +Alpha\n  number \u003c- +Digit\n  pair \u003c- word * '=' * number\n\nlet r = parser.match(data)\n```\n\n### String captures\n\nThe basic method for capturing is marking parts of the peg with the capture\nprefix `\u003e`. During parsing NPeg keeps track of all matches, properly discarding\nany matches which were invalidated by backtracking. 
Only when parsing has fully\nsucceeded does it create a `seq[string]` of all matched parts, which is then\navailable through the `captures` of the returned `MatchResult`.\n\nIn the example, the `\u003e` capture prefix is added to the `word` and `number`\nrules, causing the matched words and numbers to be appended to the result\ncapture `seq[string]`:\n\n```nim\nlet parser = peg \"pairs\":\n  pairs \u003c- pair * *(',' * pair) * !1\n  word \u003c- +Alpha\n  number \u003c- +Digit\n  pair \u003c- \u003eword * '=' * \u003enumber\n\nlet r = parser.match(data)\n```\n\nThe resulting list of captures is now:\n\n```nim\n@[\"one\", \"1\", \"two\", \"2\", \"three\", \"3\", \"four\", \"4\"]\n```\n\n\n### Code block captures\n\nCode block captures offer the most flexibility for accessing matched data in\nNPeg. This allows you to define a grammar with embedded Nim code for handling\nthe data during parsing.\n\nNote that for code block captures, the Nim code gets executed during parsing,\n*even if the match is part of a pattern that fails and is later backtracked*.\n\nWhen a grammar rule ends with a colon `:`, the next indented block in the\ngrammar is interpreted as Nim code, which gets executed when the rule has been\nmatched. Any string captures that were made inside the rule are available to\nthe Nim code in the injected variable `capture[]` of type `seq[Capture]`:\n\n```\ntype Capture = object\n  s*: string      # The captured string\n  si*: int        # The index of the captured string in the subject\n```\n\nThe total subject matched by the code block rule is available in `capture[0]`.\nAny additional explicit `\u003e` string captures made by the rule or any of its\nchild rules will be available as `capture[1]`, `capture[2]`, ...\n\nFor convenience there is syntactic sugar available in the code block capture\nblocks:\n\n- The variables `$0` to `$9` are rewritten to `capture[n].s` and can be used to\n  access the captured strings. 
The `$` operator uses the usual Nim precedence,\n  thus these variables might need parentheses or different ordering in some\n  cases, for example `$1.parseInt` should be written as `parseInt($1)`.\n\n- The variables `@0` to `@9` are rewritten to `capture[n].si` and can be used\n  to access the offset in the subject of the matched captures.\n\nExample:\n```nim\nlet p = peg foo:\n  foo \u003c- \u003e(1 * \u003e1) * 1:\n    echo \"$0 = \", $0\n    echo \"$1 = \", $1\n    echo \"$2 = \", $2\n       \necho p.match(\"abc\").ok\n```\n\nWill output\n\n```nim\n$0 = abc\n$1 = ab\n$2 = b\n```\n\nCode block captures consume all embedded string captures, so these captures\nwill no longer be available after matching.\n\nA code block capture can also produce captures by calling the `push(s: string)`\nfunction from the code block. Note that this is an experimental feature and\nthat the API might change in future versions.\n\nThe example has been extended to capture each word and number with the `\u003e`\nstring capture prefix. When the `pair` rule is matched, the attached code block\nis executed, which adds the parsed key and value to the `words` table.\n\n```nim\nimport tables\nfrom strutils import parseInt\nvar words = initTable[string, int]()\n\nlet parser = peg \"pairs\":\n  pairs \u003c- pair * *(',' * pair) * !1\n  word \u003c- +Alpha\n  number \u003c- +Digit\n  pair \u003c- \u003eword * '=' * \u003enumber:\n    words[$1] = parseInt($2)\n\nlet r = parser.match(data)\n```\n\nAfter parsing has finished, the `words` table will contain:\n\n```nim\n{\"two\": 2, \"three\": 3, \"one\": 1, \"four\": 4}\n```\n\n\n#### Custom match validations\n\nCode block captures can be used for additional validation of a captured string:\nthe code block can call the functions `fail()` or `validate(bool)` to indicate\nif the match should succeed or fail. Failing matches are handled as if the\ncapture itself failed and will result in the usual backtracking. 
When the\n`fail()` or `validate()` functions are not called, the match will succeed\nimplicitly.\n\nFor example, the following rule will check if a passed number is a valid\n`uint8` number:\n\n```nim\nuint8 \u003c- \u003eDigit[1..3]:\n  let v = parseInt($1)\n  validate v\u003e=0 and v\u003c=255\n```\n\nThe following grammar will cause the whole parse to fail when the `error` rule\nmatches:\n\n```nim\nerror \u003c- 0:\n  fail()\n```\n\nNote: The Nim code block is running within the NPeg parser context and in\ntheory could access its internal state - this could be used to create custom\nvalidator/matcher functions that can inspect the subject string, do lookahead\nor lookback, and adjust the subject index to consume input. At the time of\nwriting, NPeg lacks a formal API or interface for this though, and I am not\nsure yet what this should look like - if you are interested in doing this,\ncontact me so we can discuss the details.\n\n#### Passing state\n\nNPeg allows passing of data of a specific type to the `match()` function; this\nvalue is then available inside code blocks as a variable. This mitigates the\nneed for global variables for storing or retrieving data in code block captures.\n\nThe syntax for passing data in a grammar is:\n\n```\npeg(name, identifier: Type)\n```\n\nFor example, the above parser can be rewritten as such:\n\n```nim\ntype Dict = Table[string, int]\n\nlet parser = peg(\"pairs\", userdata: Dict):\n  pairs \u003c- pair * *(',' * pair) * !1\n  word \u003c- +Alpha\n  number \u003c- +Digit\n  pair \u003c- \u003eword * '=' * \u003enumber:\n    userdata[$1] = parseInt($2)\n\nvar words: Dict\nlet r = parser.match(data, words)\n```\n\n\n### Backreferences\n\nBackreferences allow NPeg to match an exact string that matched earlier in the\ngrammar. 
This can be useful to match repetitions of the same word, or for\nexample to match so-called here-documents in programming languages.\n\nFor this, NPeg offers the `R` operator with the following two uses:\n\n* The `R(name, P)` pattern creates a named reference for pattern `P` which can\n  be referred to by name in other places in the grammar.\n\n* The pattern `R(name)` matches the contents of the named reference that\n  was stored earlier with the `R(name, P)` pattern.\n\nFor example, the following rule will match only a string which has the \nsame character in the first and last position:\n\n```\npatt R(\"c\", 1) * *(1 - R(\"c\")) * R(\"c\") * !1\n```\n\nThe first part of the rule `R(\"c\", 1)` will match any character, and store this\nin the named reference `c`. The second part will match a sequence of zero or\nmore characters that do not match reference `c`, followed by reference `c`.\n\n\n## More about grammars\n\n\n### Ordering of rules in a grammar\n\nRepetitive inlining of rules might cause a grammar to grow too large, resulting\nin a huge executable size and slow compilation. 
NPeg tries to mitigate this in\ntwo ways:\n\n* Patterns that are too large will not be inlined, even if the above ordering\n  rules apply.\n\n* NPeg checks the size of the total grammar, and if it thinks it is too large\n  it will fail compilation with the error message `NPeg: grammar too complex`.\n\nCheck the section \"Compile-time configuration\" below for more details about too\ncomplex grammars.\n\nThe parser size and performance depend on many factors; when performance\nand/or code size matters, it pays to experiment with different orderings and\nmeasure the results.\n\nWhen in doubt, check the generated parser instructions by compiling with the\n`-d:npegTrace` or `-d:npegDotDir` flags - see the section Tracing and\nDebugging for more information.\n\nAt this time the upper limit is 4096 rules; this might become a configurable\nnumber in a future release.\n\nFor example, the following grammar will not compile because recursive inlining\nwill cause it to expand to a parser with more than 4^6 = 4096 rules:\n\n```\nlet p = peg \"z\":\n  f \u003c- 1\n  e \u003c- f * f * f * f\n  d \u003c- e * e * e * e\n  c \u003c- d * d * d * d\n  b \u003c- c * c * c * c\n  a \u003c- b * b * b * b\n  z \u003c- a * a * a * a\n```\n\nThe fix is to change the order of the rules so that instead of inlining NPeg\nwill use a calling mechanism:\n\n```\nlet p = peg \"z\":\n  z \u003c- a * a * a * a\n  a \u003c- b * b * b * b\n  b \u003c- c * c * c * c\n  c \u003c- d * d * d * d\n  d \u003c- e * e * e * e\n  e \u003c- f * f * f * f\n  f \u003c- 1\n```\n\nWhen in doubt check the generated parser instructions by compiling with the\n`-d:npegTrace` flag - see the section Tracing and Debugging for more\ninformation.\n\n\n### Templates, or parameterized rules\n\nWhen building more complex grammars you may find yourself duplicating certain\nconstructs in patterns over and over again. 
To avoid code repetition (DRY),\nNPeg provides a simple mechanism to allow the creation of parameterized rules.\nIn good Nim-fashion these rules are called \"templates\". Templates are defined\njust like normal rules, but have a list of arguments, which are referred to in\nthe rule. Technically, templates just perform a basic search-and-replace\noperation: every occurrence of a named argument is replaced by the exact\npattern passed to the template when called.\n\nFor example, consider the following grammar:\n\n```nim\nnumberList \u003c- +Digit * *( ',' * +Digit)\nwordList \u003c- +Alpha * *( ',' * +Alpha)\n```\n\nThis snippet uses a common pattern twice for matching lists: `p * *( ',' * p)`.\nThis matches pattern `p`, followed by zero or more occurrences of a comma\nfollowed by pattern `p`. For example, `numberList` will match the string\n`1,22,3`.\n\nThe above example can be parameterized with a template like this:\n\n```nim\ncommaList(item) \u003c- item * *( ',' * item )\nnumberList \u003c- commaList(+Digit)\nwordList \u003c- commaList(+Alpha)\n```\n\nHere the template `commaList` is defined, and any occurrence of its argument\n'item' will be replaced with the patterns passed when calling the template.\nThis template is used to define the more complex patterns `numberList` and\n`wordList`.\n\nTemplates may invoke other templates recursively; for example the above can\neven be further generalized:\n\n```nim\nlist(item, sep) \u003c- item * *( sep * item )\ncommaList(item) \u003c- list(item, ',')\nnumberList \u003c- commaList(+Digit)\nwordList \u003c- commaList(+Alpha)\n```\n\n\n### Composing grammars with libraries\n\nFor simple grammars it is usually fine to build all patterns from scratch from\natoms and operators, but for more complex grammars it makes sense to define\nreusable patterns as basic building blocks.\n\nFor this, NPeg keeps track of a global library of patterns and templates. 
The\n`grammar` macro can be used to add rules or templates to this library. All\npatterns in the library will be stored with a *qualified* identifier in the\nform `libraryname.patternname`, by which they can be referred to at a later\ntime.\n\nFor example, the following fragment defines three rules in the library with the\nname `number`. The rules will be stored in the global library and are referred\nto in the peg by their qualified names `number.dec`, `number.hex` and\n`number.oct`:\n\n```nim\ngrammar \"number\":\n  dec \u003c- {'1'..'9'} * *{'0'..'9'}\n  hex \u003c- i\"0x\" * +{'0'..'9','a'..'f','A'..'F'}\n  oct \u003c- '0' * *{'0'..'9'}\n\nlet p = peg \"line\":\n  line \u003c- int * *(\",\" * int)\n  int \u003c- number.dec | number.hex | number.oct\n\nlet r = p.match(\"123,0x42,0644\")\n```\n\nNPeg offers a number of pre-defined libraries for your convenience; these can\nbe found in the `npeg/lib` directory. A library can be imported with the regular\nNim `import` statement; all rules defined in the imported file will then be\nadded to NPeg's global pattern library. For example:\n\n```nim\nimport npeg/lib/uri\n```\n\n\nNote that templates defined in libraries do not implicitly bind the rules\nfrom that grammar; instead, you need to explicitly qualify the rules used in\nthe template to refer to the grammar. For example:\n\n```nim\ngrammar \"foo\":\n  open \u003c- \"(\"\n  close \u003c- \")\"\n  inBrackets(body): foo.open * body * foo.close\n```\n\n### Library rule overriding/shadowing\n\nTo allow the user to add custom captures to imported grammars or rules, it is\npossible to *override* or *shadow* an existing rule in a grammar.\n\nOverriding will replace the rule from the library with the provided new rule,\nallowing the caller to change parts of an imported grammar. An overridden rule\nis allowed to reference the original rule by name, which will cause the new\nrule to *shadow* the original rule. 
This will effectively rename the original\nrule and replace it with the newly defined rule which will call the original\nreferred rule.\n\nFor example, the following snippet will reuse the grammar from the `uri`\nlibrary and capture some parts of the URI in a Nim object:\n\n```nim\nimport npeg/lib/uri\n\ntype Uri = object\n  host: string\n  scheme: string\n  path: string\n  port: int\n\nvar myUri: Uri\n\nlet parser = peg \"line\":\n  line \u003c- uri.URI\n  uri.scheme \u003c- \u003euri.scheme: myUri.scheme = $1\n  uri.host \u003c- \u003euri.host:     myUri.host = $1\n  uri.port \u003c- \u003euri.port:     myUri.port = parseInt($1)\n  uri.path \u003c- \u003euri.path:     myUri.path = $1\n\necho parser.match(\"http://nim-lang.org:8080/one/two/three\")\necho myUri  # --\u003e (host: \"nim-lang.org\", scheme: \"http\", path: \"/one/two/three\", port: 8080)\n```\n\n## Error handling\n\nNPeg offers a number of ways to handle errors while parsing a subject string;\nwhich method best suits your parser depends on your requirements.\n\n\n### MatchResult\n\nThe simplest way to handle errors is to inspect the `MatchResult` object\nthat is returned by the `match()` proc:\n\n```nim\nMatchResult = object\n  ok: bool\n  matchLen: int\n  matchMax: int\n```\n\nThe `ok` field in the `MatchResult` indicates if the parser was successful:\nwhen the complete pattern has been matched this value will be set to `true`;\nif the complete pattern did not match the subject the value will be `false`.\n\nIn addition to the `ok` field, the `matchMax` field indicates the maximum\noffset into the subject the parser was able to match the string. 
If the\nmatching succeeded, `matchMax` equals the total length of the subject; if the\nmatching failed, the value of `matchMax` is usually a good indication of where\nin the subject string the error occurred:\n\n```\nlet a = patt 4\nlet r = a.match(\"123\")\nif not r.ok:\n  echo \"Parsing failed at position \", r.matchMax\n```\n\n### NPegParseError exceptions\n\nWhen, during matching, the parser reaches an `E\"message\"` atom in the grammar,\nNPeg will raise an `NPegParseError` exception with the given message.\nThe typical use case for this atom is to be combined with the ordered choice `|`\noperator to generate helpful error messages.\nThe following example illustrates this:\n\n```nim\nlet parser = peg \"list\":\n  list \u003c- word * *(comma * word) * !1\n  word \u003c- +Alpha | E\"expected word\"\n  comma \u003c- ',' | E\"expected comma\"\n\ntry:\n  echo parser.match(\"one,two;three\")\nexcept NPegParseError as e:\n  echo \"Parsing failed at position \", e.matchMax, \": \", e.msg\n```\n\nThe rule `comma` tries to match the literal `','`. 
If this cannot be matched,\nthe rule `E\"expected comma\"` will match instead, where `E` will raise an\n`NPegParseError` exception.\n\nThe `NPegParseError` type contains the same two fields as `MatchResult` to\nindicate where in the subject string the match failed: `matchLen` and\n`matchMax`, which can be used as an indication of the location of the parse\nerror:\n\n```\nParsing failed at position 7: expected comma\n```\n\n\n### Other exceptions\n\nNPeg can raise a number of other exception types during parsing:\n\n- `NPegParseError`: described in the previous section\n\n- `NPegStackOverflowError`: a stack overflow occurred in the backtrace\n  or call stack; this is usually an indication of a faulty or too complex\n  grammar.\n\n- `NPegUnknownBackrefError`: An unknown back reference identifier is used in an \n  `R()` rule.\n\n- `NPegCaptureOutOfRangeError`: A code block capture tries to access a capture\n  that is not available using the `$` notation or by accessing the `capture[]`\n  seq.\n\n\nAll the above errors are inherited from the generic `NPegException` object.\n\n\n### Parser stack trace\n\nIf an exception is raised from within an NPeg parser - either by the `E` atom\nor by Nim code in a code block capture - NPeg will augment the Nim stack trace\nwith frames indicating where in the grammar the exception occurred.\n\nThe above example will generate the following stack trace; note the last two\nentries which are added by NPeg and show the rules in which the exception\noccurred:\n\n```\n/tmp/list.nim(9)         list\n./npeg/src/npeg.nim(142) match\n./npeg/src/npeg.nim(135) match\n/tmp/flop.nim(4)         list \u003c- word * *(comma * word) * eof\n/tmp/flop.nim(7)         word \u003c- +{'a' .. 
'z'} | E\"expected word\"\nError: unhandled exception: Parsing error at #14: \"expected word\" [NPegParseError]\n```\n\nNote: this requires Nim 'devel' or version \u003e 1.6.x; on older versions you can\nuse `-d:npegStackTrace` to make NPeg dump the stack to stdout.\n\n\n## Advanced topics\n\n### Parsing other types than strings\n\nNote: This is an experimental feature; the implementation or API might change\nin the future.\n\nNPeg was originally designed to parse strings like a regular PEG engine, but\nhas since evolved into a generic parser that can parse any subject of type\n`openArray[T]`. This section describes how to use this feature.\n\n- The `peg()` macro must be passed an additional argument specifying the base\n  type `T` of the subject; the generated parser will then parse a subject of\n  type `openArray[T]`. When not given, the default type is `char`, and the parser\n  parses `openArray[char]`, or more typically, `string`.\n\n- When matching non-strings, some of the usual atoms like strings or character\n  sets do not make sense in a grammar; instead the grammar uses literal atoms.\n  Literals can be specified in square brackets and are interpreted as any Nim\n  code: `[foo]`, `[1+1]` or `[\"foo\"]` are all valid literals.\n\n- When matching non-strings, captures will be limited to only a single element\n  of the base type, as this makes more sense when parsing a token stream.\n\nFor an example of this feature check the example in `tests/lexparse.nim` - this\nimplements a classic parser with separate lexing and parsing stages.\n\n\n## Some notes on using PEGs\n\n\n### Anchoring and searching\n\nUnlike regular expressions, PEGs are always matched in *anchored* mode only:\nthe defined pattern is matched from the start of the subject string.\nFor example, the pattern `\"bar\"` does not match the string `\"foobar\"`.\n\nTo search for a pattern in a stream, a construct like this can be used:\n\n```nim\np \u003c- \"bar\"\nsearch \u003c- p | 1 * 
search\n```\n\nThe above grammar first tries to match pattern `p`, or if that fails, matches\nany character `1` and recurs back to itself. Because searching is a common\noperation, NPeg provides the builtin `@P` operator for this.\n\n\n### Complexity and performance\n\nAlthough it is possible to write patterns with exponential time complexity for\nNPeg, they are much less common than in regular expressions, thanks to the\nlimited backtracking. In particular, patterns written without grammatical rules\nalways have a worst-case time `O(n^k)` (and space `O(k)`, which is constant for\na given pattern), where `k` is the pattern's star height. Moreover, NPeg has a\nsimple and clear performance model that allows programmers to understand and\npredict the time complexity of their patterns. The model also provides a firm\nbasis for pattern optimizations.\n\n(Adapted from Ierusalimschy, \"A Text Pattern-Matching Tool based on Parsing\nExpression Grammars\", 2008)\n\n\n### End of string\n\nPEGs do not care what is in the subject string after the matching succeeds. For\nexample, the rule `\"foo\"` happily matches the string `\"foobar\"`. To make sure\nthe pattern matches the end of string, this has to be made explicit in the\npattern.\n\nThe idiomatic notation for this is `!1`, meaning \"only succeed if there is not\na single character left to match\" - which is only true for the end of the\nstring.\n\n\n### Non-consuming atoms and captures\n\nThe lookahead (`\u0026`) and not (`!`) operators do not consume any input, and make\nsure that after matching the internal parsing state of the parser is reset to\nwhat it was before the operator was started, including the state of the captures.\nThis means that any captures made inside a `\u0026` or `!` block are also\ndiscarded. 
It is possible, however, to capture the contents of a non-consuming\nblock with a code block capture, as these are _always_ executed, even when the\nparser state is rolled back afterwards.\n\n\n### Left recursion\n\nNPeg does not support left recursion (this applies to PEGs in general). For\nexample, the rule\n\n```nim\nA \u003c- A | 'a'\n```\n\nwill cause an infinite loop because it allows for left-recursion of the\nnon-terminal `A`.\n\nSimilarly, the grammar\n\n```nim\nA \u003c- B | 'a' * A\nB \u003c- A\n```\n\nis problematic because it is mutually left-recursive through the non-terminal\n`B`.\n\nNote that loops of patterns that can match the empty string will not result in\nthe expected behavior. For example, the rule `*0` will cause the parser to\nstall and go into an infinite loop.\n\n\n### UTF-8 / Unicode\n\nNPeg has no built-in support for Unicode or UTF-8; instead it is simply able to\nparse UTF-8 documents just like any other string. NPeg comes with a simple\nUTF-8 grammar library which should simplify common operations like matching a\nsingle code point or character class. 
The following grammar splits a UTF-8\ndocument into separate characters/glyphs by using the `utf8.any` rule:\n\n```nim\nimport npeg/lib/utf8\n\nlet p = peg \"line\":\n  line \u003c- +char\n  char \u003c- \u003eutf8.any\n\nlet r = p.match(\"γνωρίζω\")\necho r.captures()   # --\u003e @[\"γ\", \"ν\", \"ω\", \"ρ\", \"ί\", \"ζ\", \"ω\"]\n```\n\n\n## Tracing and debugging\n\n### Syntax diagrams\n\nWhen compiled with `-d:npegGraph`, NPeg will dump \n[syntax diagrams](https://en.wikipedia.org/wiki/Syntax_diagram)\n(also known as railroad diagrams) for all parsed rules.\n\nSyntax diagrams are sometimes helpful to understand or debug a grammar, or to\nget more insight into a grammar's complexity.\n\n```\n                              ╭─────────»──────────╮                     \n                              │      ╭─────»──────╮│                     \n                ╭╶╶╶╶╶╶╶╶╶╶╮  │      │  ━━━━      ││         ╭╶╶╶╶╶╶╶╮   \ninf o──\"INF:\"─»───[number]───»┴─\",\"─»┴┬─[lf]─»─1─┬┴┴»─[lf]─»───[url]────o\n                ╰╶╶╶╶╶╶╶╶╶╶╯          ╰────«─────╯           ╰╶╶╶╶╶╶╶╯   \n```\n\n* Optionals (`?`) are indicated by a forward arrow overhead.\n* Repeats (`+`) are indicated by a backwards arrow underneath.\n* Literals (strings, chars, sets) are printed in purple.\n* Non-terminals are printed in cyan between square brackets.\n* Not-predicates (`!`) are overlined in red. Note that the diagram does not\n  make it clear that the input for not-predicates is not consumed.\n* Captures are boxed in a gray rectangle, optionally including the capture\n  name.\n\n[Here](/doc/example-railroad.png) is a larger example of a URL parser.\n\n### Grammar graphs\n\nNPeg can generate a graphical representation of a grammar to show the relations\nbetween rules. 
The generated output is a `.dot` file which can be processed by\nthe Graphviz tool to generate an actual image file.\n\nWhen compiled with `-d:npegDotDir=\u003cPATH\u003e`, NPeg will generate a `.dot` file for\neach grammar in the code and write it to the given directory.\n\n![graph](/doc/example-graph.png)\n\n* Edge colors represent the rule relation:\n  grey=inline, blue=call, green=builtin\n\n* Rule colors represent the relative size/complexity of a rule:\n  black=\u003c10, orange=10..100, red=\u003e100\n\nLarge rules result in larger generated code and slow compile times. Rule size\ncan generally be decreased by changing the rule order in a grammar to allow\nNPeg to call rules instead of inlining them.\n\n\n### Tracing\n\nWhen compiled with `-d:npegTrace`, NPeg will dump its intermediate\nrepresentation of the compiled PEG, and will dump a trace of the execution\nduring matching. These traces can be used for debugging or optimization of a\ngrammar.\n\nFor example, the following program:\n\n```nim\nlet parser = peg \"line\":\n  space \u003c- ' '\n  line \u003c- word * *(space * word)\n  word \u003c- +{'a'..'z'}\n\ndiscard parser.match(\"one two\")\n```\n\nwill output the following intermediate representation at compile time. From\nthe IR it can be seen that the `space` rule has been inlined in the `line`\nrule, but that the `word` rule has been emitted as a subroutine which gets\ncalled from `line`:\n\n```\nline:\n   0: line           opCall 6 word        word\n   1: line           opChoice 5           *(space * word)\n   2:  space         opStr \" \"            ' '\n   3: line           opCall 6 word        word\n   4: line           opPartCommit 2       *(space * word)\n   5:                opReturn\n\nword:\n   6: word           opSet '{'a'..'z'}'   {'a' .. 'z'}\n   7: word           opSpan '{'a'..'z'}'  +{'a' .. 'z'}\n   8:                opReturn\n```\n\nAt runtime, the following trace is generated. The trace consists of a number\nof columns:\n\n1. 
The current instruction pointer, which maps to the compile time dump.\n2. The index into the subject.\n3. The substring of the subject.\n4. The name of the rule from which this instruction originated.\n5. The instruction being executed.\n6. The backtrace stack depth.\n\n```\n  0|  0|one two                 |line           |call -\u003e word:6                          |\n  6|  0|one two                 |word           |set {'a'..'z'}                          |\n  7|  1|ne two                  |word           |span {'a'..'z'}                         |\n  8|  3| two                    |               |return                                  |\n  1|  3| two                    |line           |choice -\u003e 5                             |\n  2|  3| two                    | space         |chr \" \"                                 |*\n  3|  4|two                     |line           |call -\u003e word:6                          |*\n  6|  4|two                     |word           |set {'a'..'z'}                          |*\n  7|  5|wo                      |word           |span {'a'..'z'}                         |*\n  8|  7|                        |               |return                                  |*\n  4|  7|                        |line           |pcommit -\u003e 2                            |*\n  2|  7|                        | space         |chr \" \"                                 |*\n   |  7|                        |               |fail                                    |*\n  5|  7|                        |               |return (done)                           |\n```\n\nThe exact meaning of the IR instructions is not discussed here.\n\n\n## Compile-time configuration\n\nNPeg has a number of settings which can be configured at compile\ntime by passing flags to the compiler. 
The default values should be ok in most\ncases, but if you ever run into one of those limits you are free to configure\nthose to your liking:\n\n* `-d:npegPattMaxLen=N` This is the maximum allowed length of NPeg's internal\n  representation of a parser, before it gets translated to Nim code. The reason\n  to check for an upper limit is that some grammars can grow exponentially by\n  inlining of patterns, resulting in slow compile times and oversized\n  executable size. (default: 4096)\n\n* `-d:npegInlineMaxLen=N` This is the maximum allowed length of a pattern to be\n  inlined. Inlining generally results in a faster parser, but also increases\n  code size. It is valid to set this value to 0; in that case NPeg will never\n  inline patterns and use a calling mechanism instead, this will result in the\n  smallest code size. (default: 50)\n\n* `-d:npegRetStackSize=N` Maximum allowed depth of the return stack for the\n  parser. The default value should be high enough for practical purposes, the\n  stack depth is only limited to detect invalid grammars. (default: 1024)\n\n* `-d:npegBackStackSize=N` Maximum allowed depth of the backtrace stack for the\n  parser. The default value should be high enough for practical purposes, the\n  stack depth is only limited to detect invalid grammars. (default: 1024)\n\n* `-d:npegGcsafe` This is a workaround for the case where NPeg needs to be used\n  from a `{.gcsafe.}` context when using threads. This will mark the generated\n  matching function to be `{.gcsafe.}`.\n\n\n## Tracing and debugging\n\nNPeg has a number of compile time flags to enable tracing and debugging of the\ngenerated parser:\n\n* `-d:npegTrace`: Enable compile time and run time tracing. Please refer to the \n  section 'Tracing' for more details.\n\n* `-d:npegGraph`: Dump syntax diagrams of all parsed rules at compile time.\n\nThese flags are meant for debugging NPeg itself, and are typically not useful\nto the end user:\n\n* `-d:npegDebug`: Enable more debug info. 
Meant for NPeg development debugging\n  purposes only.\n\n* `-d:npegExpand`: Dump the generated Nim code for all parsers defined in the\n  program. Meant for NPeg development debugging purposes only.\n\n* `-d:npegStacktrace`: When enabled, NPeg will dump a stack trace of the\n  current position in the parser when an exception is thrown by NPeg itself or\n  by Nim code in code block captures.\n\n\n## Random stuff and frequently asked questions\n\n\n### Why does NPeg not support regular PEG syntax?\n\nThe NPeg syntax is similar to, but not exactly the same as, the official PEG\nsyntax: it uses some different operators, and prefix instead of postfix\noperators. The reason for this is that the NPeg grammar is parsed by a Nim\nmacro in order to allow code block captures to embed Nim code, which puts some\nlimitations on the available syntax. Also, NPeg's operators are chosen so that\nthey have the right precedence for PEGs.\n\nThe result is that the grammar itself is expressed as valid Nim, which has the\nnice side effect of allowing syntax highlighting and code completion to work with\nyour favorite editor.\n\n\n### Can NPeg be used to parse EBNF grammars?\n\nAlmost, but not quite. Although PEGs and EBNF look quite similar, there are\nsome subtle but important differences which do not allow a literal translation\nfrom EBNF to PEG. Notable differences are left recursion and ordered choice.\nAlso, see \"From EBNF to PEG\" by Roman R. Redziejowski.\n\n\n### NPeg and generic functions\n\nNim's macro system is sometimes finicky and not well defined, and NPeg seems to\npush it to the limit. 
This means that you might run into strange and\nunexpected issues, especially when mixing NPeg with generic code.\n\nIf you run into weird error messages that do not seem to make sense when using\nNPeg from generic procs, check the links below for more information and\npossible workarounds:\n\n- https://github.com/nim-lang/Nim/issues/22740\n- https://github.com/zevv/npeg/issues/68\n\n\n## Examples\n\n### Parsing arithmetic expressions\n\n```nim\nlet parser = peg \"line\":\n  exp      \u003c- term   * *( ('+'|'-') * term)\n  term     \u003c- factor * *( ('*'|'/') * factor)\n  factor   \u003c- +{'0'..'9'} | ('(' * exp * ')')\n  line     \u003c- exp * !1\n\ndoAssert parser.match(\"3*(4+15)+2\").ok\n```\n\n\n### A complete JSON parser\n\nThe following PEG defines a complete parser for the JSON language - it will not\nproduce any captures, but simply traverse and validate the document:\n\n```nim\nlet s = peg \"doc\":\n  S              \u003c- *Space\n  jtrue          \u003c- \"true\"\n  jfalse         \u003c- \"false\"\n  jnull          \u003c- \"null\"\n\n  unicodeEscape  \u003c- 'u' * Xdigit[4]\n  escape         \u003c- '\\\\' * ({ '{', '\"', '|', '\\\\', 'b', 'f', 'n', 'r', 't' } | unicodeEscape)\n  stringBody     \u003c- ?escape * *( +( {'\\x20'..'\\xff'} - {'\"'} - {'\\\\'}) * *escape)\n  jstring         \u003c- ?S * '\"' * stringBody * '\"' * ?S\n\n  minus          \u003c- '-'\n  intPart        \u003c- '0' | (Digit-'0') * *Digit\n  fractPart      \u003c- \".\" * +Digit\n  expPart        \u003c- ( 'e' | 'E' ) * ?( '+' | '-' ) * +Digit\n  jnumber         \u003c- ?minus * intPart * ?fractPart * ?expPart\n\n  doc            \u003c- JSON * !1\n  JSON           \u003c- ?S * ( jnumber | jobject | jarray | jstring | jtrue | jfalse | jnull ) * ?S\n  jobject        \u003c- '{' * ( jstring * \":\" * JSON * *( \",\" * jstring * \":\" * JSON ) | ?S ) * \"}\"\n  jarray         \u003c- \"[\" * ( JSON * *( \",\" * JSON ) | ?S ) * \"]\"\n\nlet doc = 
\"\"\" {\"jsonrpc\": \"2.0\", \"method\": \"subtract\", \"params\": [42, 23], \"id\": 1} \"\"\"\ndoAssert s.match(doc).ok\n```\n\n\n### Captures\n\nThe following example shows how to use code block captures. The defined\ngrammar will parse an HTTP response document and extract structured data from\nthe document into a Nim object:\n\n```nim\nimport npeg, strutils, tables\n\ntype\n  Request = object\n    proto: string\n    version: string\n    code: int\n    message: string\n    headers: Table[string, string]\n\n# HTTP grammar (simplified)\n\nlet parser = peg(\"http\", userdata: Request):\n  space       \u003c- ' '\n  crlf        \u003c- '\\n' * ?'\\r'\n  url         \u003c- +(Alpha | Digit | '/' | '_' | '.')\n  eof         \u003c- !1\n  header_name \u003c- +(Alpha | '-')\n  header_val  \u003c- +(1-{'\\n'}-{'\\r'})\n  proto       \u003c- \u003e+Alpha:\n    userdata.proto = $1\n  version     \u003c- \u003e(+Digit * '.' * +Digit):\n    userdata.version = $1\n  code        \u003c- \u003e+Digit:\n    userdata.code = parseInt($1)\n  msg         \u003c- \u003e(+(1 - '\\r' - '\\n')):\n    userdata.message = $1\n  header      \u003c- \u003eheader_name * \": \" * \u003eheader_val:\n    userdata.headers[$1] = $2\n  response    \u003c- proto * '/' * version * space * code * space * msg\n  headers     \u003c- *(header * crlf)\n  http        \u003c- response * crlf * headers * eof\n\n\n# Parse the data and print the resulting table\n\nconst data = \"\"\"\nHTTP/1.1 301 Moved Permanently\nContent-Length: 162\nContent-Type: text/html\nLocation: https://nim.org/\n\"\"\"\n\nvar request: Request\nlet res = parser.match(data, request)\necho request\n```\n\nThe resulting data:\n\n```nim\n(\n  proto: \"HTTP\",\n  version: \"1.1\",\n  code: 301,\n  message: \"Moved Permanently\",\n  headers: {\n    \"Content-Length\": \"162\",\n    \"Content-Type\":\n    \"text/html\",\n    \"Location\": \"https://nim.org/\"\n  }\n)\n```\n\n\n### More examples\n\nMore examples can be found in 
tests/examples.nim.\n\n\n## Future directions / Todos / Roadmap / The long run\n\nHere are some things I'd like to have implemented one day. Some are hard and\nrequire me to better understand what I'm doing first. In no particular order:\n\n- Handling left recursion: PEGs are typically not good at handling grammars\n  involving left recursion, see \n  https://en.wikipedia.org/wiki/Parsing_expression_grammar#Indirect_left_recursion\n  for an explanation of the problem. However, some smart people have found a way\n  to make this work anyway, but I am not yet able to understand this well enough\n  to implement this in NPeg.\n  https://github.com/zevv/npeg/blob/master/doc/papers/Left_recursion_in_parsing_expression_grammars.pdf\n\n- Design and implement a proper API for code block captures. The current API\n  feels fragile and fragmented (`capture[], $1/$2, fail(), validate()`), and\n  does not offer solid primitives to make custom match functions yet; something\n  better should be in place before NPeg goes v1.0.\n\n- Resuming/streaming: The current parser is almost ready to be invoked multiple\n  times, resuming parsing where it left off - this should allow parsing of\n  (infinite) streams. The only problem not solved yet is how to handle\n  captures: when a block of data is parsed it might contain data which must\n  later be available to collect the capture. Not sure how to handle this yet.\n\n- Memoization: I guess it would be possible to add (limited) memoization to \n  improve performance, but no clue where to start yet.\n\n- Parallelization: I wonder if parsing can be parallelized: when reaching an\n  ordered choice, multiple threads should be able to try to parse each\n  individual choice. I do see problems with captures here, though.\n\n- I'm not happy about the `{.gcsafe.}` workaround. 
I'd be happy to hear any\n  ideas on how to improve this.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzevv%2Fnpeg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzevv%2Fnpeg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzevv%2Fnpeg/lists"}