{"id":13424672,"url":"https://github.com/yhirose/cpp-peglib","last_synced_at":"2026-03-14T05:19:46.716Z","repository":{"id":27006281,"uuid":"30470360","full_name":"yhirose/cpp-peglib","owner":"yhirose","description":"A single file C++ header-only PEG (Parsing Expression Grammars) library","archived":false,"fork":false,"pushed_at":"2025-05-06T10:19:50.000Z","size":3721,"stargazers_count":949,"open_issues_count":8,"forks_count":115,"subscribers_count":27,"default_branch":"master","last_synced_at":"2025-05-06T10:52:39.055Z","etag":null,"topics":["c-plus-plus","cpp","cpp17","header-only","parser-generator","parsing","parsing-expression-grammars","peg"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yhirose.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-02-07T21:10:11.000Z","updated_at":"2025-05-06T10:19:54.000Z","dependencies_parsed_at":"2023-11-20T22:25:24.567Z","dependency_job_id":"58de7250-d00c-4167-a0c4-b4652acbc8cb","html_url":"https://github.com/yhirose/cpp-peglib","commit_stats":null,"previous_names":[],"tags_count":49,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-peglib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-peglib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-peglib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-peglib/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yhirose","download_url":"https://codeload.github.com/yhirose/cpp-peglib/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253908954,"owners_count":21982685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","cpp","cpp17","header-only","parser-generator","parsing","parsing-expression-grammars","peg"],"created_at":"2024-07-31T00:00:57.598Z","updated_at":"2026-03-14T05:19:46.704Z","avatar_url":"https://github.com/yhirose.png","language":"C++","funding_links":[],"categories":["C++","Text Handling","Parsing Expression Grammars"],"sub_categories":[],"readme":"cpp-peglib\n==========\n\n[![](https://github.com/yhirose/cpp-peglib/workflows/CMake/badge.svg)](https://github.com/yhirose/cpp-peglib/actions)\n\nC++17 header-only [PEG](http://en.wikipedia.org/wiki/Parsing_expression_grammar) (Parsing Expression Grammars) library. You can start using it right away just by including `peglib.h` in your project.\n\nSince this library only supports C++17 compilers, please make sure that the compiler option `-std=c++17` is enabled.\n(`/std:c++17 /Zc:__cplusplus` for MSVC)\n\nYou can also try the online version, PEG Playground at https://yhirose.github.io/cpp-peglib.\n\nThe PEG syntax is well described on page 2 in the [document](http://www.brynosaurus.com/pub/lang/peg.pdf) by Bryan Ford. 
*cpp-peglib* also supports the following additional syntax for now:\n\n* `'...'i` (Case-insensitive literal operator)\n* `[...]i` (Case-insensitive character class operator)\n* `[^...]` (Negated character class operator)\n* `[^...]i` (Case-insensitive negated character class operator)\n* `{2,5}` (Regex-like repetition operator)\n* `\u003c` ... `\u003e` (Token boundary operator)\n* `~` (Ignore operator)\n* `\\x20` (Hex number char)\n* `\\u10FFFF` (Unicode char)\n* `%whitespace` (Automatic whitespace skipping)\n* `%word` (Word expression)\n* `$name(` ... `)` (Capture scope operator)\n* `$name\u003c` ... `\u003e` (Named capture operator)\n* `$name` (Backreference operator)\n* `|` (Dictionary operator)\n* `↑` (Cut operator)\n* `MACRO_NAME(` ... `)` (Parameterized rule or Macro)\n* `{ precedence L - + L / * }` (Parsing infix expression)\n* Left recursive rules (direct, indirect, and mutual left recursion)\n* `%recovery(` ... `)` (Error recovery operator)\n* `exp⇑label` or `exp^label` (Syntax sugar for `(exp / %recovery(label))`)\n* `label { error_message \"...\" }` (Error message instruction)\n* `{ no_ast_opt }` (No AST node optimization instruction)\n\nThe 'End of Input' check is performed by default. To disable it, call `disable_eoi_check`.\n\nThis library supports linear-time parsing, known as [*Packrat*](http://pdos.csail.mit.edu/~baford/packrat/thesis/thesis.pdf) parsing. It also supports *left recursive* grammars (direct, indirect, and mutual) via a seed-growing algorithm, allowing natural expression of left-associative operators.\n\nIMPORTANT NOTE for some Linux distributions such as Ubuntu and CentOS: the `-pthread` option is needed when linking. 
See [#23](https://github.com/yhirose/cpp-peglib/issues/23#issuecomment-261126127), [#46](https://github.com/yhirose/cpp-peglib/issues/46#issuecomment-417870473) and [#62](https://github.com/yhirose/cpp-peglib/issues/62#issuecomment-492032680).\n\nI am sure that you will enjoy this excellent [\"Practical parsing with PEG and cpp-peglib\"](https://berthub.eu/articles/posts/practical-peg-parsing/) article by [bert hubert](https://berthub.eu/)!\n\nHow to use\n----------\n\nThis is a simple calculator sample. It shows how to define grammar, associate semantic actions to the grammar, and handle semantic values.\n\n```cpp\n// (1) Include the header file\n#include \u003cpeglib.h\u003e\n#include \u003cassert.h\u003e\n#include \u003ciostream\u003e\n\nusing namespace peg;\nusing namespace std;\n\nint main(void) {\n  // (2) Make a parser\n  parser parser(R\"(\n    # Grammar for Calculator...\n    Additive    \u003c- Multiplicative '+' Additive / Multiplicative\n    Multiplicative   \u003c- Primary '*' Multiplicative / Primary\n    Primary     \u003c- '(' Additive ')' / Number\n    Number      \u003c- \u003c [0-9]+ \u003e\n    %whitespace \u003c- [ \\t]*\n  )\");\n\n  assert(static_cast\u003cbool\u003e(parser) == true);\n\n  // (3) Setup actions\n  parser[\"Additive\"] = [](const SemanticValues \u0026vs) {\n    switch (vs.choice()) {\n    case 0: // \"Multiplicative '+' Additive\"\n      return any_cast\u003cint\u003e(vs[0]) + any_cast\u003cint\u003e(vs[1]);\n    default: // \"Multiplicative\"\n      return any_cast\u003cint\u003e(vs[0]);\n    }\n  };\n\n  parser[\"Multiplicative\"] = [](const SemanticValues \u0026vs) {\n    switch (vs.choice()) {\n    case 0: // \"Primary '*' Multiplicative\"\n      return any_cast\u003cint\u003e(vs[0]) * any_cast\u003cint\u003e(vs[1]);\n    default: // \"Primary\"\n      return any_cast\u003cint\u003e(vs[0]);\n    }\n  };\n\n  parser[\"Number\"] = [](const SemanticValues \u0026vs) {\n    return vs.token_to_number\u003cint\u003e();\n  };\n\n  
// (4) Parse\n  parser.enable_packrat_parsing(); // Enable packrat parsing.\n\n  int val;\n  parser.parse(\" (1 + 2) * 3 \", val);\n\n  assert(val == 9);\n}\n```\n\nTo show syntax errors in grammar text:\n\n```cpp\nauto grammar = R\"(\n  # Grammar for Calculator...\n  Additive    \u003c- Multiplicative '+' Additive / Multiplicative\n  Multiplicative   \u003c- Primary '*' Multiplicative / Primary\n  Primary     \u003c- '(' Additive ')' / Number\n  Number      \u003c- \u003c [0-9]+ \u003e\n  %whitespace \u003c- [ \\t]*\n)\";\n\nparser parser;\n\nparser.set_logger([](size_t line, size_t col, const string\u0026 msg, const string \u0026rule) {\n  cerr \u003c\u003c line \u003c\u003c \":\" \u003c\u003c col \u003c\u003c \": \" \u003c\u003c msg \u003c\u003c \"\\n\";\n});\n\nauto ok = parser.load_grammar(grammar);\nassert(ok);\n```\n\nThere are four semantic actions available:\n\n```cpp\n[](const SemanticValues\u0026 vs, any\u0026 dt)\n[](const SemanticValues\u0026 vs)\n[](SemanticValues\u0026 vs, any\u0026 dt)\n[](SemanticValues\u0026 vs)\n```\n\n`SemanticValues` value contains the following information:\n\n* Semantic values\n* Matched string information\n* Token information if the rule is literal or uses a token boundary operator\n* Choice number when the rule is 'prioritized choice'\n\n`any\u0026 dt` is a 'read-write' context data which can be used for whatever purposes. The initial context data is set in `peg::parser::parse` method.\n\nA semantic action can return a value of arbitrary data type, which will be wrapped by `peg::any`. If a user returns nothing in a semantic action, the first semantic value in the `const SemanticValues\u0026 vs` argument will be returned. 
(Yacc parser has the same behavior.)\n\nHere is the `SemanticValues` structure:\n\n```cpp\nstruct SemanticValues : protected std::vector\u003cany\u003e\n{\n  // Input text\n  const char* path;\n  const char* ss;\n\n  // Matched string\n  std::string_view sv() const { return sv_; }\n\n  // Line number and column at which the matched string starts\n  std::pair\u003csize_t, size_t\u003e line_info() const;\n\n  // Tokens\n  std::vector\u003cstd::string_view\u003e tokens;\n  std::string_view token(size_t id = 0) const;\n\n  // Token conversion\n  std::string token_to_string(size_t id = 0) const;\n  template \u003ctypename T\u003e T token_to_number() const;\n\n  // Choice number (0 based index)\n  size_t choice() const;\n\n  // Transform the semantic value vector to another vector\n  template \u003ctypename T\u003e vector\u003cT\u003e transform(size_t beg = 0, size_t end = -1) const;\n};\n```\n\nThe following example uses the `\u003c` ... `\u003e` operator, which is the *token boundary* operator.\n\n```cpp\npeg::parser parser(R\"(\n  ROOT  \u003c- _ TOKEN (',' _ TOKEN)*\n  TOKEN \u003c- \u003c [a-z0-9]+ \u003e _\n  _     \u003c- [ \\t\\r\\n]*\n)\");\n\nparser[\"TOKEN\"] = [](const SemanticValues\u0026 vs) {\n  // 'token' doesn't include trailing whitespaces\n  auto token = vs.token();\n};\n\nauto ret = parser.parse(\" token1, token2 \");\n```\n\nWe can ignore unnecessary semantic values from the list by using the `~` operator.\n\n```cpp\npeg::parser parser(R\"(\n  ROOT  \u003c-  _ ITEM (',' _ ITEM _)*\n  ITEM  \u003c-  ([a-z0-9])+\n  ~_    \u003c-  [ \\t]*\n)\");\n\nparser[\"ROOT\"] = [\u0026](const SemanticValues\u0026 vs) {\n  assert(vs.size() == 2); // should be 2 instead of 5.\n};\n\nauto ret = parser.parse(\" item1, item2 \");\n```\n\nThe following grammar is equivalent to the one above.\n\n```cpp\npeg::parser parser(R\"(\n  ROOT  \u003c-  ~_ ITEM (',' ~_ ITEM ~_)*\n  ITEM  \u003c-  ([a-z0-9])+\n  _     \u003c-  [ \\t]*\n)\");\n```\n\n*Semantic predicate* support is available with a 
*predicate* action.\n\n```cpp\npeg::parser parser(\"NUMBER  \u003c-  [0-9]+\");\n\nparser[\"NUMBER\"] = [](const SemanticValues \u0026vs) {\n  return vs.token_to_number\u003clong\u003e();\n};\n\nparser[\"NUMBER\"].predicate = [](const SemanticValues \u0026vs,\n                                const std::any \u0026 /*dt*/, std::string \u0026msg) {\n  if (vs.token_to_number\u003clong\u003e() != 100) {\n    msg = \"value error!!\";\n    return false;\n  }\n  return true;\n};\n\nlong val;\nauto ret = parser.parse(\"100\", val);\nassert(ret == true);\nassert(val == 100);\n\nret = parser.parse(\"200\", val);\nassert(ret == false);\n```\n\nThe predicate can pass data to the action via `predicate_data` to avoid redundant computation:\n\n```cpp\npeg::parser parser(\"NUMBER  \u003c-  \u003c [0-9]+ \u003e\");\n\nparser[\"NUMBER\"].predicate = [](const SemanticValues \u0026vs,\n                                const std::any \u0026 /*dt*/, std::string \u0026msg,\n                                std::any \u0026predicate_data) {\n  int value;\n  auto [ptr, err] = std::from_chars(\n      vs.token().data(), vs.token().data() + vs.token().size(), value);\n  if (err != std::errc()) {\n    msg = \"Number out of range.\";\n    return false;\n  }\n  predicate_data = value;\n  return true;\n};\n\nparser[\"NUMBER\"] = [](const SemanticValues \u0026 /*vs*/, std::any \u0026 /*dt*/,\n                      const std::any \u0026predicate_data) {\n  return std::any_cast\u003cint\u003e(predicate_data);\n};\n```\n\n*enter* and *leave* actions are also available.\n\n```cpp\nparser[\"RULE\"].enter = [](const Context \u0026c, const char* s, size_t n, any\u0026 dt) {\n  std::cout \u003c\u003c \"enter\" \u003c\u003c std::endl;\n};\n\nparser[\"RULE\"] = [](const SemanticValues\u0026 vs, any\u0026 dt) {\n  std::cout \u003c\u003c \"action!\" \u003c\u003c std::endl;\n};\n\nparser[\"RULE\"].leave = [](const Context \u0026c, const char* s, size_t n, size_t matchlen, any\u0026 value, any\u0026 dt) {\n  
std::cout \u003c\u003c \"leave\" \u003c\u003c std::endl;\n};\n```\n\nYou can receive error information via a logger:\n\n```cpp\nparser.set_logger([](size_t line, size_t col, const string\u0026 msg) {\n  ...\n});\n\nparser.set_logger([](size_t line, size_t col, const string\u0026 msg, const string \u0026rule) {\n  ...\n});\n```\n\nIgnoring Whitespaces\n--------------------\n\nAs you can see in the first example, we can ignore whitespaces between tokens automatically with the `%whitespace` rule.\n\nThe `%whitespace` rule applies in the following three cases:\n\n* trailing spaces on tokens\n* leading spaces on text\n* trailing spaces on literal strings in rules\n\nThese are valid tokens:\n\n```\nKEYWORD   \u003c- 'keyword'\nKEYWORDI  \u003c- 'case_insensitive_keyword'\nWORD      \u003c-  \u003c [a-zA-Z0-9] [a-zA-Z0-9-_]* \u003e    # token boundary operator is used.\nIDENT     \u003c-  \u003c IDENT_START_CHAR IDENT_CHAR* \u003e  # token boundary operator is used.\n```\n\nThe following grammar accepts ` one, \"two three\", four `.\n\n```\nROOT         \u003c- ITEM (',' ITEM)*\nITEM         \u003c- WORD / PHRASE\nWORD         \u003c- \u003c [a-z]+ \u003e\nPHRASE       \u003c- \u003c '\"' (!'\"' .)* '\"' \u003e\n\n%whitespace  \u003c-  [ \\t\\r\\n]*\n```\n\nWord expression\n---------------\n\n```cpp\npeg::parser parser(R\"(\n  ROOT         \u003c-  'hello' 'world'\n  %whitespace  \u003c-  [ \\t\\r\\n]*\n  %word        \u003c-  [a-z]+\n)\");\n\nparser.parse(\"hello world\"); // OK\nparser.parse(\"helloworld\");  // NG\n```\n\nCapture/Backreference\n---------------------\n\n```cpp\npeg::parser parser(R\"(\n  ROOT      \u003c- CONTENT\n  CONTENT   \u003c- (ELEMENT / TEXT)*\n  ELEMENT   \u003c- $(STAG CONTENT ETAG)\n  STAG      \u003c- '\u003c' $tag\u003c TAG_NAME \u003e '\u003e'\n  ETAG      \u003c- '\u003c/' $tag '\u003e'\n  TAG_NAME  \u003c- 'b' / 'u'\n  TEXT      \u003c- TEXT_DATA\n  TEXT_DATA \u003c- ![\u003c] .\n)\");\n\nparser.parse(\"This is \u003cb\u003ea 
\u003cu\u003etest\u003c/u\u003e text\u003c/b\u003e.\"); // OK\nparser.parse(\"This is \u003cb\u003ea \u003cu\u003etest\u003c/b\u003e text\u003c/u\u003e.\"); // NG\nparser.parse(\"This is \u003cb\u003ea \u003cu\u003etest text\u003c/b\u003e.\");     // NG\n```\n\nDictionary\n----------\n\n`|` operator allows us to make a word dictionary for fast lookup by using Trie structure internally. We don't have to worry about the order of words.\n\n```peg\nSTART \u003c- 'This month is ' MONTH '.'\nMONTH \u003c- 'Jan' | 'January' | 'Feb' | 'February' | '...'\n```\n\nWe are able to find which item is matched with `choice()`.\n\n```cpp\nparser[\"MONTH\"] = [](const SemanticValues \u0026vs) {\n  auto id = vs.choice();\n};\n```\n\nIt supports the case-insensitive mode.\n\n```peg\nSTART \u003c- 'This month is ' MONTH '.'\nMONTH \u003c- 'Jan'i | 'January'i | 'Feb'i | 'February'i | '...'i\n```\n\nCut operator\n------------\n\n`↑` operator could mitigate the backtrack performance problem, but has a risk to change the meaning of grammar.\n\n```peg\nS \u003c- '(' ↑ P ')' / '\"' ↑ P '\"' / P\nP \u003c- 'a' / 'b' / 'c'\n```\n\nWhen we parse `(z` with the above grammar, we don't have to backtrack in `S` after `(` is matched, because a cut operator is inserted there.\n\nParameterized Rule or Macro\n---------------------------\n\n```peg\n# Syntax\nStart      ← _ Expr\nExpr       ← Sum\nSum        ← List(Product, SumOpe)\nProduct    ← List(Value, ProOpe)\nValue      ← Number / T('(') Expr T(')')\n\n# Token\nSumOpe     ← T('+' / '-')\nProOpe     ← T('*' / '/')\nNumber     ← T([0-9]+)\n~_         ← [ \\t\\r\\n]*\n\n# Macro\nList(I, D) ← I (D I)*\nT(x)       ← \u003c x \u003e _\n```\n\nParsing infix expression by Precedence climbing\n-----------------------------------------------\n\nRegarding the *precedence climbing algorithm*, please see [this article](https://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing).\n\n```cpp\nparser parser(R\"(\n  EXPRESSION             
\u003c-  INFIX_EXPRESSION(ATOM, OPERATOR)\n  ATOM                   \u003c-  NUMBER / '(' EXPRESSION ')'\n  OPERATOR               \u003c-  \u003c [-+/*] \u003e\n  NUMBER                 \u003c-  \u003c '-'? [0-9]+ \u003e\n  %whitespace            \u003c-  [ \\t]*\n\n  # Declare order of precedence\n  INFIX_EXPRESSION(A, O) \u003c-  A (O A)* {\n    precedence\n      L + -\n      L * /\n  }\n)\");\n\nparser[\"INFIX_EXPRESSION\"] = [](const SemanticValues\u0026 vs) -\u003e long {\n  auto result = any_cast\u003clong\u003e(vs[0]);\n  if (vs.size() \u003e 1) {\n    auto ope = any_cast\u003cchar\u003e(vs[1]);\n    auto num = any_cast\u003clong\u003e(vs[2]);\n    switch (ope) {\n      case '+': result += num; break;\n      case '-': result -= num; break;\n      case '*': result *= num; break;\n      case '/': result /= num; break;\n    }\n  }\n  return result;\n};\nparser[\"OPERATOR\"] = [](const SemanticValues\u0026 vs) { return vs.sv().front(); };\nparser[\"NUMBER\"] = [](const SemanticValues\u0026 vs) { return vs.token_to_number\u003clong\u003e(); };\n\nlong val;\nparser.parse(\" -1 + (1 + 2) * 3 - -1\", val);\nassert(val == 9);\n```\n\nThe *precedence* instruction can be applied only to the following 'list' style rule.\n\n```\nRule \u003c- Atom (Operator Atom)* {\n  precedence\n    L - +\n    L / *\n    R ^\n}\n```\n\nThe *precedence* instruction contains precedence info entries. Each entry starts with an *associativity*, which is 'L' (left) or 'R' (right), followed by operator *literal* tokens. Entries are listed from lowest to highest precedence, so in the rule above `-` and `+` bind less tightly than `/` and `*`, and `^` binds most tightly.\n\nLeft Recursive Grammars\n-----------------------\n\ncpp-peglib supports left recursive rules, which are commonly used in expression grammars to achieve left-associative operators naturally. 
Left recursion is automatically detected at grammar compile time and handled via a seed-growing algorithm at parse time.\n\n```cpp\nparser parser(R\"(\n  Expr   \u003c- Expr '+' Term / Expr '-' Term / Term\n  Term   \u003c- Term '*' Factor / Term '/' Factor / Factor\n  Factor \u003c- '(' Expr ')' / Number\n  Number \u003c- \u003c [0-9]+ \u003e\n  %whitespace \u003c- [ \\t]*\n)\");\n\nparser[\"Expr\"] = [](const SemanticValues \u0026vs) {\n  switch (vs.choice()) {\n  case 0: return any_cast\u003clong\u003e(vs[0]) + any_cast\u003clong\u003e(vs[1]);\n  case 1: return any_cast\u003clong\u003e(vs[0]) - any_cast\u003clong\u003e(vs[1]);\n  default: return any_cast\u003clong\u003e(vs[0]);\n  }\n};\n\nparser[\"Term\"] = [](const SemanticValues \u0026vs) {\n  switch (vs.choice()) {\n  case 0: return any_cast\u003clong\u003e(vs[0]) * any_cast\u003clong\u003e(vs[1]);\n  case 1: return any_cast\u003clong\u003e(vs[0]) / any_cast\u003clong\u003e(vs[1]);\n  default: return any_cast\u003clong\u003e(vs[0]);\n  }\n};\n\nparser[\"Number\"] = [](const SemanticValues \u0026vs) {\n  return vs.token_to_number\u003clong\u003e();\n};\n\nlong val;\nparser.parse(\"1 - 2 - 3\", val);\nassert(val == -4);  // Left-associative: (1-2)-3 = -4\n\nparser.parse(\"8 / 4 / 2\", val);\nassert(val == 1);   // Left-associative: (8/4)/2 = 1\n```\n\nDirect, indirect, and mutual left recursion are all supported. For example, indirect left recursion works as expected:\n\n```peg\nA \u003c- B 'a'\nB \u003c- A 'b' / 'b'\n```\n\nLeft recursion support is enabled by default and adds zero overhead to non-left-recursive grammars. To disable it (reverting to the traditional error on left-recursive rules), call `enable_left_recursion(false)` before loading the grammar:\n\n```cpp\npeg::parser parser;\nparser.enable_left_recursion(false);\nparser.load_grammar(grammar);\n```\n\nAST generation\n--------------\n\n*cpp-peglib* is able to generate an AST (Abstract Syntax Tree) when parsing. 
The `enable_ast` method of the `peg::parser` class enables the feature.\n\nNOTE: An AST node holds a corresponding token as `std::string_view` for performance and less memory usage. It is the user's responsibility to keep the original source text alive for as long as the generated AST is in use.\n\n```\npeg::parser parser(R\"(\n  ...\n  definition1 \u003c- ... { no_ast_opt }\n  definition2 \u003c- ... { no_ast_opt }\n  ...\n)\");\n\nparser.enable_ast();\n\nshared_ptr\u003cpeg::Ast\u003e ast;\nif (parser.parse(\"...\", ast)) {\n  cout \u003c\u003c peg::ast_to_s(ast);\n\n  ast = parser.optimize_ast(ast);\n  cout \u003c\u003c peg::ast_to_s(ast);\n}\n```\n\n`optimize_ast` removes redundant nodes to make an AST simpler. To disable this behavior for particular rules, use the `no_ast_opt` instruction.\n\nIt internally calls `peg::AstOptimizer` to do the job. You can make your own AST optimizers to fit your needs.\n\nSee actual usages in the [AST calculator example](https://github.com/yhirose/cpp-peglib/blob/master/example/calc3.cc) and [PL/0 language example](https://github.com/yhirose/cpp-peglib/blob/master/pl0/pl0.cc).\n\nMake a parser with parser combinators\n-------------------------------------\n\nInstead of making a parser by parsing PEG syntax text, we can also construct a parser by hand with *parser combinators*. 
Here is an example:\n\n```cpp\nusing namespace peg;\nusing namespace std;\n\nvector\u003cstring\u003e tags;\n\nDefinition ROOT, TAG_NAME, _;\nROOT     \u003c= seq(_, zom(seq(chr('['), TAG_NAME, chr(']'), _)));\nTAG_NAME \u003c= oom(seq(npd(chr(']')), dot())), [\u0026](const SemanticValues\u0026 vs) {\n              tags.push_back(vs.token_to_string());\n            };\n_        \u003c= zom(cls(\" \\t\"));\n\nauto ret = ROOT.parse(\" [tag1] [tag:2] [tag-3] \");\n```\n\nThe following operators are available:\n\n| Operator | Description                     | Operator | Description         |\n|:---------|:--------------------------------|:---------|:--------------------|\n| seq      | Sequence                        | cho      | Prioritized Choice  |\n| zom      | Zero or More                    | oom      | One or More         |\n| opt      | Optional                        | apd      | And predicate       |\n| npd      | Not predicate                   | lit      | Literal string      |\n| liti     | Case-insensitive Literal string | cls      | Character class     |\n| ncls     | Negated Character class         | chr      | Character           |\n| dot      | Any character                   | tok      | Token boundary      |\n| ign      | Ignore semantic value           | csc      | Capture scope       |\n| cap      | Capture                         | bkr      | Back reference      |\n| dic      | Dictionary                      | pre      | Infix expression    |\n| rec      | Error recovery                  | usr      | User defined parser |\n| rep      | Repetition                      |          |                     |\n\nAdjust definitions\n------------------\n\nIt's possible to add/override definitions.\n\n```cpp\nauto syntax = R\"(\n  ROOT \u003c- _ 'Hello' _ NAME '!' 
_\n)\";\n\nRules additional_rules = {\n  {\n    \"NAME\", usr([](const char* s, size_t n, SemanticValues\u0026 vs, any\u0026 dt) -\u003e size_t {\n      static vector\u003cstring\u003e names = { \"PEG\", \"BNF\" };\n      for (const auto\u0026 name: names) {\n        if (name.size() \u003c= n \u0026\u0026 !name.compare(0, name.size(), s, name.size())) {\n          return name.size(); // processed length\n        }\n      }\n      return -1; // parse error\n    })\n  },\n  {\n    \"~_\", zom(cls(\" \\t\\r\\n\"))\n  }\n};\n\nauto g = parser(syntax, additional_rules);\n\nassert(g.parse(\" Hello BNF! \"));\n```\n\nUnicode support\n---------------\n\ncpp-peglib accepts UTF8 text. `.` matches a Unicode codepoint. Also, it supports `\\u????`.\n\nError report and recovery\n-------------------------\n\ncpp-peglib supports the furthest failure error position report as described in the Bryan Ford original document.\n\nFor better error report and recovery, cpp-peglib supports 'recovery' operator with label which can be associated with a recovery expression and a custom error message. This idea comes from the fantastic [\"Syntax Error Recovery in Parsing Expression Grammars\"](https://arxiv.org/pdf/1806.11150.pdf) paper by Sergio Medeiros and Fabio Mascarenhas.\n\nThe custom message supports `%t` which is a placeholder for the unexpected token, and `%c` for the unexpected Unicode char.\n\nHere is an example of Java-like grammar:\n\n```peg\n# java.peg\nProg        ← 'public' 'class' NAME '{' 'public' 'static' 'void' 'main' '(' 'String' '[' ']' NAME ')' BlockStmt '}'\nBlockStmt   ← '{' (!'}' Stmt^stmtb)* '}' # Annotated with `stmtb`\nStmt        ← IfStmt / WhileStmt / PrintStmt / DecStmt / AssignStmt / BlockStmt\nIfStmt      ← 'if' '(' Exp ')' Stmt ('else' Stmt)?\nWhileStmt   ← 'while' '(' Exp^condw ')' Stmt # Annotated with `condw`\nDecStmt     ← 'int' NAME ('=' Exp)? 
';'\nAssignStmt  ← NAME '=' Exp ';'^semia # Annotated with `semia`\nPrintStmt   ← 'System.out.println' '(' Exp ')' ';'\nExp         ← RelExp ('==' RelExp)*\nRelExp      ← AddExp ('\u003c' AddExp)*\nAddExp      ← MulExp (('+' / '-') MulExp)*\nMulExp      ← AtomExp (('*' / '/') AtomExp)*\nAtomExp     ← '(' Exp ')' / NUMBER / NAME\n\nNUMBER      ← \u003c [0-9]+ \u003e\nNAME        ← \u003c [a-zA-Z_][a-zA-Z_0-9]* \u003e\n\n%whitespace ← [ \\t\\n]*\n%word       ← NAME\n\n# Recovery operator labels\nsemia       ← '' { error_message \"missing semicolon in assignment.\" }\nstmtb       ← (!(Stmt / 'else' / '}') .)* { error_message \"invalid statement\" }\ncondw       ← \u0026'==' ('==' RelExp)* / \u0026'\u003c' ('\u003c' AddExp)* / (!')' .)*\n```\n\nFor instance, `';'^semia` is syntactic sugar for `(';' / %recovery(semia))`. The `%recovery` operator tries to recover from the error at ';' by skipping input text with the recovery expression `semia`. Also, `semia` is associated with the custom message \"missing semicolon in assignment.\"\n\nHere is the result:\n\n```java\n\u003e cat sample.java\npublic class Example {\n  public static void main(String[] args) {\n    int n = 5;\n    int f = 1;\n    while( \u003c n) {\n      f = f * n;\n      n = n - 1\n    };\n    System.out.println(f);\n  }\n}\n\n\u003e peglint java.peg sample.java\nsample.java:5:12: syntax error, unexpected '\u003c', expecting '(', \u003cNUMBER\u003e, \u003cNAME\u003e.\nsample.java:8:5: missing semicolon in assignment.\nsample.java:8:6: invalid statement\n```\n\nAs you can see, it can now show more than one error, and provide more meaningful error messages than the default messages.\n\n### Custom error message for definitions\n\nWe can associate custom error messages with definitions.\n\n```peg\n# custom_message.peg\nSTART       \u003c- CODE (',' CODE)*\nCODE        \u003c- \u003c '0x' [a-fA-F0-9]+ \u003e { error_message 'code format error...' 
}\n%whitespace \u003c- [ \\t]*\n```\n\n```\n\u003e cat custom_message.txt\n0x1234,0x@@@@,0xABCD\n\n\u003e peglint custom_message.peg custom_message.txt\ncustom_message.txt:1:8: code format error...\n```\n\nNOTE: If there is more than one element with an error message instruction in a prioritized choice, this feature may not work as you expect.\n\nChange the Start Definition Rule\n--------------------------------\n\nWe can change the start definition rule as below.\n\n```cpp\nauto grammar = R\"(\n  Start       \u003c- A\n  A           \u003c- B (',' B)*\n  B           \u003c- '[one]' / '[two]'\n  %whitespace \u003c- [ \\t\\n]*\n)\";\n\npeg::parser parser(grammar, \"A\"); // Start Rule is \"A\"\n\n// or\n\npeg::parser parser;\nparser.load_grammar(grammar, \"A\"); // Start Rule is \"A\"\n\nparser.parse(\" [one] , [two] \"); // OK\n```\n\npeglint - PEG syntax lint utility\n---------------------------------\n\n### Build peglint\n\n```\n\u003e cd lint\n\u003e mkdir build\n\u003e cd build\n\u003e cmake ..\n\u003e make\n\u003e ./peglint\nusage: grammar_file_path [source_file_path]\n\n  options:\n    --source: source text\n    --packrat: enable packrat memoise\n    --ast: show AST tree\n    --opt, --opt-all: optimize all AST nodes except nodes selected with `no_ast_opt` instruction\n    --opt-only: optimize only AST nodes selected with `no_ast_opt` instruction\n    --trace: show concise trace messages\n    --profile: show profile report\n    --verbose: verbose output for trace and profile\n```\n\n### Grammar check\n\n```\n\u003e cat a.peg\nAdditive    \u003c- Multiplicative '+' Additive / Multiplicative\nMultiplicative   \u003c- Primary '*' Multiplicative / Primary\nPrimary     \u003c- '(' Additive ')' / Number\n%whitespace \u003c- [ \\t\\r\\n]*\n\n\u003e peglint a.peg\n[commandline]:3:35: 'Number' is not defined.\n```\n\n### Source check\n\n```\n\u003e cat a.peg\nAdditive    \u003c- Multiplicative '+' Additive / Multiplicative\nMultiplicative   \u003c- Primary '*' 
Multiplicative / Primary\nPrimary     \u003c- '(' Additive ')' / Number\nNumber      \u003c- \u003c [0-9]+ \u003e\n%whitespace \u003c- [ \\t\\r\\n]*\n\n\u003e peglint --source \"1 + a * 3\" a.peg\n[commandline]:1:3: syntax error\n```\n\n### AST\n\n```\n\u003e cat a.txt\n1 + 2 * 3\n\n\u003e peglint --ast a.peg a.txt\n+ Additive\n  + Multiplicative\n    + Primary\n      - Number (1)\n  + Additive\n    + Multiplicative\n      + Primary\n        - Number (2)\n      + Multiplicative\n        + Primary\n          - Number (3)\n```\n\n### AST optimization\n\n```\n\u003e peglint --ast --opt --source \"1 + 2 * 3\" a.peg\n+ Additive\n  - Multiplicative[Number] (1)\n  + Additive[Multiplicative]\n    - Primary[Number] (2)\n    - Multiplicative[Number] (3)\n```\n\n### Adjust AST optimization with `no_ast_opt` instruction\n\n```\n\u003e cat a.peg\nAdditive    \u003c- Multiplicative '+' Additive / Multiplicative\nMultiplicative   \u003c- Primary '*' Multiplicative / Primary\nPrimary     \u003c- '(' Additive ')' / Number          { no_ast_opt }\nNumber      \u003c- \u003c [0-9]+ \u003e\n%whitespace \u003c- [ \\t\\r\\n]*\n\n\u003e peglint --ast --opt --source \"1 + 2 * 3\" a.peg\n+ Additive/0\n  + Multiplicative/1[Primary]\n    - Number (1)\n  + Additive/1[Multiplicative]\n    + Primary/1\n      - Number (2)\n    + Multiplicative/1[Primary]\n      - Number (3)\n\n\u003e peglint --ast --opt-only --source \"1 + 2 * 3\" a.peg\n+ Additive/0\n  + Multiplicative/1\n    - Primary/1[Number] (1)\n  + Additive/1\n    + Multiplicative/0\n      - Primary/1[Number] (2)\n      + Multiplicative/1\n        - Primary/1[Number] (3)\n```\n\nSample codes\n------------\n\n* [Calculator](https://github.com/yhirose/cpp-peglib/blob/master/example/calc.cc)\n* [Calculator (with parser operators)](https://github.com/yhirose/cpp-peglib/blob/master/example/calc2.cc)\n* [Calculator (AST version)](https://github.com/yhirose/cpp-peglib/blob/master/example/calc3.cc)\n* [Calculator (parsing expressions by 
precedence climbing)](https://github.com/yhirose/cpp-peglib/blob/master/example/calc4.cc)\n* [Calculator (AST version and parsing expressions by precedence climbing)](https://github.com/yhirose/cpp-peglib/blob/master/example/calc5.cc)\n* [A tiny PL/0 JIT compiler in less than 900 LOC with LLVM and PEG parser](https://github.com/yhirose/pl0-jit-compiler)\n* [A Programming Language just for writing Fizz Buzz program. :)](https://github.com/yhirose/fizzbuzzlang)\n\nLicense\n-------\n\nMIT license (© 2022 Yuji Hirose)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyhirose%2Fcpp-peglib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyhirose%2Fcpp-peglib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyhirose%2Fcpp-peglib/lists"}