{"id":30051976,"url":"https://github.com/jwtowner/lug","last_synced_at":"2025-08-07T16:34:33.973Z","repository":{"id":83873655,"uuid":"87805661","full_name":"jwtowner/lug","owner":"jwtowner","description":"C++ embedded domain specific language for extended parsing expression grammars (PEGs)","archived":false,"fork":false,"pushed_at":"2025-04-16T10:42:34.000Z","size":9865,"stargazers_count":80,"open_issues_count":10,"forks_count":6,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-06-04T17:11:37.335Z","etag":null,"topics":["c-plus-plus","c-plus-plus-17","context-sensitive-grammars","cpp","cpp17","dsl","grammar","parser-combinators","parser-generator","parsing","parsing-expression-grammars","parsing-machine","peg"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jwtowner.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-10T12:00:13.000Z","updated_at":"2025-05-11T00:32:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"1fa824aa-82e2-4f5a-86e0-38e6c12f2038","html_url":"https://github.com/jwtowner/lug","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/jwtowner/lug","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwtowner%2Flug","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwtowner%2Flug/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwtowner%2Flug/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwtowner%2Flug/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jwtowner","download_url":"https://codeload.github.com/jwtowner/lug/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwtowner%2Flug/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269291052,"owners_count":24392376,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-07T02:00:09.698Z","response_time":73,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","c-plus-plus-17","context-sensitive-grammars","cpp","cpp17","dsl","grammar","parser-combinators","parser-generator","parsing","parsing-expression-grammars","parsing-machine","peg"],"created_at":"2025-08-07T16:33:00.023Z","updated_at":"2025-08-07T16:34:33.958Z","avatar_url":"https://github.com/jwtowner.png","language":"C++","funding_links":[],"categories":["Parsing Expression Grammars"],"sub_categories":[],"readme":"lug\n[![Build 
Status](https://github.com/jwtowner/lug/actions/workflows/c-cpp.yml/badge.svg)](https://github.com/jwtowner/lug/actions/workflows/c-cpp.yml)\n[![CodeQL](https://github.com/jwtowner/lug/actions/workflows/dynamic/github-code-scanning/codeql/badge.svg)](https://github.com/jwtowner/lug/actions/workflows/dynamic/github-code-scanning/codeql)\n[![Analyze](https://github.com/jwtowner/lug/actions/workflows/analyze.yml/badge.svg)](https://github.com/jwtowner/lug/actions/workflows/analyze.yml)\n[![Sanitize](https://github.com/jwtowner/lug/actions/workflows/sanitize.yml/badge.svg)](https://github.com/jwtowner/lug/actions/workflows/sanitize.yml)\n[![Tidy](https://github.com/jwtowner/lug/actions/workflows/tidy.yml/badge.svg)](https://github.com/jwtowner/lug/actions/workflows/tidy.yml)\n[![License](https://img.shields.io/packagist/l/doctrine/orm.svg)](https://github.com/jwtowner/lug/blob/master/LICENSE.md)\n===\nA C++ embedded domain specific language for expressing parsers as extended [parsing expression grammars (PEGs)](https://en.wikipedia.org/wiki/Parsing_expression_grammar)\n\n![lug](https://github.com/jwtowner/lug/raw/master/doc/lug_logo_large.png)\n\nFeatures\n---\n- Natural syntax resembling external parser generator languages, with support for attributes and semantic actions.\n- Ability to handle context-sensitive grammars with symbol tables, conditions and syntactic predicates.\n- Generated parsers are compiled to special-purpose bytecode and executed in a virtual parsing machine.\n- Clear separation of syntactic and lexical rules, with the ability to customize implicit whitespace skipping.\n- Support for direct and indirect left recursion, with precedence levels to disambiguate subexpressions with mixed left/right recursion.\n- Full support for UTF-8 text parsing, including Level 1 and partial Level 2 compliance with the UTS #18 Unicode Regular Expressions technical standard.\n- Error handling and recovery with labeled failures, recovery rules and error handlers.\n- 
Automatic tracking of line and column numbers, with customizable tab width and alignment.\n- Header-only library utilizing C++17 language and library features. Forward compatible with C++20 and C++23.\n- Relatively small with the goal of keeping total line count across all header files under 6000 lines of terse code.\n\nIt is based on research introduced in the following papers:\n\n\u003e Bryan Ford, [Parsing expression grammars: a recognition-based syntactic foundation](https://doi.org/10.1145/982962.964011), Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, p.111-122, January 2004\n\n\u003e Sérgio Medeiros et. al, [A parsing machine for PEGs](https://doi.org/10.1145/1408681.1408683), Proceedings of the 2008 symposium on Dynamic Languages, p.1-12, July 2008\n\n\u003e Kota Mizushima et. al, [Packrat parsers can handle practical grammars in mostly constant space](https://doi.org/10.1145/1806672.1806679), Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, p.29-36, June 2010\n\n\u003e Sérgio Medeiros et. al, [Left recursion in Parsing Expression Grammars](https://doi.org/10.1016/j.scico.2014.01.013), Science of Computer Programming, v.96 n.P2, p.177-190, December 2014\n\n\u003e Leonardo Reis et. al, [The formalization and implementation of Adaptable Parsing Expression Grammars](https://doi.org/10.1016/j.scico.2014.02.020), Science of Computer Programming, v.96 n.P2, p.191-210, December 2014\n\n\u003e Tetsuro Matsumura, Kimio Kuramitsu, [A Declarative Extension of Parsing Expression Grammars for Recognizing Most Programming Languages](https://doi.org/10.2197/ipsjjip.24.256), Journal of Information Processing, v.24 i.2, p.256-264, November 2015\n\n\u003e Sérgio Medeiros et. 
al, [A parsing machine for parsing expression grammars with labeled failures](https://doi.org/10.1145/2851613.2851750), Proceedings of the 31st Annual ACM symposium on Applied Computing, p.1960-1967, April 2016\n\nBuilding\n---\nAs a self-contained header-only library, lug itself does not require any build process.\nTo use lug, make sure to include the `lug` header directory in your project's include path.\nOnce that is done, you are ready to start using lug in your code.\nTo build the sample programs and unit tests both [CMake](https://cmake.org/) and [make](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html) are supported.\n\nAs a baseline, the following compiler versions are known to work with lug.\n\n| Compiler | Minimum Language Mode |\n| --- | --- |\n| Clang 14.0.0 (March 2022) or later | -std=c++17 or -std=gnu++17 |\n| GCC 9.5 (May 2022) or later | -std=c++17 or -std=gnu++17 |\n| Microsoft Visual C++ 2019 16.11 (August 2021) or later | Platform Toolset: Visual Studio 2019 Toolset (v142), Language Standard: ISO C++17 Standard (/std:c++17) |\n\nDemonstration\n---\nThe following example demonstrates an arithmetic expression evaluator supporting addition and multiplication.\n\n`demo.cpp`\n```cpp\n// Include the lug library header file\n#include \u003clug/lug.hpp\u003e\n\n// Needed for std::cout\n#include \u003ciostream\u003e\n\nint main()\n{\n    // Import the namespace containing the embedded DSL operators and types\n    using namespace lug::language;\n\n    // Define attribute variables for the recursive rules\n    int lhs = 0;\n    int rhs = 0;\n\n    // Define a lexical rule that matches one or more digits and converts them to an integer\n    auto Number = lexeme[+digit] \u003c[](syntax s){ return std::stoi(std::string{s.str()}); };\n\n    // Forward declaration for recursive rules\n    rule Expr;\n\n    // Define a rule that matches a number or a parenthesized expression\n    rule Factor = Number | ('(' \u003e Expr \u003e ')');\n\n    // 
Define a rule that multiplies the factors\n    rule Term = lhs%Factor \u003e *('*' \u003e rhs%Factor \u003c[\u0026]{ lhs *= rhs; }) \u003c[\u0026]{ return lhs; };\n\n    // Define a rule that adds the terms\n    Expr = lhs%Term \u003e *('+' \u003e rhs%Term \u003c[\u0026]{ lhs += rhs; }) \u003c[\u0026]{ return lhs; };\n\n    // Create grammar that matches an arithmetic expression followed by end-of-input\n    auto grammar = start(Expr \u003e eoi);\n\n    // Sample input string to parse\n    std::string input = \"2 * (3 + 4)\";\n\n    // Parse and evaluate the sample input\n    lug::environment env;\n    if (!lug::parse(input, grammar, env)) {\n        std::cout \u003c\u003c \"parse failed\\n\";\n        return 1;\n    }\n\n    // Pop the result from the environment and display it to the console\n    int result = env.pop_attribute\u003cint\u003e();\n    std::cout \u003c\u003c input \u003c\u003c \" = \" \u003c\u003c result \u003c\u003c \"\\n\"; // Outputs: 2 * (3 + 4) = 14\n    return 0;\n}\n```\n\nTo compile the demonstration with GCC, save the code above to a file named `demo.cpp` and use the following command,\nmaking sure to substitute `\u003cpath-to-lug\u003e` with the location of `lug` on your filesystem:\n\n`g++ -std=c++17 -I\u003cpath-to-lug\u003e -o demo demo.cpp`\n\nThen run the demonstration executable with the following command:\n\n`./demo`\n\nYou should see the output:\n\n`2 * (3 + 4) = 14`\n\nIn summary, the above example demonstrates:\n- Lexical rules with semantic actions to convert matched text into values.\n- Recursive grammar rules for handling nested expressions.\n- Operator precedence through hierarchical rule structure (multiplication before addition).\n- Attribute capture and propagation for expression evaluation.\n- Environment management for storing and retrieving parsed results.\n\nQuick Reference\n---\n\n| Operator | Syntax | Description |\n| --- | --- | --- |\n| Ordered Choice | `e1 \\| e2` | Attempts to first match expression *e1*, and if 
that fails backtracks then attempts to match *e2*. |\n| Sequence | `e1 \u003e e2` | Matches both expressions *e1* followed by *e2* in sequence. |\n| List | `e1 \u003e\u003e e2` | Repetition matching of a sequence of one or more *e1* expressions delimited by *e2*. Shorthand for `e1 \u003e *(e2 \u003e e1)`. |\n| Zero-or-More | `*e` | Repetition matching of expression *e* zero, one or more times. |\n| One-or-More | `+e` | Repetition matching of expression *e* one or more times. |\n| Optional | `~e` | Matches expression *e* zero or one times. |\n| Positive Lookahead | `\u0026e` | Matches without consuming input if expression *e* succeeds to match the input. |\n| Negative Lookahead | `!e` | Matches without consuming input if expression *e* fails to match the input. |\n| Cut Before | `--e` | Issues a cut instruction before the expression *e*. |\n| Cut After | `e--` | Issues a cut instruction after the expression *e*. |\n| Action Scheduling | `e \u003c a` | Schedules a semantic action *a* to be evaluated if expression *e* successfully matches the input. |\n| Attribute Binding | `v % e` | Assigns the return value of the last evaluated semantic action within the expression *e* to the variable *v*. |\n| Error Handler | `e ^= [⁠]⁠(⁠error_context\u0026⁠)⁠{⁠}` | Associates the error handler callable with expression *e*. |\n| Error Response | `e ^ error_response` | Returns the specified `error_response` enumeration value for a recovery rule expression *e*. |\n| Recover With | `e[recover_with(r)]` | Installs rule *r* as the default for error recovery for failures in expression *e*. |\n| Expects | `e[failure(f)]` | Expects that expression *e* will successfully match, otherwise raises the labeled failure *f*. |\n| Expects | `e[failure(f,r)]` | Expects that expression *e* will successfully match, otherwise raises the labeled failure *f* and recovers with rule *r*. 
|\n\n| Control Directive | Description |\n| --- | --- |\n| `capture(v)⁠[e]` | Syntactic capture of the text matching the subexpression *e* into variable *v*. |\n| `cased⁠[e]` | Case sensitive matching for the subexpression *e* (the default). |\n| `caseless⁠[e]` | Case insensitive matching for subexpression *e*. |\n| `skip⁠[e]` | Turns on all whitespace skipping for subexpression *e* (the default). |\n| `noskip⁠[e]` | Turns off all whitespace skipping for subexpression *e*, including preceding whitespace. |\n| `lexeme⁠[e]` | Treats subexpression *e* as a lexical token with no internal whitespace skipping. |\n| `repeat(N)⁠[e]` | Matches exactly *N* occurrences of expression *e*. |\n| `repeat(N,M)⁠[e]` | Matches at least *N* and at most *M* occurrences of expression *e*. |\n| `on(C)⁠[e]` | Sets the condition *C* to true for the scope of subexpression *e*. |\n| `off(C)⁠[e]` | Sets the condition *C* to false for the scope of subexpression *e* (the default). |\n| `symbol(S)⁠[e]` | Pushes a symbol definition for symbol *S* with value equal to the captured input matching subexpression *e*. |\n| `block⁠[e]` | Creates a scope block for subexpression *e* where all new symbols defined in *e* are local to it and all external symbols defined outside of the block are also available for reference within *e*. |\n| `local⁠[e]` | Creates a local scope block for subexpression *e* where all new symbols defined in *e* are local to it and there are no external symbol definitions available for reference. |\n| `local(S)⁠[e]` | Creates a local scope block for subexpression *e* where all new symbols defined in *e* are local to it and all external symbols defined outside of the block are also available for reference within *e*, except for the symbol named *S*. |\n| `collect\u003cC\u003e⁠[e]` | Synthesizes a collection attribute of container type *C* from the attributes inherited from or synthesized within expression *e*. 
|\n| `collect\u003cC,A...\u003e⁠[e]` | Synthesizes a collection attribute of container type *C* consisting of elements, each of which are constructed from sequences of attributes inherited from or synthesized within expression *e* and that match the types of parameter pack *A...*. |\n| `synthesize\u003cT,A...\u003e⁠[e]` | Synthesizes an object of type *T* constructed from a sequence of attributes inherited from or synthesized within expression *e* and that match the types of parameter pack *A...*. |\n| `synthesize_shared\u003cT\u003e⁠[e]` | Synthesizes an object of type `std::shared_ptr\u003cT\u003e` by calling `std::make_shared` passing in an attribute of type *T* inherited from or synthesized within expression *e*. |\n| `synthesize_shared\u003cT,A...\u003e⁠[e]` | Synthesizes an object of type `std::shared_ptr\u003cT\u003e` by calling `std::make_shared` passing in a sequence of attributes inherited from or synthesized within expression *e* and that match the types of parameter pack *A...*. |\n| `synthesize_unique\u003cT\u003e⁠[e]` | Synthesizes an object of type `std::unique_ptr\u003cT\u003e` by calling `std::make_unique` passing in an attribute of type *T* inherited from or synthesized within expression *e*. |\n| `synthesize_unique\u003cT,A...\u003e⁠[e]` | Synthesizes an object of type `std::unique_ptr\u003cT\u003e` by calling `std::make_unique` passing in a sequence of attributes inherited from or synthesized within expression *e* and that match the types of parameter pack *A...*. |\n\n| Factory | Description |\n| --- | --- |\n| `sync(p)` | Makes a recovery rule expression that synchronizes the token string until it finds pattern *p* and returns `error_response::resume`. |\n| `sync\u003cr\u003e(p)` | Makes a recovery rule expression that synchronizes the token string until it finds pattern *p* and returns `error_response` enumerator value *r*. 
|\n| `sync_with_value(p,v)` | Makes a recovery rule expression that synchronizes the token string until it finds pattern *p*, emits the value *v* into the attribute stack and returns `error_response::resume`. |\n| `sync_with_value\u003cr\u003e(p,v)` | Makes a recovery rule expression that synchronizes the token string until it finds pattern *p*, emits the value *v* into the attribute stack and returns `error_response` enumerator value *r*. |\n| `with_value(v)` | Makes a recovery rule expression that emits the value *v* into the attribute stack and returns `error_response::resume`. |\n| `with_value\u003cr\u003e(v)` | Makes a recovery rule expression that emits the value *v* into the attribute stack and returns `error_response` enumerator value *r*. |\n| `with_response\u003cr\u003e()` | Makes a recovery rule expression that returns `error_response` enumerator value *r*. |\n\n| Terminal | Description |\n| --- | --- |\n| `nop` | No operation, does not emit any instructions. |\n| `eps` | Matches the empty string. |\n| `eoi` | Matches the end of the input sequence. |\n| `eol` | Matches a Unicode line-ending. |\n| `cut` | Emits a cut operation, accepting semantic actions up to current match prefix unless there were syntax errors, and draining the input source. |\n| `accept` | Accepts all semantic actions up to current match prefix, even after recovering from syntax errors. Does not drain the input source. |\n| `raise⁠(f)` | Raises the labeled failure *f* to be handled by the top level error handler and recovery rule. |\n| `raise⁠(f,r)` | Raises the labeled failure *f* with recovery rule *r* to be handled by the top level error handler. |\n| `chr(c)` | Matches the UTF-8, UTF-16, or UTF-32 character *c*. |\n| `chr(c1, c2)` | Matches characters in the UTF-8, UTF-16, or UTF-32 interval \\[*c1*-*c2*\\]. |\n| `str(s)` | Matches the sequence of characters in the string *s*. |\n| `bre(s)` | POSIX Basic Regular Expression (BRE). |\n| `any` | Matches any single character. 
|\n| `any(flags)` | Matches a character exhibiting any of the character properties. |\n| `all(flags)` | Matches a character with all of the character properties. |\n| `none(flags)` | Matches a character with none of the character properties. |\n| `alpha` | Matches any alphabetical character. |\n| `alnum` | Matches any alphabetical character or numerical digit. |\n| `blank` | Matches any space or tab character. |\n| `cntrl` | Matches any control character. |\n| `digit` | Matches any decimal digit. |\n| `graph` | Matches any graphical character. |\n| `lower` | Matches any lowercase alphabetical character. |\n| `print` | Matches any printable character. |\n| `punct` | Matches any punctuation character. |\n| `space` | Matches any whitespace character. |\n| `upper` | Matches any uppercase alphabetical character. |\n| `xdigit` | Matches any hexadecimal digit. |\n| `when⁠(C)` | Matches if the condition named *C* is *true*, without consuming input. |\n| `unless⁠(C)` | Matches if the condition named *C* is *false*, without consuming input. |\n| `exists⁠(S)` | Matches if there is a definition for symbol *S* in the current scope. |\n| `missing⁠(S)` | Matches if there is no definition for symbol *S* in the current scope. |\n| `match⁠(S)` | Matches the last definition for symbol named *S*. |\n| `match_any⁠(S)` | Matches against any prior definition for symbol named *S*. |\n| `match_all⁠(S)` | Matches against all prior definitions for symbol named *S*, in sequence from least to most recent. |\n| `match_front⁠(S,N=0)` | Matches against the *N*-th least recent definition for symbol named *S*. |\n| `match_back⁠(S,N=0)` | Matches against the *N*-th most recent definition for symbol named *S*. 
|\n\n| Literal | Name | Description |\n| --- | --- | --- |\n| `_cx` | Character Expression | Matches the UTF-8, UTF-16, or UTF-32 character literal |\n| `_sx` | String Expression | Matches the sequence of characters in a string literal |\n| `_rx` | Regular Expression | POSIX Basic Regular Expression (BRE) |\n| `_icx` | Case Insensitive Character Expression | Same as `_cx` but case insensitive |\n| `_isx` | Case Insensitive String Expression | Same as `_sx` but case insensitive |\n| `_irx` | Case Insensitive Regular Expression | Same as `_rx` but case insensitive |\n| `_scx` | Case Sensitive Character Expression | Same as `_cx` but case sensitive |\n| `_ssx` | Case Sensitive String Expression | Same as `_sx` but case sensitive |\n| `_srx` | Case Sensitive Regular Expression | Same as `_rx` but case sensitive |","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjwtowner%2Flug","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjwtowner%2Flug","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjwtowner%2Flug/lists"}