{"id":19110823,"url":"https://github.com/h2co3/parsel","last_synced_at":"2025-04-09T21:16:31.269Z","repository":{"id":40641935,"uuid":"505090952","full_name":"H2CO3/parsel","owner":"H2CO3","description":"Generate parsers directly from AST node types","archived":false,"fork":false,"pushed_at":"2024-06-06T12:27:39.000Z","size":141,"stargazers_count":82,"open_issues_count":2,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-09T21:16:26.554Z","etag":null,"topics":["parser","parser-generation","parser-generator","procedural-macro","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/H2CO3.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-19T11:32:44.000Z","updated_at":"2025-03-02T10:28:07.000Z","dependencies_parsed_at":"2024-04-17T10:31:15.117Z","dependency_job_id":"008ce1d7-3015-44ad-b333-dbac7982f4ff","html_url":"https://github.com/H2CO3/parsel","commit_stats":{"total_commits":37,"total_committers":1,"mean_commits":37.0,"dds":0.0,"last_synced_commit":"55a2bfa80181b693560ce15b10dae44f495c5171"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/H2CO3%2Fparsel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/H2CO3%2Fparsel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/H2CO3%2Fparsel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/H2CO3%2Fpars
el/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/H2CO3","download_url":"https://codeload.github.com/H2CO3/parsel/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248111973,"owners_count":21049578,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["parser","parser-generation","parser-generator","procedural-macro","rust"],"created_at":"2024-11-09T04:26:11.220Z","updated_at":"2025-04-09T21:16:31.247Z","avatar_url":"https://github.com/H2CO3.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Parsel, the Zero-Code Parser Generator\n\n[![MSRV](https://img.shields.io/badge/MSRV-1.77.0-green)](https://github.com/H2CO3/Parsel)\n\nParsel is a library for generating parsers directly from syntax tree node types.\n\nThe main entry point is the [`#[derive(Parse)]`](derive.Parse.html) custom derive\nproc-macro, which generates an implementation of the `syn::parse::Parse` trait\nfor the annotated AST node type. Adding [`#[derive(FromStr)]`](derive.FromStr.html)\nalso implements the standard `FromStr` trait for the type, by simply forwarding to\nits `Parse` impl.\n\nIn addition, a [`#[derive(ToTokens)]`](derive.ToTokens.html) macro is provided,\nfor easily obtaining the source representation of a specific AST node via the\n`quote` crate. 
This in turn helps with getting its `Span` due to the blanket\n[`impl\u003cT: ToTokens\u003e Spanned for T`](https://docs.rs/syn/latest/syn/spanned/trait.Spanned.html#implementors).\nAdding [`#[derive(Display)]`](derive.Display.html) also implements the standard\n`Display` trait for the type, by simply forwarding to its `ToTokens` impl.\n\nFurthermore, the [`ast` module](ast/index.html) provides a number of helper types\nfor common needs, such as optional productions, repetition, parenthesization, and\ngrouping. These are mostly lightweight wrappers around parsing collections and\nparsing logic already provided by `syn`. However, some very useful `syn` types,\nsuch as `Option\u003cT: Parse\u003e` and `Punctuated`, have multiple, equally valid parses,\nso they don't implement `Parse` in order to avoid ambiguity. Parsel handles this\nambiguity at the type level, by splitting the set of valid parses into multiple,\nunambiguously parseable types.\n\n### Examples and How It Works\n\nThe fundamental idea behind Parsel is the observation that `struct`s and `enum`s\ndirectly correspond to sequences and alternation in grammars, and that they are\ncomposable: one does not need to know the exact implementation of sub-expressions\nin order to produce a parser for the current rule.\n\nAST nodes that have a `struct` type correspond to sequences: every field (whether\nnamed or numbered) will be parsed and populated one after another, in the order\nspecified in the source.\n\nAST nodes having an `enum` type correspond to alternation: their variants will be\ntried in order, and the first one that succeeds will be returned. Fields of tuple\nand struct variants are treated in the same sequential manner as `struct` fields.\n\nAccordingly, you define your grammar by specifying the fields and variants of AST\nnodes, and Parsel will generate a parser from them. 
Let's see what this looks like\nin the context of the parser and the printer for a simple, JSON-like language:\n\n```rust\nuse core::iter::FromIterator;\nuse core::convert::TryFrom;\nuse parsel::{Parse, ToTokens};\nuse parsel::ast::{Bracket, Brace, Punctuated, LitBool, LitInt, LitFloat, LitStr};\nuse parsel::ast::token::{Comma, Colon};\n\nmod kw {\n    parsel::custom_keyword!(null);\n}\n\n#[derive(PartialEq, Eq, Debug, Parse, ToTokens)]\nenum Value {\n    Null(kw::null),\n    Bool(LitBool),\n    Int(LitInt),\n    Float(LitFloat),\n    Str(LitStr),\n    Array(\n        #[parsel(recursive)]\n        Bracket\u003cPunctuated\u003cValue, Comma\u003e\u003e\n    ),\n    Object(\n        #[parsel(recursive)]\n        Brace\u003cPunctuated\u003cKeyValue, Comma\u003e\u003e\n    ),\n}\n\n#[derive(PartialEq, Eq, Debug, Parse, ToTokens)]\nstruct KeyValue {\n    key: LitStr,\n    colon: Colon,\n    value: Value,\n}\n\nlet actual: Value = parsel::parse_quote!({\n    \"key1\": \"string value\",\n    \"other key\": 318,\n    \"recursive\": [\n        1.6180,\n        2.7182,\n        3.1416,\n        null\n    ],\n    \"inner\": {\n        \"nested key\": true,\n        \"hard to write a parser\": false\n    }\n});\nlet expected = Value::Object(Brace::from(Punctuated::from_iter([\n    KeyValue {\n        key: LitStr::from(\"key1\"),\n        colon: Colon::default(),\n        value: Value::Str(LitStr::from(\"string value\")),\n    },\n    KeyValue {\n        key: LitStr::from(\"other key\"),\n        colon: Colon::default(),\n        value: Value::Int(LitInt::from(318)),\n    },\n    KeyValue {\n        key: LitStr::from(\"recursive\"),\n        colon: Colon::default(),\n        value: Value::Array(Bracket::from(Punctuated::from_iter([\n            Value::Float(LitFloat::try_from(1.6180).unwrap()),\n            Value::Float(LitFloat::try_from(2.7182).unwrap()),\n            Value::Float(LitFloat::try_from(3.1416).unwrap()),\n            Value::Null(kw::null::default()),\n        
]))),\n    },\n    KeyValue {\n        key: LitStr::from(\"inner\"),\n        colon: Colon::default(),\n        value: Value::Object(Brace::from(Punctuated::from_iter([\n            KeyValue {\n                key: LitStr::from(\"nested key\"),\n                colon: Colon::default(),\n                value: Value::Bool(LitBool::from(true)),\n            },\n            KeyValue {\n                key: LitStr::from(\"hard to write a parser\"),\n                colon: Colon::default(),\n                value: Value::Bool(LitBool::from(false)),\n            },\n        ]))),\n    },\n])));\n\nassert_eq!(actual, expected);\n```\n\n### Recursive AST Nodes and Cyclic Constraints\n\nMost useful real-world grammars are recursive, i.e., they contain productions that\nrefer to themselves directly (direct recursion) or indirectly (mutual recursion).\nThis results in AST node types that contain pointers to the same type. Even more\nimportantly, it leads to cyclic constraints in the implementations of `Parse` and\n`ToTokens`. These cyclic constraints are trivially satisfied and resolvable, but\nthe constraint solver of the Rust compiler is currently struggling with them due\nto [Issue #48214](https://github.com/rust-lang/rust/issues/48214).\n\nThus, one must break such constraint cycles when deriving the implementations of\n`Parse` and `ToTokens`. Parsel supports this use case by providing the attribute\n`#[parsel(recursive)]`, or an equivalent spelling, `#[parsel(recursive = true)]`.\nAdding this attribute to a field of a `struct` or a variant of an `enum` has the\neffect of omitting all `FieldType: Parse` and `FieldType: ToTokens` constraints\nfrom the `where` clause of the generated `Parse` and `ToTokens` impls, breaking\nthe constraint cycle, and thus allowing the code to compile.\n\nIt is sufficient to break each constraint cycle on one single type (practically\non the one that requires adding the smallest number of `#[parsel(recursive)]`\nannotations). 
However, if the grammar contains several self-referential cycles,\nit is necessary to break each of them. Furthermore, if breaking a cycle requires\nomitting a constraint on a type which appears in multiple fields of a `struct` or\na variant, then it is necessary to add `#[parsel(recursive)]` to **all** of those\nfields.\n\nAs an example, consider the following grammar for simple Boolean operations and\nthe accompanying comments:\n\n```rust\nuse parsel::{Parse, ToTokens};\nuse parsel::ast::{Paren, LitBool};\nuse parsel::ast::token::{Or, And, Not};\n\n#[derive(PartialEq, Eq, Debug, Parse, ToTokens)]\nenum Expr {\n    Or {\n        lhs: Conjunction,\n        op: Or,\n        #[parsel(recursive)] // break direct recursion\n        rhs: Box\u003cExpr\u003e,\n    },\n    Conjunction(Conjunction),\n}\n\n#[derive(PartialEq, Eq, Debug, Parse, ToTokens)]\nenum Conjunction {\n    And {\n        lhs: Term,\n        op: And,\n        #[parsel(recursive)] // break direct recursion\n        rhs: Box\u003cConjunction\u003e,\n    },\n    Term(Term),\n}\n\n#[derive(PartialEq, Eq, Debug, Parse, ToTokens)]\nenum Term {\n    Literal(LitBool),\n    Not(\n        Not,\n        #[parsel(recursive)] // break direct recursion\n        Box\u003cTerm\u003e,\n    ),\n    Group(\n        #[parsel(recursive)] // break mutual recursion\n        Paren\u003cBox\u003cExpr\u003e\u003e\n    ),\n}\n\nlet expr: Expr = parsel::parse_str(\"true \u0026 (false | true \u0026 true) \u0026 !false\").unwrap();\n\nassert_eq!(\n    expr,\n    Expr::Conjunction(Conjunction::And {\n        lhs: Term::Literal(LitBool::from(true)),\n        op: And::default(),\n        rhs: Box::new(Conjunction::And {\n            lhs: Term::Group(Paren::from(Box::new(Expr::Or {\n                lhs: Conjunction::Term(Term::Literal(LitBool::from(false))),\n                op: Or::default(),\n                rhs: Box::new(Expr::Conjunction(Conjunction::And {\n                    lhs: Term::Literal(LitBool::from(true)),\n              
      op: And::default(),\n                    rhs: Box::new(Conjunction::Term(Term::Literal(LitBool::from(true)))),\n                }))\n            }))),\n            op: And::default(),\n            rhs: Box::new(Conjunction::Term(Term::Not(\n                Not::default(),\n                Box::new(Term::Literal(LitBool::from(false))),\n            ))),\n        })\n    })\n);\n```\n\n### Dealing with Left Recursion\n\nIf you carefully examine the grammar, you can notice it's right-recursive, i.e.,\nthe subexpression with identical precedence appears on the right-hand side, while\nthe left-hand side descends one level to the next tightest-binding subexpression.\nThis in turn means that consecutive operations of equal precedence will associate\nto the right. The reason for this is that recursive descent parsers, such as the\nones generated by Parsel, fall into infinite recursion if they attempt parsing a\nleft-recursive grammar. For instance, if our top-level expression were defined as\n\n```text\nexpr = expr '|' conjunction\n     | conjunction\n```\n\nthen the code generated for `expr` would immediately and unconditionally try to\ncall itself again.\n\nWhile it is fine to rewrite the grammar as right-recursive in the case of simple\nBoolean expressions (since they are associative), it is generally not possible\nto just omit left recursion altogether from a grammar. Operations which are not\nassociative care a lot about how they are grouped, and even e.g. basic algebraic\noperations such as subtraction and division are defined to be left-associative\nby widespread convention. Thus, it is required that Parsel support associating\nterms to the left. There are two ways to achieve this goal:\n\n1. Side-step the problem by simply not representing associativity in the AST.\n   This is done by using a helper type capable of expressing explicit repetition\n   of arbitrary length (e.g., [`Separated`](ast/struct.Separated.html)), instead\n   of creating binary AST nodes. 
The repeated AST nodes will be sub-expressions\n   at the next highest precedence level. This approach puts off the question of\n   associativity until evaluation/codegen, that is, until tree-walking time.\n2. Use the [`LeftAssoc`](ast/enum.LeftAssoc.html) helper type. This solves the\n   problem of infinite recursion by parsing iteratively (just like `Separated`).\n   It then transforms the resulting linear list of subexpressions into a properly\n   left-associative (left-leaning) tree of AST nodes.\n\n   Note that there is an analogous [`RightAssoc`](ast/enum.RightAssoc.html) type\n   as well. Strictly speaking, this is not *necessary,* because right recursion\n   makes progress and terminates just fine. However, deriving the parse tree in\n   an iterative manner has the advantage of recursing less, and including the\n   right-leaning counterpart is preferable for reasons of symmetry/consistency.\n\n### Span and Error Reporting\n\nTypes that implement `ToTokens` get an automatic `impl Spanned for T: ToTokens`.\nThis means that by default, all types deriving `ToTokens` will also report their\nspan correctly, and parse errors will have useful span information.\n\nHowever, there is an important caveat regarding alternations (`enum`s) in the\ngrammar. The way alternations can be parsed in a fully automatic and deceptively\nsimple way is by attempting to parse each alternative production, one after the\nother, and picking the first one that parses successfully. However, if none of them\nparses, then it is not obvious to the parser which of the errors it should report.\n\nThe heuristic we use to solve this problem is that we use `Span` information to\nselect the variant that got furthest in the token stream before having failed.\nThis works because most \"nice\" context-free grammars are constant lookahead, or\neven better, LL(1), i.e. single-token lookahead. 
This means that if a production\ninvolving more than one token fails in the middle, it will have advanced further\nin the stream than other productions, which failed right at the very first token.\n\nHowever, if span information is not available or not useful (i.e., when every\nproduction is spanned to the same `Span::call_site()` source location), then this\nheuristic breaks down, and it will select an arbitrary production, resulting in\nsubpar error messages. This means that you should try to preserve spans as much\nas possible. This in turn implies that using `syn::parse_str()` for parsing code\noutside procedural macros is preferable to using `syn::parse2()`, because the\nformer will result in a usefully-spanned AST, while the latter will not, at least\nnot when used on e.g. a `TokenStream` obtained via `quote!()` or `parse_quote!()`.\n\n### Roadmap, TODOs\n\n* [ ] Document all of the public API\n* [ ] Document all of the non-public API as well\n* [ ] Allow specifying custom error messages for each production/AST node type\n    * [ ] Allow conditions/sorting criteria/other customization for discovering\n          the best production/error message to report when parsing alternation\n* [x] `enum Either` AST helper type for basic binary alternation\n* [x] `Any` AST helper type for parsing until a given production succeeds. Unlike\n      `Many`, it doesn't require the productions to extend until end-of-input.\n* [x] Implement `AsRef`, `Deref`, and `Borrow` consistently for wrapper types\n      (e.g., `Paren`, `Bracket`, `Brace`)\n* [ ] Make the error reporting heuristic for alternation (based on the furthest\n      parsing production) work even when span information is not useful. 
**Nota\n      bene:** this absolutely **shouldn't** be done by just counting the number\n      of tokens/bytes in the remaining input, because that will result in an\n      **accidentally quadratic parsing performance!**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh2co3%2Fparsel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fh2co3%2Fparsel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh2co3%2Fparsel/lists"}