## plex, a parser and lexer generator

This crate provides a couple of syntax extensions:

- `lexer!`, which creates a DFA-based lexer that uses maximal munch. It works
  a bit like the `lex` tool. You write regular expressions defining your
  tokens, together with Rust expressions that create your tokens from slices
  of input.
- `parser!`, which creates an LALR(1) parser. It works a bit like `yacc`. You
  write a context-free grammar, together with expressions for each rule. You
  give each nonterminal a Rust type, allowing you to build an AST recursively.
  It also supports spans, giving you convenient source location reporting.

You can find a demo in `examples/demo.rs`.

## Usage

First, import the `plex` macros.

```rust
use plex::{lexer, parser};
```

### Creating a lexer

To define a lexer, use the `lexer!` macro.
```rust
lexer! {
    fn take_token(tok: 'a) -> Token<'a>;
```

First declare the name of the function, the name of the token you will be able
to access within the lexer, and the return type of your lexer. You can also
optionally declare a lifetime for the strings you accept (here, `'a`).

Note that this will declare a function with the actual signature
`fn take_token<'a>(text: &mut &'a str) -> Option<Token<'a>>`. The lexer
modifies the `text` slice in place, removing the consumed text. This is
designed to make it easy to build an iterator of `Token`s out of a string
slice.

```rust
    r"[ \t\r\n]" => Token::Whitespace,
    "[0-9]+" => Token::IntegerLiteral(tok.parse().unwrap()),
    r#""[^"]*""# => Token::StringLiteral(&tok[1..tok.len()-1]),
```

The rest of your lexer should consist of rules. The left-hand side of each
rule should be a string literal (raw string literals are OK) corresponding to
a regular expression. You can use the typical regular expression syntax,
including parentheses for grouping, square brackets for character classes, and
the usual `.`, `|`, `*`, and `+`. (`?` is currently not supported.) You can
also use some extra operators, like `~` for negation and `&` for conjunction:

```rust
    r"/\*~(.*\*/.*)\*/" => Token::Comment(tok),
```

The above regular expression will match a C-style comment with `/* */`
delimiters, but won't allow `*/` to appear inside the comment. (`.*\*/.*`
matches any string containing `*/`; `~(.*\*/.*)` matches any string that does
not.) This is important because the lexer uses maximal munch: if you had
written simply `r"/\*.*\*/"`, the lexer would consume the longest matching
substring, interpreting `/* comment */ not comment? /* comment */` as one
large comment.

```rust
    "let" => Token::Let,
    "[a-zA-Z]+" => Token::Ident(tok),
    "." => panic!("unexpected character"),
}
```

Note that if multiple rules could apply, the one declared first wins. This
lets you declare keywords (which take precedence over identifiers) by putting
them first.

### Creating a parser

`plex` uses the LALR(1) construction for parsers. This section, and `plex` in
general, assumes you understand LR parsing and its associated vocabulary.

To define a parser, use the `parser!` macro.

```rust
parser! {
    fn parse(Token, Span);
```

This declares the name of the parser (in this case, `parse`) and the input
types it takes: here, `parse` will accept any iterator of pairs
`(Token, Span)`. The token type must be an `enum` whose variants are in scope
(a current limitation of `plex` that might be fixed later). Those variants are
the terminals of your grammar. `plex`-generated parsers also keep track of the
source locations ("spans") fed into them, so you need to mention your span
type here. If you don't want to track source locations, you can use the unit
type `()`.

Next, tell `plex` how to combine two spans:

```rust
    (a, b) {
        Span {
            lo: a.lo,
            hi: b.hi,
        }
    }
```

Here, `a` and `b` are `Span`s. In this case we've defined `Span` as a
structure with two fields, `lo` and `hi`, representing the byte offsets of the
beginning and end of the span. Note that the extra braces are necessary here:
the body of the function has to be a block.

Now you write your grammar. For each nonterminal, write its name, together
with its type.
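The span-combining body shown above is ordinary Rust and can be checked on its own. Here is a minimal sketch, assuming the two-field `Span` just described; the `merge` helper name is ours for illustration, not something `plex` defines:

```rust
// Hypothetical stand-in for the user-supplied `Span` type described
// above; `plex` does not define it for you.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    lo: usize, // byte offset of the start of the span
    hi: usize, // byte offset of the end of the span
}

// The same combination rule the `parser!` block declares:
// take the start of the first span and the end of the second.
fn merge(a: Span, b: Span) -> Span {
    Span { lo: a.lo, hi: b.hi }
}

fn main() {
    let ident = Span { lo: 0, hi: 3 };
    let semi = Span { lo: 10, hi: 11 };
    let whole = merge(ident, semi);
    assert_eq!(whole, Span { lo: 0, hi: 11 });
    println!("{:?}", whole);
}
```

Note that with this rule, spans only need to carry their endpoints; the combined span of any right-hand side falls out of merging the first and last matched spans.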
This indicates the kind of data that the nonterminal parses into.

```rust
    statements: Vec<Expr> {
```

Note that the first nonterminal is special: it's the start symbol of your
grammar, and its type is (more or less) the return type of the parser.

Then write the rules for this nonterminal. (The left-hand side of each rule is
implied to be `statements`.)

```rust
        statements[mut st] expr[e] Semi => {
            st.push(e);
            st
        }
```

Write the rule's right-hand side, an arrow `=>`, and the code to handle this
rule. The right-hand side is a sequence of nonterminals or terminals to match.
Here, `statements` and `expr` are nonterminals. Square brackets assign a
pattern to the result of a nonterminal, allowing us to use the data returned
by that nonterminal. Terminals must be enum variants brought into scope. The
expression must evaluate to the type of the left-hand side: in this case,
`Vec<Expr>`.

```rust
        => vec![],
    }
```

Empty rules are allowed: just don't write anything before the arrow.

If a terminal (i.e. a token) is a tuple-like enum variant, and so holds data,
you should destructure it using round brackets:

```rust
    expr: Expr {
        Ident(s) => Expr::Var(span!(), s)
    }
}
```

Inside a rule, the `span!()` macro evaluates to the span of the current
right-hand side. However, this only works if at least one token was matched:
if the rule matched an empty sequence, `span!()` will panic, so avoid using it
in nullable rules.

The return type of this parser is
`Result<Vec<Expr>, (Option<(Token, Span)>, &'static str)>`.
The error type is a pair consisting of the unexpected token (`None` at
end-of-input) and a message describing the tokens that were expected.
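To show how a caller might consume the result shape just described, here is a hedged sketch; `Token`, `Span`, `Expr`, and the `report` helper are hand-written stand-ins for illustration, not code generated by `plex`:

```rust
// Hand-written stand-ins for the user-defined types in the text above.
#[derive(Debug, Clone, Copy)]
struct Span { lo: usize, hi: usize }

#[derive(Debug)]
#[allow(dead_code)]
enum Token { Ident(&'static str), Semi }

#[derive(Debug)]
#[allow(dead_code)]
enum Expr { Var(Span, &'static str) }

// The parser's return type as described: `Ok` carries the start symbol's
// value; `Err` carries the unexpected token (`None` at end-of-input) plus
// a message naming the expected tokens.
type ParseResult = Result<Vec<Expr>, (Option<(Token, Span)>, &'static str)>;

// Turn a parse error into a human-readable diagnostic.
fn report(err: (Option<(Token, Span)>, &'static str)) -> String {
    match err {
        (Some((tok, span)), msg) => {
            format!("unexpected {:?} at bytes {}..{}; {}", tok, span.lo, span.hi, msg)
        }
        (None, msg) => format!("unexpected end of input; {}", msg),
    }
}

fn main() {
    let bad: ParseResult =
        Err((Some((Token::Semi, Span { lo: 4, hi: 5 })), "expected an expression"));
    if let Err(e) = bad {
        println!("{}", report(e));
    }
}
```

Matching on the two `Err` arms separately, as `report` does, is what lets you distinguish a stray token (which has a span to point at) from truncated input.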