{"id":17796381,"url":"https://github.com/d-plaindoux/celma","last_synced_at":"2025-03-17T02:31:19.659Z","repository":{"id":47675587,"uuid":"175075837","full_name":"d-plaindoux/celma","owner":"d-plaindoux","description":" Library for generalised parser combinators and a dedicated meta-language in Rust","archived":false,"fork":false,"pushed_at":"2025-03-11T04:39:39.000Z","size":11231,"stargazers_count":9,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-11T05:27:10.407Z","etag":null,"topics":["compiler","generic","meta-language","parser-combinators","pipeline","procedural-macro","rust-lang"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/d-plaindoux.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-11T20:07:13.000Z","updated_at":"2025-03-11T04:39:43.000Z","dependencies_parsed_at":"2023-11-07T00:04:27.903Z","dependency_job_id":"cffa8bab-8263-4d4a-b5d6-3de5090ba042","html_url":"https://github.com/d-plaindoux/celma","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d-plaindoux%2Fcelma","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d-plaindoux%2Fcelma/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d-plaindoux%2Fcelma/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d-plaindoux%2Fcelma/manif
ests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/d-plaindoux","download_url":"https://codeload.github.com/d-plaindoux/celma/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243837004,"owners_count":20355813,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compiler","generic","meta-language","parser-combinators","pipeline","procedural-macro","rust-lang"],"created_at":"2024-10-27T11:45:15.479Z","updated_at":"2025-03-17T02:31:19.653Z","avatar_url":"https://github.com/d-plaindoux.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Celma\n\n[![stable](http://badges.github.io/stability-badges/dist/stable.svg)](http://github.com/badges/stability-badges)\n\n[Celma (\"k\")noun \"channel\" (KEL) in Quenya](https://www.elfdict.com/w/kelma)\n\nCelma is a generalised parser combinator implementation. Generalised means not an implementation restricted to a stream\nof characters.\n\n## Overview\n\nGeneralization is the capability to design a parser based on pipelined parsers and separate parsers regarding their\nsemantic level.\n\n# Celma parser meta language\n\n## Grammar\n\nIn order to have a seamless parser definition two dedicated `proc_macro` are designed:\n\n```rust\nparsec_rules = \"pub\" ? \"let\" ident ('{' rust_type '}') ? (':' '{' rust_type '}') ? \"=\" parser) +\nparser       = binding? atom occurrence? additional? transform?\n```\n\n```rust\nbinding      = ident '='\noccurrence   = (\"*\" | \"+\" | \"?\")\nadditional   = \"|\" ? 
parser\ntransform    = \"-\u003e\" '{' rust_code '}'\natom         = alter? '(' parser? ')' | CHAR | STRING | ident\nalter        = (\"^\" | \"!\" | \"#\" | \"/\")\nident        = [a..zA..Z][a..zA..Z0..9_] * - {\"let\"}\n```\n\nThe `alter` is an annotation where:\n\n- `^` allows the capability to recognize negation,\n- `!` allows the capability to backtrack on failure and\n- `#` allows the capability to capture all chars.\n\nThe `#` alteration is important because it prevents massive list construction in memory.\n\n## Using the meta-language\n\nTherefore, a parser can be defined using this meta-language.\n\n```rust\nlet parser = parsec!( \n    ('{' v=^'}'* '}') -\u003e { v.into_iter().collect::\u003cString\u003e() }\n);\n```\n\n## A Full Example: JSON\n\nA [JSON parser](https://github.com/d-plaindoux/celma/blob/master/macro/benches/json.rs#L61) can be designed thanks to\nthe Celma parser meta language.\n\n### JSON abstract data type\n\n```rust\n#[derive(Clone)]\npub enum JSON {\n    Number(f64),\n    String(String),\n    Null,\n    Bool(bool),\n    Array(Vec\u003cJSON\u003e),\n    Object(Vec\u003c(String, JSON)\u003e),\n}\n```\n\n### Transformation functions\n\n```rust\nfn mk_vec\u003cE\u003e(a: Option\u003c(E, Vec\u003cE\u003e)\u003e) -\u003e Vec\u003cE\u003e {\n    if a.is_none() {\n        Vec::new()\n    } else {\n        let (a, v) = a.unwrap();\n        let mut r = v;\n        r.insert(0, a);\n        r\n    }\n}\n\nfn mk_string(a: Vec\u003cchar\u003e) -\u003e String {\n    a.into_iter().collect::\u003cString\u003e()\n}\n\nfn mk_f64(a: Vec\u003cchar\u003e) -\u003e f64 {\n    mk_string(a).parse().unwrap()\n}\n```\n\n### The JSON parser\n\nThe JSON parser is defined by six rules dedicated to `number`, `string`, `null`, `boolean`, `array`\nand `object`.\n\n#### JSON Rules\n\n```rust\nparsec_rules!(\n    let json:{JSON}          = S _=(string | null | boolean  | array | object | number) S\n    let number:{JSON}        = f=NUMBER                                
-\u003e {JSON::Number(f)}\n    let string:{JSON}        = s=STRING                                -\u003e {JSON::String(s)}\n    let null:{JSON}          = \"null\"                                  -\u003e {JSON::Null}\n    let boolean:{JSON}       = b=(\"true\"|\"false\")                      -\u003e {JSON::Bool(b==\"true\")}\n    let array:{JSON}         = ('[' S a=(_=json _=(',' _=json)*)? ']') -\u003e {JSON::Array(mk_vec(a))}\n    let object:{JSON}        = ('{' S a=(_=attr _=(',' _=attr)*)? '}') -\u003e {JSON::Object(mk_vec(a))}\n    let attr:{(String,JSON)} = (S s=STRING S \":\" j=json)\n);\n```\n\n#### Basic rules and terminals\n\n```rust\nparsec_rules!(\n    let STRING:{String} = delimited_string\n    let NUMBER:{f64}    = c=#(INT ('.' NAT)? (('E'|'e') INT)?)    -\u003e {mk_f64(c)}\n    let INT             = ('-'|'+')? NAT                          -\u003e {}\n    let NAT             = digit+                                  -\u003e {}\n    let S               = space*                                  -\u003e {}\n);\n```\n\n## The expression parser thanks to pipelined parsers.\n\nThe previous parser mixes char analysis and high-level term construction. This can be done in a different manner since\nCelma is a generalized parser combinator implementation.\n\nFor instance a first parser dedicated to lexeme recognition can be designed. Then on top of this lexer an expression\nparser can be easily designed.\n\n### Tokenizer\n\nA tokenizer consumes a stream of char and produces tokens.\n\n```rust\nparsec_rules!(\n    let token:{Token}   = S _=(int|keyword) S\n    let int:{Token}     = c=!(#(('-'|'+')? 
digit+)) -\u003e {Token::Int(mk_i64(c))}\n    let keyword:{Token} = s=('+'|'*'|'('|')')       -\u003e {Token::Keyword(s)}\n    let S               = space*                    -\u003e {}\n);\n```\n\n### Lexemes\n\nThe Lexeme parser recognizes simple token keywords.\n\n```rust\nparsec_rules!(\n    let PLUS{Token}   = {kwd('+')} -\u003e {}\n    let MULT{Token}   = {kwd('*')} -\u003e {}\n    let LPAREN{Token} = {kwd('(')} -\u003e {}\n    let RPAREN{Token} = {kwd(')')} -\u003e {}\n);\n```\n\n### Expression parser\n\nThe expression parser builds expressions by consuming tokens. For this purpose, the stream type can be specified for each\nparser. When it is omitted, the default stream type is `char`.\nIn the following example, the declaration `expr{Token}:{Expr}` denotes a parser consuming a `Token` stream and producing\nan `Expr`.\n\n```rust\nparsec_rules!(\n    let expr{Token}:{Expr}     = (s=sexpr e=(_=oper _=expr)?) -\u003e {mk_operation(s,e)}\n    let oper{Token}:{Operator} = (PLUS                        -\u003e {Operator::Plus})\n                               | (MULT                        -\u003e {Operator::Mult})\n    let sexpr{Token}:{Expr}    = (LPAREN _=expr RPAREN)\n                               | number\n    let number{Token}:{Expr}   = i=kint                       -\u003e {Expr::Number(i)}\n);\n```\n\n### Expression parser in action\n\n```rust\nlet tokenizer = token();\nlet stream = ParserStream::new( \u0026 tokenizer, CharStream::new(\"1 + 2\"));\nlet response = expr().and_left(eos()).parse(stream);\n\nmatch response {\nSuccess(v, _, _) =\u003e assert_eq!(v.eval(), 3),\n_ =\u003e assert_eq!(true, false),\n}\n```\n\n# Celma language internal design\n\nCelma is an embedded language in Rust for building simple parsers.\nThe language is processed when Rust is compiled. To this end, we\nidentify two steps. The first is to analyse the language using a\nsyntax analyser in a direct style. 
Then, this parser is invoked\nduring the compilation phase, using a procedural macro dedicated\nto Rust to manage the language in Rust.\n\n## V0\n\nIn V0, transpilation is a direct style generation of Parsec without any\noptimisations. To this end, the `AST` is translated directly into a parser \nusing the `core` library.\ncf. [celma parser in direct style](https://github.com/d-plaindoux/celma/blob/master/lang/v0/parser/src/parser.rs).\n\n### Benchmarks\n\n- Hardware: MacBook Pro, Apple M2 Max, 64 GB\n- [Samples](https://github.com/d-plaindoux/celma/blob/master/lang/v0/macro/benches/data/) used for the benchmarks.\n\n#### HTTP Header\n\n```shell\ntest http_data        ... bench:      11,845 ns/iter (+/- 382)       = 65 MB/s\n```\n\n#### JSON\n\n```shell\ntest json_apache      ... bench:   1,560,412 ns/iter (+/- 21,383)    = 87 MB/s\ntest json_canada_nom  ... bench:     127,925 ns/iter (+/- 15,263)    = 82 MB/s\ntest json_canada_pest ... bench:  57,397,799 ns/iter (+/- 3,442,455) = 43 MB/s\ntest json_data        ... bench:     126,348 ns/iter (+/- 5,283)     = 81 MB/s\n```\n\n## V1\n\nThis version targets aggressive and efficient parser compilation. For this\npurpose the compilation follows a traditional control and data flow inspired by \nthe following papers:\n- [A Typed, Algebraic Approach to Parsing](https://www.cl.cam.ac.uk/~jdy22/papers/a-typed-algebraic-approach-to-parsing.pdf) and\n- [Fusing Lexing and Parsing](https://www.cl.cam.ac.uk/~jdy22/papers/fusing-lexing-and-parsing.pdf).\n\n### Celma AST generation \n\nFirst, we express [Celma in Celma](https://github.com/d-plaindoux/celma/blob/master/lang/v1/parser/src/parser.rs).\nThis gives us an AST denoting parsers expressed using the Celma language, i.e. Celma(v1) thanks to Celma(v0).\n\n### Normalisation\n\nThe first step is to produce the **Deterministic Greibach Normal Form** \nof a given grammar. 
For this purpose we have a first AST for the grammar\nabstract denotation.\n\nNOTE: Work in progress\n\n### Fusion\n\nNOTE: Work in progress\n\n### Staging\n\nNOTE: Work in progress\n\n# License\n\nCopyright 2019-2025 Didier Plaindoux.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fd-plaindoux%2Fcelma","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fd-plaindoux%2Fcelma","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fd-plaindoux%2Fcelma/lists"}