{"id":18014811,"url":"https://github.com/zevv/xpeg","last_synced_at":"2025-03-26T18:30:53.575Z","repository":{"id":57557705,"uuid":"439465326","full_name":"zevv/xpeg","owner":"zevv","description":"Experimental PEG library for Elixir","archived":false,"fork":false,"pushed_at":"2024-05-23T13:33:16.000Z","size":158,"stargazers_count":30,"open_issues_count":3,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-11T14:49:23.287Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zevv.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-12-17T21:32:32.000Z","updated_at":"2024-07-31T07:51:26.000Z","dependencies_parsed_at":"2023-11-11T08:45:12.839Z","dependency_job_id":null,"html_url":"https://github.com/zevv/xpeg","commit_stats":{"total_commits":133,"total_committers":1,"mean_commits":133.0,"dds":0.0,"last_synced_commit":"5a83293d51f46a616aa1df694b9b020a5e43026e"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zevv%2Fxpeg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zevv%2Fxpeg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zevv%2Fxpeg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zevv%2Fxpeg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zevv","download_url":"https://codeload.github.com/zevv/xpeg/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://git
hub.com","kind":"github","repositories_count":222162101,"owners_count":16941512,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-30T04:11:11.031Z","updated_at":"2024-10-30T04:11:11.700Z","avatar_url":"https://github.com/zevv.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n![Stability: experimental](https://img.shields.io/badge/stability-beta-yellow.svg)\n\n![Xpeg](xpeg.png)\n\n\nMore documentation will be added; for now please refer to the documentation of\n[NPeg](https://github.com/zevv/npeg), a Nim implementation of a similar PEG\nparser.\n\n\n## Introduction\n\nXpeg is a pure Elixir pattern matching library. It provides macros to compile\npatterns and grammars (PEGs) to Elixir functions which parse a string and\ncollect selected parts of the input. PEGs are not unlike regular expressions,\nbut offer more power and flexibility, and have fewer ambiguities. 
(More about\nPEGs on [Wikipedia](https://en.wikipedia.org/wiki/Parsing_expression_grammar))\n\n```elixir\n                                      ╭───────────»──────────╮\nObject o──'{'─»─fn()─»─┬─[Obj_pair]─»─┴─┬─\",\"─»─[Obj_pair]─┬─┴─┬─»─\"}\"──o\n                       │                ╰─────────«────────╯   │\n                       ╰─[S]───────────────────────────────────╯\n```\n\nSome use cases where Xpeg is useful are configuration or data file parsers,\nrobust protocol implementations, input validation, lexing of programming\nlanguages or domain-specific languages.\n\nSome Xpeg highlights:\n\n- Grammar definitions and Elixir code acting on or transforming the parsed\n  fragments can be freely mixed.\n\n- Xpeg-generated parsers can be used both at run time and at compile time.\n\n- Xpeg offers various methods for tracing, optimizing and debugging your\n  parsers.\n\n- Xpeg can draw cool diagrams.\n\n\n## Installation\n\n```elixir\ndef deps do\n  [\n    {:xpeg, \"~\u003e 0.9.1\"}\n  ]\nend\n```\n\n## Quickstart\n\nHere is a simple example showing the power of Xpeg: The macro `peg` compiles a\ngrammar definition into a `parser` function, which is used to match a string and\nplace the key-value pairs into a list of tuples:\n\n```elixir\np = Xpeg.peg Dict do\n  Dict \u003c- Pair * star(\",\" * Pair) * !1\n  Pair \u003c- Word * \"=\" * Number * fn [a,b|cs] -\u003e [{b,a}|cs] end\n  Word \u003c- str(+{'a'..'z'})\n  Number \u003c- int(+{'0'..'9'})\nend\n\nXpeg.match(p, \"grass=4,horse=1,star=2\")\n```\n\nOutput:\n\n```elixir\n[{\"star\", 2}, {\"horse\", 1}, {\"grass\", 4}]\n```\n\n## Usage\n\nThe basic operation is driven by the provided _grammar_, which consists of a set\nof named _rules_. A name is an Elixir atom, in the form `:name` or `Name`,\nwhichever you prefer.  A rule is made up of a number of _atoms_ (not to be\nconfused with Elixir's atoms; I should probably find another name for this) and\n_operators_, which are executed to match the input string.  
Rules can also call\ninto other rules, allowing for recursive grammars.\n\nFor example, the grammar below matches a comma-separated list of words:\n\n```elixir\np = peg List do\n  List \u003c- Word * star( \",\" * Word )\n  Word \u003c- +{'a'..'z'}\nend\n```\n\n- The `List` rule matches one `Word`, followed by zero or more (`star(P)`)\n  times a `,` followed by a `Word`\n- The `Word` rule matches one-or-more (`+P`) times the set of characters (`{}`)\n  consisting of all letters from `'a'` to `'z'`\n\n\nDuring the execution of the grammar, matching parts of the subject string can\nbe _captured_ with the `str()` operator. All captures are stored on the\n`captures` list inside the parser state. This list is returned by the `match()`\nfunction, but can also be used by in-grammar functions to perform conversions\nor transformations.\n\nBelow is the same grammar as above, but in this case it captures all\nthe individual `Word`s:\n\n```elixir\np = peg List do\n  List \u003c- Word * star( \",\" * Word )\n  Word \u003c- str(+{'a'..'z'})\nend\n\nmatch(p, \"one,two,three\")\n```\n\nThe above will return the following list of captures:\n```elixir\n[\"three\", \"two\", \"one\"]\n```\n\nA powerful feature allows mixing of Elixir functions with the grammar, which\ncan be used to perform transformations of the captures or build abstract syntax\ntrees (ASTs) on the fly.\n\nFor example, the grammar above is changed to match numbers instead of words,\nand a conversion function is called after every matching number that\nconverts the last captured value on the `captures` list to an integer:\n\n```elixir\np = peg List do\n  List \u003c- Word * star( \",\" * Word )\n  Word \u003c- str(+{'0'..'9'}) *\n    fn [v|cs] -\u003e\n      [String.to_integer(v)|cs]\n    end\nend\n\nmatch(p, \"123,42,31415\")\n```\n\nwhich results in the following captures:\n\n```elixir\n[31415, 42, 123]\n```\n\n\nMore elaborate examples can be found in [examples_test.exs](/test/examples_test.exs),\nincluding a 
parser for arithmetic expressions and a full JSON parser.\n\n\n## Grammars\n\nThe `peg` macro provides a method to define (recursive) grammars. The first\nargument is the name of the initial pattern, followed by a list of named patterns.\nPatterns can refer to other patterns by name, allowing for recursion.\n\nThe order in which the grammar patterns are defined affects the generated\nparser. Although Xpeg could always reorder, this is a design choice to give the\nuser more control over the generated parser:\n\n- When a pattern P1 refers to pattern P2 which is defined before P1, P2 will\n  be inlined in P1. This increases the generated code size, but generally\n  improves performance.\n\n- When a pattern P1 refers to pattern P2 which is defined after P1, P2 will be\n  generated as a subroutine which gets called from P1. This will reduce code\n  size, but might also result in a slower parser.\n\n\n## Syntax\n\nThe Xpeg syntax is similar to normal PEG notation, but some changes were made\nto allow the grammar to be properly parsed by the Elixir compiler:\n\n- Xpeg uses prefix operators instead of suffix operators for `+`, `-`\n- Elixir does not support the `*` and `?` prefix operators, so instead\n  `star(P)` and `opt(P)` are used\n- The explicit `*` infix operator is used for concatenation\n\nXpeg patterns and grammars can be composed of the following parts:\n\n```\nAtoms:\n\n      0              # matches always and consumes nothing\n      1              # matches any character\n      n              # matches exactly n characters\n     'x'             # matches literal character 'x'\n     \"xyz\"           # matches literal string \"xyz\"\n     {'x'..'y'}      # matches any character in the range from 'x'..'y'\n     {'x','y','z'}   # matches any character from the set\n\nOperators:\n\n      P1 * P2        # concatenation\n      P1 | P2        # ordered choice\n      P1 - P2        # matches P1 if P2 does not match\n     (P)             # grouping\n     !P           
   # matches everything but P\n     \u0026P              # matches P without consuming input\n  opt(P)             # matches P zero or one times\n star(P)             # matches P zero or more times\n     +P              # matches P one or more times\n      P[n]           # matches P n times\n      P[m..n]        # matches P m to n times\n     @P              # searches for P\n\nCaptures:\n\n  str(P)             # Adds the matched string to the capture list\n  int(P)             # Adds the matched integer to the capture list\nfloat(P)             # Adds the matched float to the capture list\n\nElixir function:\n\n    fn(captures)    # Elixir function for transformations\n\n```\n\n## Performance\n\nGenerated parsers will typically not reach the speed of a hand-crafted and\nfine-tuned parser for a specific grammar.  Having said that, Xpeg parsers can\nstill be pretty fast; for example, the JSON parser from the examples runs at\napproximately 2/3 of the speed of the Poison JSON parser, which is said to be\n\"wicked-fast\".\n\n\n## Tracing and debugging\n\n\n### Syntax diagrams\n\nWhen passing the option `:dump_graph` to `Xpeg.peg()`, Xpeg will dump syntax\ndiagrams (also known as railroad diagrams) for all parsed rules.\n\nSyntax diagrams are sometimes helpful to understand or debug a grammar, or to\nget more insight into a grammar's complexity.\n\n```elixir\n                                      ╭───────────»──────────╮\nObject o──'{'─»─fn()─»─┬─[Obj_pair]─»─┴─┬─\",\"─»─[Obj_pair]─┬─┴─┬─»─\"}\"──o\n                       │                ╰─────────«────────╯   │\n                       ╰─[S]───────────────────────────────────╯\n```\n\n- Optionals (?) are indicated by a forward arrow overhead.\n- Repeats ('+') are indicated by a backwards arrow underneath.\n- Non-terminals are printed in square brackets.\n\n\n### Tracing\n\n\nWhen the flag `:dump_ir` is passed to `Xpeg.peg`, Xpeg prints the intermediate representation (IR) of the\nparsed grammar at compile time. 
The option `:trace` will print the IR instructions and the matched subject\nstring during parsing - this will dramatically slow down the parsing, however.\n\nFor example, the following program:\n\n```elixir\nXpeg.peg Line, trace: true, dump_ir: true do\n Space \u003c- ' '\n Line \u003c- Word * star(Space * Word)\n Word \u003c- +{'a'..'z'}\nend\n```\n\nwill output the following intermediate representation at compile time. From the\nIR it can be seen that the space rule has been inlined in the line rule, but\nthat the `Word` rule has been emitted as a subroutine which gets called from\n`Line`:\n\n```elixir\nLine:\n  0 :call 6\n  1 :choice 5 1\n  2 :chr 32\n  3 :call 6\n  4 :commit\n  5 :return\nWord:\n  6 :set 'abcdefghijklmnopqrstuvwxyz'\n  7 :span 'abcdefghijklmnopqrstuvwxyz'\n  8 :return\n  fail :fail\n```\n\nAt runtime, the following trace is generated. The trace consists of a number of columns:\n\n- The current instruction pointer, which maps to the compile time dump.\n- The substring of the subject.\n- The instruction being executed.\n\n```elixir\n    0 | 'one two'              | {:call, 6}\n    6 | 'one two'              | {:set, 'abcdefghijklmnopqrstuvwxyz'}\n    7 | 'ne two'               | {:span, 'abcdefghijklmnopqrstuvwxyz'}\n    7 | 'e two'                | {:span, 'abcdefghijklmnopqrstuvwxyz'}\n    7 | ' two'                 | {:span, 'abcdefghijklmnopqrstuvwxyz'}\n    8 | ' two'                 | {:return}\n    1 | ' two'                 | {:choice, 5, 1}\n    2 | ' two'                 | {:chr, 32}\n    3 | 'two'                  | {:call, 6}\n    6 | 'two'                  | {:set, 'abcdefghijklmnopqrstuvwxyz'}\n    7 | 'wo'                   | {:span, 'abcdefghijklmnopqrstuvwxyz'}\n    7 | 'o'                    | {:span, 'abcdefghijklmnopqrstuvwxyz'}\n    7 | []                     | {:span, 'abcdefghijklmnopqrstuvwxyz'}\n    8 | []                     | {:return}\n    4 | []                     | {:commit}\n    1 | []                     | 
{:choice, 5, 1}\n    2 | []                     | {:chr, 32}\n fail | []                     | {:fail}\n    5 | []                     | {:return}\n```\n\nThe exact meaning of the IR instructions is not discussed here\n\n\n## TODO\n\n- I do not like the `star()` and `opt()` syntax of the AST, but given the limited\n  support for prefix operators in Elixir I'm not yet sure how to make this better\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzevv%2Fxpeg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzevv%2Fxpeg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzevv%2Fxpeg/lists"}