{"id":19703466,"url":"https://github.com/vnmakarov/yaep","last_synced_at":"2026-03-04T02:01:28.011Z","repository":{"id":1884145,"uuid":"43922630","full_name":"vnmakarov/yaep","owner":"vnmakarov","description":"Yet Another Earley Parser","archived":false,"fork":false,"pushed_at":"2022-03-11T20:47:32.000Z","size":3103,"stargazers_count":141,"open_issues_count":22,"forks_count":16,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-11-22T02:03:48.652Z","etag":null,"topics":["ambiguous-grammars","ast","earley-parser","error-recovery","grammar","library","minimal-cost-ast"],"latest_commit_sha":null,"homepage":"","language":"SWIG","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vnmakarov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-10-08T23:57:52.000Z","updated_at":"2025-08-17T15:55:39.000Z","dependencies_parsed_at":"2022-08-06T11:15:34.253Z","dependency_job_id":null,"html_url":"https://github.com/vnmakarov/yaep","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vnmakarov/yaep","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vnmakarov%2Fyaep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vnmakarov%2Fyaep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vnmakarov%2Fyaep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vnmakarov%2Fyaep/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vnmakarov","download_url":"https://codeload.github.com/vnmakarov/yaep/tar.gz/refs/heads/master","sbom_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vnmakarov%2Fyaep/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30069220,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T01:03:42.280Z","status":"online","status_checked_at":"2026-03-04T02:00:07.464Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ambiguous-grammars","ast","earley-parser","error-recovery","grammar","library","minimal-cost-ast"],"created_at":"2024-11-11T21:17:58.319Z","updated_at":"2026-03-04T02:01:27.988Z","avatar_url":"https://github.com/vnmakarov.png","language":"SWIG","funding_links":[],"categories":[],"sub_categories":[],"readme":"# YAEP -- standalone Earley parser library\n  * **YAEP** is an abbreviation of Yet Another Earley Parser.\n  * This standalone library is created for convenience.\n  * The parser development is actually done as a part of the [*Dino* language\n    project](https://github.com/dino-lang/dino).\n  * YAEP is licensed under the MIT license.\n\n# YAEP features:\n  * It is sufficiently fast and does not require much memory.\n    This is the **fastest** implementation of the Earley parser which I\n    know of. If you know a faster one, please send me a message. 
It can parse\n    **300K lines of C program per second** on modern computers\n    and allocates about **5MB memory for 10K line C program**.\n  * YAEP does simple syntax-directed translation, producing an **abstract\n    syntax tree** as its output.\n  * It can parse input described by an **ambiguous** grammar.  In\n    this case the parse result can be a single abstract tree or all\n    possible abstract trees. YAEP produces a compact\n    representation of all possible parse trees by using a DAG instead\n    of real trees.\n  * YAEP can parse input described by an ambiguous grammar\n    according to **abstract node costs**.  In this case the parse\n    result can be a **minimal cost** abstract tree or all possible\n    minimal cost abstract trees.  This feature can be used for the code\n    selection task in compilers.\n  * It can perform **syntax error recovery**.  Moreover, its error\n    recovery algorithm finds a recovery with a **minimal** number of\n    ignored tokens.  This permits implementing parsers with very good\n    error recovery and reporting.\n  * It has **fast startup**.  
There is only a tiny and insignificant delay\n    between processing the grammar and the start of parsing.\n  * A grammar for YAEP can be constructed through function calls or using\n    a YACC-like description syntax.\n \n# Usage example:\n* The following is a small example of how to use YAEP to parse expressions.\n  We have omitted the functions `read_token_func`, `syntax_error_func`,\n  and `parse_alloc_func` which are needed to provide tokens, print syntax\n  error messages, and allocate memory for the parser.\n\n```\nstatic const char *description =\n\"\\n\"\n\"TERM NUMBER;\\n\"\n\"E : T         # 0\\n\"\n\"  | E '+' T   # plus (0 2)\\n\"\n\"  ;\\n\"\n\"T : F         # 0\\n\"\n\"  | T '*' F   # mult (0 2)\\n\"\n\"  ;\\n\"\n\"F : NUMBER    # 0\\n\"\n\"  | '(' E ')' # 1\\n\"\n\"  ;\\n\"\n  ;\n\nstatic void parse (void)\n{\n  struct grammar *g;\n  struct earley_tree_node *root;\n  int ambiguous_p;\n\n  if ((g = earley_create_grammar ()) == NULL) {\n      fprintf (stderr, \"earley_create_grammar: No memory\\n\");\n      exit (1);\n  }\n  if (earley_parse_grammar (g, TRUE, description) != 0) {\n      fprintf (stderr, \"%s\\n\", earley_error_message (g));\n      exit (1);\n  }\n  if (earley_parse (g, read_token_func, syntax_error_func, parse_alloc_func,\n                    NULL, \u0026root, \u0026ambiguous_p))\n    fprintf (stderr, \"earley_parse: %s\\n\", earley_error_message (g));\n  earley_free_grammar (g);\n}\n```\n  * To add error recovery, just add a reserved symbol ``error`` to\n    the rules. 
Skipped terminals during error recovery will be\n    represented in the resulting abstract tree by a node called ``error``.\n    For example, if you want to include expression- and statement-level\n    error-recovery in a programming language grammar, the rules could look\n    like the following:\n```\n  stmt : IF '(' expr ')' stmt ELSE stmt # if (2 4 6)\n       | ...\n       | error # 0\n       ;\n  expr : IDENT # 0\n       | ...\n       | error # 0\n       ;\n``` \n  * For more details, please see the documentation in directory ``src/``,\n    or the YAEP examples in files ``test*.c`` in directories ``test/C`` or ``test/C++``.\n\n# Installing:\n  * ``mkdir build``\n  * ``cd build``\n  * ``\u003csrcdir\u003e/configure --srcdir=\u003csrcdir\u003e --prefix=\u003cprefix for install dirs\u003e``\n    or ``cmake -DCMAKE_BUILD_TYPE=Release`` (make sure you have CMake installed)\n  * ``make``\n  * ``make test`` (optional) \n  * ``make install``\n\n# Speed comparison of YACC, MARPA, YAEP, and GCC parsers:\n\n* Tested parsers:\n  * YACC 1.9 from Linux Fedora Core 21.\n  * MARPA C Library, version 8.3.0. A popular Earley parser implementation\n    using the Practical Earley Parser algorithm and Joop Leo's approach.\n  * The C parser in GCC-4.9.2.\n  * YAEP as of Oct. 2015.\n* Grammar:\n  * The base test grammar is the **ANSI C** grammar which is mostly\n    a left-recursive grammar.\n  * For MARPA and YAEP, the grammar is slightly ambiguous as typenames\n    are represented with the same kind of token as identifiers.\n  * For the YACC description, typename is a separate token type distinct from\n    other identifiers.  
The YACC description does not contain any actions except\n    for a small number needed to give feedback to the scanner on how to treat\n    the next identifier (as a typename or regular identifier).\n* Scanning test files for YACC, MARPA, and YAEP:\n  * We prepare all tokens beforehand in order to exclude scanning time from our benchmark.\n  * For YACC, at the scanning stage we do not yet distinguish identifiers and typenames. \n* Tests:\n  * The first test is based on the file ``gen.c`` from parser-generator MSTA.  The file\n    was concatenated 10 times and the resulting file size was 67K C lines.\n  * The second test is a pre-release version of gcc-4.0 for i686 with all the source\n    code combined into one file\n    ([source](http://people.csail.mit.edu/smcc/projects/single-file-programs/)).\n    The file size was 635K C lines.\n  * The C pre-processor was applied to the files.\n  * Additional preparations were made for YACC, MARPA, and YAEP:\n    * GCC extensions (mostly attributes and asm) were removed from the\n      pre-processed files.  The removed code is a tiny and insignificant\n      fraction of the entire code.\n    * A very small number of identifiers were renamed to avoid confusing the simple\n      YACC actions to distinguish typenames and identifiers.  So the resulting code\n      is not correct as C code but it is correct from the syntactic point of view.\n* Measurements:\n  * The result times are elapsed (wall) times.\n  * Memory requirements are measured by comparing the output of Linux ``sbrk`` before and\n    after parsing.\n  * For GCC, memory was instead measured as max resident memory reported by ``/usr/bin/time``.\n* How to reproduce: please use the shell script ``compare-parsers.tst``\n  from directory ``src``.\n\n\n* Results:\n  * First file (**67K** lines).  
Test machine is i7-2600 (4 x 3.4GHz)\n    with 8GB memory under FC21.\n\n\n|                      |Parse time (s)   |Overall (s)|Memory (parse only) MB|\n|----------------------|----------------:|----------:|---------------------:|\n|YACC                  |   0.07          | 0.17      |   20                 |\n|MARPA                 |   3.48          | 3.77      |  516                 |\n|YAEP                  |   0.18          | 0.28      |   26                 |\n\n  * Second file (**635K** lines).  Test machine is 2xE5-2697 (2 x 14 x 2.6GHz)\n    with 128GB memory under FC21.\n\n|                      |Parse time (s)   |Overall (s)|Memory (parse only) MB|\n|----------------------|----------------:|----------:|---------------------:|\n|YACC                  |  0.25           | 0.55      |  120                 |\n|gcc -fsyntax-only     |      -          | 1.22      |  194                 |\n|gcc -O0               |      -          |19.37      |  761                 |\n|MARPA                 | 22.23           |23.41      |30310                 |\n|YAEP                  |  1.43           | 1.68      |  142                 |\n\n* Conclusions:\n  * YAEP without a scanner is up to **20** times faster than MARPA and requires\n    up to **200** times less memory.\n  * Still, it is **2.5** - **6** times slower (**1.6** - **3** times when\n     taking the scanner into account) than YACC.\n\n# Future directions\n  * Implement YACC-style description syntax for operator precedence and associativity.\n  * Implement bindings for popular scripting languages.\n  * Introduce abstract node codes (instead of string labels) for faster work with abstract trees.\n  * Permit nested abstract nodes in simple translation.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvnmakarov%2Fyaep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvnmakarov%2Fyaep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvnmakarov%2Fyaep/lists"}