{"id":15666371,"url":"https://github.com/clarete/effigy","last_synced_at":"2025-07-19T06:37:37.892Z","repository":{"id":66285759,"uuid":"184189732","full_name":"clarete/effigy","owner":"clarete","description":"Small language that compiles to Python37 bytecode","archived":false,"fork":false,"pushed_at":"2020-07-15T05:46:50.000Z","size":278,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-07-13T06:52:40.899Z","etag":null,"topics":["bytecode","parser-generator","parsing","parsing-expression-grammar","peg","python"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clarete.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-04-30T04:08:36.000Z","updated_at":"2024-05-31T16:16:04.000Z","dependencies_parsed_at":"2023-05-01T06:00:40.061Z","dependency_job_id":null,"html_url":"https://github.com/clarete/effigy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/clarete/effigy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarete%2Feffigy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarete%2Feffigy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarete%2Feffigy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarete%2Feffigy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/clarete","download_url":"https://codeload.github.com/clarete/effigy/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarete%2Feffigy/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265898337,"owners_count":23845776,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bytecode","parser-generator","parsing","parsing-expression-grammar","peg","python"],"created_at":"2024-10-03T14:00:38.673Z","updated_at":"2025-07-19T06:37:37.884Z","avatar_url":"https://github.com/clarete.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Table of Contents\n\n1.  [Effigy](#org298377f)\n    1.  [How to play with it](#org961e90d)\n        1.  [Currently Supported Types of Values](#org8daedde)\n        2.  [Language Features](#org9d8f1ee)\n        3.  [Very useful things missing](#org1f0a087)\n    2.  [How does it work](#org7809076)\n        1.  [Parser Generator for Parsing Expression Grammars (PEG)](#orgdf15c10)\n    3.  [Host Language](#orgfa2955a)\n    4.  [Resources](#org8a9d27f)\n        1.  [On Parsing \u0026 Parsing Expression Grammars](#org1a05dfe)\n        2.  [On the Python Compiler \u0026 Bytecode Format](#org7331218)\n\n\n\u003ca id=\"org298377f\"\u003e\u003c/a\u003e\n\n# Effigy\n\nThis is an experiment on building a small language compiler on top\nof a home brewed parsing expression grammar implementation.\n\nThe language implemented in this project, effigy, currently compiles\ndown to a subset of the Python 3.7 bytecode format. 
More\nspecifically, the Effigy compiler produces `.pyc` files.\n\nEffigy's runtime is the Python 3.7 Virtual Machine. The difference\nis just how the bytecode gets generated. Most idioms like declaring\nliterals, calling functions, assigning variables etc. have the exact\nsame semantics as in regular Python code.\n\nEffigy differs from Python in using functions for control flow\na little more often and in the absence of classes (might be added\nlater).\n\n\n\u003ca id=\"org961e90d\"\u003e\u003c/a\u003e\n\n## How to play with it\n\nEffigy is currently a teeny little JavaScript program. You can\ninstall it with `npm i efgc`. After that, you can type your effigy\nprograms in a file and then run `efgc yourfile.efg`. That will\ngenerate a `.pyc` file in the same directory as the source file\nthat can be run with Python (currently only 3.7).\n\nHere's what's available and some of what's not:\n\n\n\u003ca id=\"org8daedde\"\u003e\u003c/a\u003e\n\n### Currently Supported Types of Values\n\n-   integers\n-   strings (double quotes only. Single quotes currently yield a\n    syntax error)\n-   lists\n-   functions (named and anonymous)\n\n\n\u003ca id=\"org9d8f1ee\"\u003e\u003c/a\u003e\n\n### Language Features\n\n-   [X] Arithmetic Operators\n-   [X] Logic Operators\n-   [X] Comparison Operators\n-   [X] Flow Control (if/else/while/for)\n-   [X] Exceptions (single catch block for now)\n-   [ ] Imports\n\n\n\u003ca id=\"org1f0a087\"\u003e\u003c/a\u003e\n\n### Very useful things missing\n\n-   Slice notation\n-   Variadic arguments\n-   Named/Default parameters\n-   Floating point numbers\n\n\n\u003ca id=\"org7809076\"\u003e\u003c/a\u003e\n\n## How does it work\n\nAs mentioned in the introduction, Effigy is an experiment. 
So it\nprobably won't be a good example of how to write the next industry\nstandard compiler, but it should give insights about what compilers\ndo and at least one way of doing it.\n\nThe current version of the `efgc` compiler is broken down into\nthree main pieces: 1) PEG parser-generator, 2) bytecode\ntranslator, 3) assembler. Let's look at them separately.\n\n\n\u003ca id=\"orgdf15c10\"\u003e\u003c/a\u003e\n\n### Parser Generator for Parsing Expression Grammars (PEG)\n\nThe PEG is the most basic component of this compiler. It's what\nthe compiler uses to 1) parse the program text into a parse tree\nand 2) transform the parse tree into `bytecode`.\n\nPEGs provide functionality very similar to that of Context Free\nGrammars. The most relevant differences are that PEGs are 1)\ndeterministic and 2) allow unlimited lookahead via predicates. This\nallows PEGs to provide functionality for both syntactical and\nsemantic matching. To read beyond this vague definition, I suggest\nreading the [article](https://bford.info/pub/lang/peg.pdf) that introduced the concept.\n\nThe API for parsing text currently looks like this:\n\n    \u003e const g = peg.pegc('Digit \u003c- [0-9]+');  // Compile Grammar\n    \u003e g.match('123')                          // Match some input\n    ['Digit', ['1', '2', '3']]\n\nThere's also an API for matching data structures (lists):\n\n    \u003e peg.pegc('List \u003c- { \"a\" { \"b\" } }').matchl([\"a\", [\"b\"]])\n    ['L', ['a', ['b']]]\n\nIn very practical terms, this home grown PEG implementation is\nbeing used in the [parser](./lang.peg) and the [translator](./lang.tr) pieces. Besides\nthe grammar language, this PEG also provides semantic actions\nexposed via the JavaScript API (not in the grammar\nlanguage), allowing the user to declare traversals for the output\ntrees captured from successful matching. E.g.:\n\n    \u003e const join = x =\u003e Array.isArray(x) ? 
x.join('') : x; // Helper for joining lists of strings together\n    \u003e const g = peg.pegc('Digit \u003c- [0-9]+') // Compile Grammar\n    \u003e const r = g.bind({ Digit: ({ visit }) =\u003e parseInt(join(visit()), 10) }); // Bind semantic actions\n    \u003e r('123')\n    123\n\nIt is worth mentioning that `bindl()` is also available for\nbinding semantic actions to a generator that will process data\nstructures (lists) instead of text.\n\nThe semantic actions [are modular](https://ohmlang.github.io/pubs/dls2016/modular-semantic-actions.pdf). They're not executed until the\nwhole match is finished successfully. That way, the user of the\nPEG engine doesn't ever have to think about the backtracking that\nhappens behind the scenes.\n\nThis PEG implementation has no dependencies besides the host\nlanguage used to write the file `peg.js`.\n\nSadly there are a few valuable things that I didn't get to\nimplement yet that would considerably increase the quality of the\nPEG implementation:\n\n-   Error Reporting. Although parser generators sometimes get a bad\n    reputation for their error reporting, there is some modern literature\n    on how to allow pretty good error reporting. The best this PEG\n    does is to accurately report the farthest failure position, a\n    heuristic that tells how far into the input the current grammar\n    was able to match before the error happened. [Link for the\n    aforementioned modern literature](https://arxiv.org/pdf/1405.6646.pdf). Current error reporting on\n    list matching is awful to say the least. It literally only tells\n    you that it didn't match a list.\n\n-   Arity of PEG operators. The operator `OneOrMore (+)` returns an\n    item if it matches one and a list if it matches many, and the\n    list is flattened. The `ZeroOrMore (*)` operator behaves\n    similarly to `(+)` but can also return nothing, which is\n    represented with `null`. 
These are a bit confusing but I'm not\n    really sure if I found all the answers to design something\n    better yet.\n\n-   Left recursion. There's a branch for supporting that. It\n    currently lacks mutual left recursion support so it's not\n    merged yet. The [implementation leverages bounded left recursion](https://arxiv.org/pdf/1207.0443).\n\n\n\u003ca id=\"orgfa2955a\"\u003e\u003c/a\u003e\n\n## Host Language\n\nAlthough the first target of the little compiler is a subset of\nPython, JavaScript was chosen as the host language for a few\nreasons:\n\n1.  I didn't want to do it in Python because it'd be very tempting\n    to use one of its modules for parsing, scope analysis or code\n    generation. I wanted to implement all the pieces of the compiler\n    to be able to reason about how far I could leverage the PEG to do\n    those tasks.\n\n2.  Python and JavaScript have very similar semantics for closures\n    but present slight differences in how side effects (assignments)\n    to values declared in enclosing scopes work. JavaScript\n    separates assignment from declaration, while Python provides the\n    `nonlocal` keyword.\n    \n    I wanted something right in the middle for Effigy: Assignment is\n    coupled to declaring a variable, but the language provides the keyword `let`\n    to mark names to be saved as closures so assignments in deeper\n    scopes will know it's not a new value.\n\n3.  It doesn't really matter. 
The goal is to rewrite Effigy with\n    Effigy.\n\n\n\u003ca id=\"org8a9d27f\"\u003e\u003c/a\u003e\n\n## Resources\n\n\n\u003ca id=\"org1a05dfe\"\u003e\u003c/a\u003e\n\n### On Parsing \u0026 Parsing Expression Grammars\n\n-   [Parsing Expression Grammars: A Recognition-Based Syntactic Foundation](https://bford.info/pub/lang/peg.pdf)\n-   [Parsing Expression Grammars for Structured Data](http://www.lua.inf.puc-rio.br/publications/mascarenhas11parsing.pdf)\n-   [PEG-based transformer provides front-, middle and back-end stages in a simple compiler](http://www.vpri.org/pdf/tr2010003_PEG.pdf)\n-   [Modular Semantic Actions](https://ohmlang.github.io/pubs/dls2016/modular-semantic-actions.pdf)\n\n\n\u003ca id=\"org7331218\"\u003e\u003c/a\u003e\n\n### On the Python Compiler \u0026 Bytecode Format\n\n-   \u003chttps://devguide.python.org/compiler\u003e\n-   \u003chttps://github.com/python/cpython/tree/master/Python\u003e\n-   \u003chttps://codewords.recurse.com/issues/seven/dragon-taming-with-tailbiter-a-bytecode-compiler\u003e\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclarete%2Feffigy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclarete%2Feffigy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclarete%2Feffigy/lists"}