{"id":18799515,"url":"https://github.com/vidarh/writing-a-compiler-in-ruby","last_synced_at":"2025-04-09T11:12:27.307Z","repository":{"id":66681435,"uuid":"119308","full_name":"vidarh/writing-a-compiler-in-ruby","owner":"vidarh","description":"Code from my series on writing a Ruby compiler in Ruby","archived":false,"fork":false,"pushed_at":"2023-05-14T17:52:18.000Z","size":1134,"stargazers_count":278,"open_issues_count":7,"forks_count":23,"subscribers_count":19,"default_branch":"master","last_synced_at":"2025-04-02T03:55:24.297Z","etag":null,"topics":["compilers","parsers","ruby","ruby-compiler"],"latest_commit_sha":null,"homepage":"http://www.hokstad.com/compiler","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vidarh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2009-02-01T16:54:37.000Z","updated_at":"2025-03-02T06:27:19.000Z","dependencies_parsed_at":"2024-12-25T11:15:19.936Z","dependency_job_id":"b3d44aef-282b-45c1-a5c9-7bccb7535dd0","html_url":"https://github.com/vidarh/writing-a-compiler-in-ruby","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vidarh%2Fwriting-a-compiler-in-ruby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vidarh%2Fwriting-a-compiler-in-ruby/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vidarh%2Fwriting-a-compiler-in-ruby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/vidarh%2Fwriting-a-compiler-in-ruby/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vidarh","download_url":"https://codeload.github.com/vidarh/writing-a-compiler-in-ruby/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248027411,"owners_count":21035594,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compilers","parsers","ruby","ruby-compiler"],"created_at":"2024-11-07T22:15:37.688Z","updated_at":"2025-04-09T11:12:27.271Z","avatar_url":"https://github.com/vidarh.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Writing a (Ruby) compiler in Ruby\n\nSource for my series on writing a compiler in Ruby.\n\nSee \u003chttp://www.hokstad.com/compiler\u003e\n\n**NOTE** This is still wildly incomplete.\n\n## Status as of May 14th 2023\n\n * The compiler self-hosts, but is slow and GC is disabled.\n * Self-hosting was achieved by working around a number of bugs that\n   still needs to be fixed.\n * Type tagging has been put in place for integers, which drastically\n   reduces amount of object creation\n * Starting to work through *some* of the updates needed to approach\n   Ruby 3.2 compatibility. That is a *very long* road.\n * Starting looking at what will be needed to work with Ruby spec.\n   Challenge is figuring out how to make it work w/ahead of time\n   compilation without e.g. 
support for eval (which I *might* eventually\n   add, but that's far away).\n   \n### Older highlights\n\n * Garbage collector is integrated; garbage collection article nearly done.\n * A number of compiler bugs have been worked around to get this far.\n   Most sites in the compiler affected are marked with `@bug`\n * Current garbage collection overhead is a problem.\n   A few easy wins, and avenues to investigate:\n     * Pre-create objects for all constants (numeric and string in particular)\n     * Currently Proc and env objects are created separately; might be worth\n       allocating the env as part of the Proc object, but not sure it's worthwhile\n     * Capture stats on number of allocated objects per class, and output them\n * When compiling the compiler with itself it successfully parses all of itself\n and produces identical output to when run under MRI. This does not mean the\n parse is complete (it absolutely is not), or bug free - it means the parser\n acts correctly on the very specific subset of expressions currently present\n in the compiler itself.\n\nAssuming I get time to continue current progress, the compiler might fully compile\nitself and the compiled version might be able to compile itself this autumn.\n\nTo make that clear, what I want to get to is:\n\n 1. Run the compiler source with MRI on its own source to produce a \"compiler1\" that is a native i386 binary\n 2. Run \"compiler1\" with its own source as input to produce a \"compiler2\"\n 3. Run \"compiler2\" with its own source as input to produce a \"compiler3\"\n\nCurrently step 1 \"works\" to the extent that it produces a binary, but that binary has bugs, and so\nfails to produce a compiler2. To complete the bootstrap process I need it to complete the compile\nand produce a binary, but I *also* need that binary to be correct. 
I can part-validate that by comparing\nit to \"compiler1\" - they should have identical assembler source, but the best way of validating it\nfully is to effectively repeat step 2, but with \"compiler2\" as the input, and verify that \"compiler2\"\nand \"compiler3\" are identical, to validate the entire end-to-end process. This may seem paranoid,\nbut once step2 works the point is step3 *should* be trivial, so there's no point in not taking\nthat extra step.\n\n\n### Before getting too excited about trying to use the compiler at the point when it bootstraps fully, note:\n\n * The compiler itself carefully avoids known missing functionality, and/or I work around some during testing the bootstrap. The big ones:\n   * Exceptions (used by the compiler, but only begin/rescue causes problems and that's only used once; commented out for testing)\n   * Regexp (not used by the compiler)\n   * Float (not used by the compiler)\n * The compiler code is littered with workarounds for specific bugs (they're not consistently marked, but `FIXME` will include all of the workarounds for compiler bugs and more, and whenever I find new ones they're also marked `@bug`).\n * The GC mentioned above is very simple and not well suited for the sheer amount\n of objects currently allocated. It needs a number of improvements to handle\n many small objects, and the compiler needs additional work to reduce the number of objects created.\n\nOnce the compiler is bootstrapped w/workarounds, my next steps are:\n\n * Add support for exceptions (prob. 
worth a blog post)\n * Go through the current FIXMEs and explicitly check which are still relevant (some have likely been fixed as a result of other bug fixes); add test cases, and fix them in turn.\n * Make [mspec](https://github.com/ruby/mspec) compile\n * Make [the Ruby Spec Suite](https://github.com/ruby/spec) run, and cry over how large parts of it will fail.\n * Some of the GC improvements mentioned above.\n\n\n## Caveats\n\nThis section covers caveats about compiled Ruby vs. MRI, not\ngenerally missing pieces or bugs in the current state of the\ncompiler (of which there are many).\n\n### require\n\nPresently, \"require\" is evaluated statically at compile time.\n\nThis makes certain Ruby patterns hard or impossible to support.\nE.g. reading the contents of a directory and calling \"require\"\nfor each .rb file found will not presently work, and may never\nwork, as it is not clear in the context of compilation whether\nthe intent is to load each file at compile time or at runtime.\n\nRuby allows the argument to \"require\" to be dynamically generated.\nE.g. \"require File.dirname(__FILE__) + '/blah'\". To facilitate\ncompatibility, limited forms of this pattern may eventually\nbe supported.\n\nOn MRI, \"require\" is generally overridden by a custom version\nfor rubygems or bundler. This is not likely to ever be\nsupported. 
\"require\" is likely to be treated as a keyword,\nrather than as an overrideable method.\n\n\n### $0\n\nWhile `$0` will at some point be initialized with the name of\nthe file compilation is triggered for, certain patterns of\nRuby, such as conditionally executing code based on whether\na given file is executed directly are conceptually different,\ngiven that $0 gets bound at *compile time*.\n\nWe'll need to consider if the right behaviour is for `$0` and/or\n`__FILE__` to contain the equivalent of C's `argv[0]` instead.\nPossibly make `$0` and `__FILE__` refer to different things.\n\n\n### $:, $LOAD_PATH\n\nThe load path is malleable in MRI, and this is very frequently\nused alongside certain methods to modify which files may be\nloaded. Currently this is not supported.\n\nIt is likely that for compatibility a limited subset of Ruby\nwill be *interpreted* at compile time to support some forms\nof this pattern. See also \"require\"\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvidarh%2Fwriting-a-compiler-in-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvidarh%2Fwriting-a-compiler-in-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvidarh%2Fwriting-a-compiler-in-ruby/lists"}