{"id":14128282,"url":"https://github.com/artagnon/rhine-ml","last_synced_at":"2025-09-26T22:31:38.438Z","repository":{"id":18827710,"uuid":"22042992","full_name":"artagnon/rhine-ml","owner":"artagnon","description":"🏞 an OCaml compiler for an untyped lisp","archived":true,"fork":false,"pushed_at":"2015-03-31T23:34:00.000Z","size":2147,"stargazers_count":631,"open_issues_count":2,"forks_count":24,"subscribers_count":53,"default_branch":"master","last_synced_at":"2025-01-17T11:36:48.721Z","etag":null,"topics":["compiler","llvm","ocaml","programming-language"],"latest_commit_sha":null,"homepage":"","language":"OCaml","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/artagnon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-07-20T20:45:18.000Z","updated_at":"2025-01-02T11:49:53.000Z","dependencies_parsed_at":"2022-08-05T02:01:55.403Z","dependency_job_id":null,"html_url":"https://github.com/artagnon/rhine-ml","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/artagnon/rhine-ml","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artagnon%2Frhine-ml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artagnon%2Frhine-ml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artagnon%2Frhine-ml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artagnon%2Frhine-ml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/artagnon","download_url":"https://codeload.github.com/artagnon/rhine-ml/tar.gz/refs/heads/master","sbom_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artagnon%2Frhine-ml/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":277155486,"owners_count":25770556,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-26T02:00:09.010Z","response_time":78,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compiler","llvm","ocaml","programming-language"],"created_at":"2024-08-15T16:01:26.530Z","updated_at":"2025-09-26T22:31:38.135Z","avatar_url":"https://github.com/artagnon.png","language":"OCaml","funding_links":[],"categories":["OCaml","Compilers and Compiler Tools"],"sub_categories":[],"readme":"# Rhine\n\nRhine is a Clojure-inspired Lisp on LLVM JIT featuring variable-length\nuntyped arrays, first-class functions, closures, and macros. While\nClojure hides the lower-level details by running atop the JVM, Rhine\naims to expose how common Lisp constructs map to hardware.\n\n## Building\n\nFirst, run `opam switch 4.02.1` to make sure that you're running a\ncustom-built ocaml (for camlp4). Next, run `brew install libffi`.\nThen, run `opam install ocamlfind menhir core textutils ctypes`,\nopen a new shell to refresh the environment, and invoke `make`.\n\n## Troubleshooting the build\n\nThe build can fail for a number of reasons:\n\n1. 
Silly things like `git submodule update --init` failing can be fixed\n   easily; just anchor the submodule to a valid commit and send a PR.\n\n2. opam problems. If you run into one of these after following the\n   instructions presented above, open an issue. An upstream update\n   probably broke something.\n\n3. Silly build issues usually arise from the build not being perfectly\n   parallel: due to races, the `-j8` build picks up the dependee too\n   early. Either go into `llvm-build/` and keep hitting `make -j8`\n   until it succeeds, or drop the `-j8` altogether and wait longer\n   for a predictable result.\n\n4. Running into more serious build issues usually means that llvm\n   upstream has changed in a trivial way; you can attempt to fix this\n   yourself, or open an issue.\n\n## How it works\n\nAn untyped system means that all values are boxed/unboxed from a\n`value_t` structure at runtime:\n\n```llvm\n%value_t = type {\n\t i32,                                ; type of data\n\t i64,                                ; integer\n\t i1,                                 ; bool\n\t i8*,                                ; string\n\t %value_t**,                         ; array/fenv\n\t i64,                                ; array/string length\n\t double,                             ; double\n\t %value_t* (i32, %value_t**, ...)*,  ; function\n\t i8                                  ; char\n}\n```\n\nThe overhead of boxing/unboxing is paid by all dynamic languages,\nalthough multiple optimizations (including speculative optimization)\ncan reduce the overhead. Rhine currently only implements the basic\noptimizations bundled with LLVM.\n\nRhine does automatic type conversions, so `(+ 3 4.2)` will do the\nright thing. To implement this, IR is generated to inspect the types\nof the operands (zeroth member of `value_t`), and a br (branch) is\nused to take the right codepath. 
A possible optimization is to\ngenerate a branchless codepath for all-integer arguments.\n\nLLVM provides the [array\ntype](http://llvm.org/docs/LangRef.html#array-type) and [vector\ntype](http://llvm.org/docs/LangRef.html#vector-type). They cannot be\nused since they are fixed-length; i.e. the length must be known at\ncompile time. The problem is that a construct like `(cons 8 coll)`\nproduces an array whose length, `(length coll)` + 1, is known only at\nruntime. So, we malloc, getelementptr, and store by hand. The resulting\narray has the type specified by the fourth member of `value_t`.\n\nTo implement first-class functions, note that all functions must have\nthe same type; i.e. the type of the function pointer (the seventh\nmember of `value_t`). How else would you implement:\n\n```clojure\n(defn map\n  [f coll]\n  (if coll\n    (cons (f (first coll))\n          (map f (rest coll)))\n    []))\n\n(defn map2\n  [f c1 c2]\n  (if (and c1 c2)\n    (cons (f (first c1) (first c2))\n          (map2 f (rest c1) (rest c2)))\n    []))\n```\n\nHere, `f` takes one argument in `map`, but takes two arguments in\n`map2`. A function pointer type embracing variable arguments is\nimplemented using the [varargs\nframework](http://llvm.org/docs/LangRef.html#variable-argument-handling-intrinsics)\nof LLVM. Note that `va_arg` doesn't work on x86, so Rhine extracts the\nvalues by hand. The first argument gives the number of arguments, and\nis used to implement varargs in the Rhine language. The second\nargument is the closure environment (which has the same type as an\narray).\n\nClosures are simple to implement with this framework in place. First,\nwhen a function is declared, parse out all the unbound variable names\n(not present in arguments or `let`), sort the names, put them in a\nhashtable for later reference, and codegen the code required to bind\nthe names from the `env` argument in order. 
At the callsite, look up\nthis hashtable, and pack all the corresponding environment variables\ninto the `env` argument. So, stuff like this will work:\n\n```clojure\n(defn quux [] (let [a aenv] (println a) (println env)))\n(let [env 12 aenv 17] (quux))\n```\n\nBut there's a problem because we have first-class functions. What\nhappens to this?\n\n```clojure\n(defn t [y] (+ x y))\n(defn f [x] t)\n(let [g (f 3)] (println (g 4)))\n```\n\nHere, the callsite for `t` is not in `f`, but in the anonymous\nfunction at the end. But the anonymous function doesn't have the\nenvironment variable `x` that `t` requires; `f` does. So, we have to\naugment function pointers with the environment (reusing the fourth\nmember of `value_t`).\n\nIt's important to realize that macros require that we go\nback-and-forth between LLVM values and the OCaml codegen engine. How\nelse would you evaluate something like:\n\n```clojure\n(defmacro baz [x]\n  `[1 2 ~x])\n(baz (+ 2 2))\n```\n\nNote that macro arguments must be passed unevaluated at the\ncallsite. The macro-expand stage now needs to codegen `[1 2\n\u003csomething\u003e]`, where that `\u003csomething\u003e` itself needs to be codegen'ed\nby evaluating the argument: the result from the compiler must be\nreturned as an AST object to OCaml. This requires some involved\nconstruction of OCaml objects from C.\n\nAnother subtle point to note is that macros must be lifted out of the\nprogram and macro-expanded at the beginning of the program. This is\nbecause we can't suddenly codegen segments required for macroexpansion\nin the middle of codegen'ing another function.\n\n## Todo\n\n- Lambdas. Requires lifting them out and codegen'ing them first.\n\n- Garbage collection. LLVM provides several garbage collection\n  intrinsics, but the main challenge is to make sure that the C\n  bindings don't leak memory.\n\n- Custom optimizations.\n\n- Copy-on-write for persistent data structures, and persistent\n  variables with `setq`. 
How do you access a variable's history?\n\n- FFI to C. It's simply a question of defining a good interface,\n  because we already use malloc and memcpy internally in LLVM IR.\n\n- Self-hosting compiler. Necessary to co-develop the language and\n  compiler.\n\n- Optional typesystem. Use the :- syntax to provide type safety and\n  optimizations!\n\n- Concurrency primitives. See Clojure's core.async.\n\n- Support for programs spanning multiple files.\n\n- Polished error reporting with file:line annotations.\n\n## Notes\n\n- LLVM codegen statements all have side-effects. In most places, the\n  order in which variables are codegen'ed is unimportant, and lets can\n  be moved around; but in conditionals, statements must be codegen'ed\n  exactly in order: this means let statements can't be permuted,\n  leading to imperative-style OCaml code.\n\n- Since LLVM IR is strongly typed, it is possible to inspect the\n  _types_ of llvalues from OCaml at generation-time. However, there is\n  no way for the codegen-stage to inspect the _values_ of variables\n  while the program is running. This has the consequence that a loop\n  hinged upon an llvalue must be implemented in LLVM IR; hence, for\n  functions that require iterating on variable-length arrays, we end\n  up writing tedious LLVM IR generation code instead of equivalent\n  OCaml code.\n\n- Debugging errors from LLVM bindings is hard. Using ocamldebug does\n  not work, since the callstack leading up to an LLVM call in C++ is\n  not owned by OCaml. The alternative is to use lldb, but matching up\n  the C++ call with a line in the OCaml code is non-trivial.\n\n- Implementing complex functions as builtins (by directly generating\n  IR) is perhaps more efficient than implementing them in Rhine, but\n  the gap is very small due to optimization passes applied on the\n  Rhine-generated IR. 
The marginal benefit outweighs the cost of\n  correctly weaving complex IR.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fartagnon%2Frhine-ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fartagnon%2Frhine-ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fartagnon%2Frhine-ml/lists"}