{"id":13419605,"url":"https://github.com/artagnon/rhine","last_synced_at":"2025-12-18T00:19:16.391Z","repository":{"id":29769897,"uuid":"33313806","full_name":"artagnon/rhine","owner":"artagnon","description":"🔬 a C++ compiler middle-end, using an LLVM backend","archived":true,"fork":false,"pushed_at":"2022-02-12T08:47:36.000Z","size":1215,"stargazers_count":164,"open_issues_count":0,"forks_count":8,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-07-31T22:50:42.165Z","etag":null,"topics":["c-plus-plus","compiler","compiler-design","llvm","programming-language"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/artagnon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-04-02T14:40:18.000Z","updated_at":"2024-04-16T13:15:31.000Z","dependencies_parsed_at":"2022-09-07T00:41:30.892Z","dependency_job_id":null,"html_url":"https://github.com/artagnon/rhine","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artagnon%2Frhine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artagnon%2Frhine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artagnon%2Frhine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/artagnon%2Frhine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/artagnon","download_url":"https://codeload.github.com/artagnon/rhine/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","ki
nd":"github","repositories_count":221541960,"owners_count":16840123,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","compiler","compiler-design","llvm","programming-language"],"created_at":"2024-07-30T22:01:18.310Z","updated_at":"2025-12-18T00:19:16.346Z","avatar_url":"https://github.com/artagnon.png","language":"C++","funding_links":[],"categories":["TODO scan for Android support in followings","C++"],"sub_categories":[],"readme":"# rhine: a C++ compiler middle-end for a typed Ruby\n\n[![Build Status](https://travis-ci.org/artagnon/rhine.svg?branch=master)](https://travis-ci.org/artagnon/rhine)\n\nrhine is designed to be a fast language that uses the LLVM JIT and features N-d\ntensors, first-class functions, and type inference: annotating argument\ntypes is enough. It has a full-blown AST into which it embeds a UseDef graph.\n\nrhine started off as [rhine-ml](https://github.com/artagnon/rhine-ml), and\nrhine-ml was itself originally called rhine.\n\n- Effort put into rhine-ml: 2 months\n- Effort put into rhine: 1 year, 1 month\n\n## Language Features\n\n```elixir\ndef bar(arithFn Function(Int -\u003e Int -\u003e Int)) do\n  println $ arithFn 2 4\nend\ndef addCandidate(alpha Int, beta Int) do\n  ret $ alpha + beta\nend\ndef subCandidate(gamma Int, delta Int) do\n  ret $ gamma - delta\nend\ndef main() do\n  if false do\n    bar addCandidate\n  else\n    bar subCandidate\n  end\n  mu = {{2}, {3}}\n  println mu[1][0]\nend\n```\n\n`Int` is a type annotation. Only argument types need to be annotated;\nthe return type is inferred. 
`Function(Int -\u003e Int -\u003e Int)` is the type of a function that takes\ntwo integers and returns one integer, borrowing some Haskell syntax. `$` is\nalso borrowed from Haskell; it is roughly like wrapping the RHS in parentheses.\n\nrhine-ml, in contrast, has arrays, first-class functions, closures, variadic\narguments, and macros. It's also much less buggy.\n\n## The recursive-descent parser\n\nrhine uses a handwritten recursive-descent parser, which is faster and reports\nbetter errors than the former Bison one. You will need at least one token of\nlookahead if you want to keep the code simple. This gives you one level\nof backtracking:\n\n```cpp\nparseSymbol(); // Oops, the lexed token indicates that we're not in the right\n               // function\n\nparseInstruction(); // Ask it to use the existing token, not lex a new one\n```\n\nAnother minor consideration is that newlines must be handled explicitly if you\nwant a newline to be able to stand in for `;` in the language.\n\n```cpp\nvoid Parser::getTok() {\n  // Keep the previous token around for one-token lookbehind\n  LastTok = CurTok;\n  CurTok = Driver-\u003eLexx-\u003elex(\u0026CurSema, \u0026CurLoc);\n  LastTokWasNewlineTerminated = false;\n  // Skip NEWLINE tokens, but record that one was seen so the parser can\n  // treat it as a statement terminator\n  while (CurTok == NEWLINE) {\n    LastTokWasNewlineTerminated = true;\n    CurTok = Driver-\u003eLexx-\u003elex(\u0026CurSema, \u0026CurLoc);\n  }\n}\n```\n\n## The AST\n\nThe AST is heavily inspired by LLVM IR, although it has some higher-level\nconcepts like `Tensor`. It is in SSA form and has a UseDef graph embedded in it,\nmaking analysis and transformation easy.\n\nThe main classes are `Type` and `Value`. All types, such as `IntType` and `FloatType`,\ninherit from `Type`; most of the other classes inherit from `Value`. 
A `BasicBlock` is\na `Value`, and so is `ConstantInt`.\n\nA `BasicBlock` is a vector of `Instruction`s, and this is how the AST stays in SSA form:\nassignments are handled as a `StoreInst`; there is no real LHS, just RHS\nreferences.\n\n```cpp\nStoreInst::StoreInst(Value *MallocedValue, Value *NewValue);\n```\n\n## UseDef in AST\n\n`Value` is uniquified using LLVM's `FoldingSet`, and `Use` wraps it, so one\n`Value` can be replaced with another.\n\n```cpp\n/// A Use wraps a Value and is a node in a doubly linked list of uses\nclass Use {\n  Value *Val;\n  Use *Prev;\n  Use *Next;\n  // Laid out in memory as [Use2] - [Use1] - [User]; Use2 has DistToUser 2\n  unsigned DistToUser;\n};\n```\n\nAn `Instruction` is a `User`. A `User` and its `Use` values are laid out\ncontiguously in memory, so it's possible to reach all the `Use` values from the\n`User`. It's also possible to reach the `User` from any `Use`, using\n`DistToUser`.\n\n```cpp\nclass User : public Value {\nprotected:\n  unsigned NumOperands;\n};\nclass Instruction : public User {\n  // ...\n};\n```\n\nThe `User` has a custom `operator new` that also allocates memory for its `Use`\ninstances.\n\n```cpp\n// Allocate Us Use slots immediately before the User object itself\nvoid *User::operator new(size_t Size, unsigned Us) {\n  void *Storage = ::operator new(Us * sizeof(Use) + Size);\n  auto Start = static_cast\u003cUse *\u003e(Storage);\n  auto End = Start + Us;\n  // The Use furthest from the User gets DistToUser == Us; the adjacent one gets 1\n  for (unsigned Iter = 0; Iter \u003c Us; Iter++) {\n    new (Start + Iter) Use(Us - Iter);\n  }\n  // The User object starts right after the last Use\n  auto Obj = reinterpret_cast\u003cUser *\u003e(End);\n  return Obj;\n}\n```\n\n## The Context\n\nThe Context is a somewhat large object that keeps the uniquified `Type` and\n`Value` instances. It also keeps track of `Externals`, the external C functions\nthat are provided as part of a \"standard library\". Unique `llvm::IRBuilder` and\n`llvm::LLVMContext` objects, as well as the `DiagnosticPrinter`, are exposed as member\nvariables. 
Finally, it holds the `ResolutionMap`, which symbol resolution\nrelies on.\n\n## Symbol resolution\n\n`src/Transform/Resolve` is one transform that relies on the UseDef graph embedded\nin the AST. For example, the statement\n\n```elixir\n  B = A + 2\n```\n\ncreates an `UnresolvedValue` for `A`, an `AddInst`, and a `MallocInst`,\nwhich takes the string \"B\" and the `AddInst` as operands.\n\nThe transform walks over every `Instruction` in the `BasicBlock`,\nresolves `UnresolvedValue` instances, and points the corresponding `Use` at the resolved value.\nBecause it replaces the `Value` underneath the `Use`, and the `Instruction`\nonly references `Use` instances, no dangling references are left behind.\n\n```cpp\nif (auto S = K-\u003eMap.get(V, Block)) {\n  /// %S = 2;\n  ///  ^\n  /// Came from here (MallocInst, Argument, or Prototype)\n  ///\n  /// Foo(%S);\n  ///      ^\n  ///  UnresolvedValue; replace with %Replacement\n  if (auto M = dyn_cast\u003cMallocInst\u003e(S)) {\n    if (dyn_cast\u003cStoreInst\u003e(U-\u003egetUser()))\n      U.set(M);\n  }\n}\n```\n\n## Type Inference\n\nType inference is quite simple. 
One `visit` function is overloaded for every\n`Value` subclass.\n\n```cpp\nType *TypeInfer::visit(MallocInst *V) {\n  V-\u003esetType(visit(V-\u003egetVal()));\n  assert(!V-\u003eisUnTyped() \u0026\u0026 \"unable to type infer MallocInst\");\n  return VoidType::get(K);\n}\n```\n\n## Building\n\nThe desired directory structure is:\n```\nbin/ ; if you downloaded the tarball for this\n    cmake\n    ninja\n    flex\nsrc/\n    rhine/\n            README.md\n            llvm/ ; git submodule update --init to get the sources\n            llvm-build/\n                        bin/\n                            llvm-config ; you need to call this to build\n            rhine-build/\n                        rhine ; the executable\n```\n\nOn macOS, where you have everything installed:\n\n```sh\n$ brew install flex\n$ brew link --force flex\n$ git submodule update --init\n$ cd llvm-build\n# rhine is buggy; without debugging symbols, you can't report a useful bug\n$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ../llvm\n$ ninja\n$ export PATH=`pwd`/bin:$PATH\n$ cd ../rhine-build\n$ cmake -GNinja ..\n# this will run the package's unit tests, which should all pass\n$ ninja check\n```\n\nOn Linux, where you may have nothing installed (no root privileges are required):\n\nInstall [git-lfs](https://git-lfs.github.com/) and fetch `cmake-ninja-flex.tar.bz2`:\n\n```sh\n$ git lfs pull\n```\n\nUntar it and set up the environment variables.\n\n```sh\n$ tar xf cmake-ninja-flex.tar.bz2\n$ cd cmake-ninja-flex\n\n# for bash/zsh\n$ export TOOLS_ROOT=`pwd`\n$ export PATH=$TOOLS_ROOT:$PATH\n# for csh\n$ setenv TOOLS_ROOT `pwd`\n$ setenv PATH ${TOOLS_ROOT}:${PATH}\n```\n\nThen:\n\n```sh\n$ git submodule update --init\n$ cd llvm-build\n# rhine is buggy; without debugging symbols, you can't report a useful bug\n$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ../llvm\n$ ninja\n$ export PATH=`pwd`/bin:$PATH\n$ cd ../rhine-build\n# flex isn't picked up from $PATH\n$ cmake -GNinja -DTOOLS_ROOT=$TOOLS_ROOT -DFLEX_EXECUTABLE=$TOOLS_ROOT/flex ..\n# if there 
are build (usually link) errors, please open an issue\n# tests are currently failing on Linux; this still needs investigating\n$ ninja check\n```\n\n## Commentary\n\nAn inefficient untyped language is easy to implement. Making `println` accept both 23 and\n\"twenty three\" is a simple matter of switching on the value's type\nwhen it is unboxed. There's no need to rewrite the value in IR, and certainly no\nneed to come up with an overloading scheme.\n\n[Crystal](http://crystal-lang.org/) made a good decision to write its first compiler in Ruby. If\nyour goal is to self-host, then the efficiency of the bootstrapping language does not\nmatter. All you need is good generated assembly (which LLVM makes easy).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fartagnon%2Frhine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fartagnon%2Frhine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fartagnon%2Frhine/lists"}