https://github.com/artagnon/rhine
🔬 a C++ compiler middle-end, using an LLVM backend
c-plus-plus compiler compiler-design llvm programming-language
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/artagnon/rhine
- Owner: artagnon
- License: mit
- Archived: true
- Created: 2015-04-02T14:40:18.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2022-02-12T08:47:36.000Z (almost 3 years ago)
- Last Synced: 2024-07-31T22:50:42.165Z (6 months ago)
- Topics: c-plus-plus, compiler, compiler-design, llvm, programming-language
- Language: C++
- Homepage:
- Size: 1.16 MB
- Stars: 164
- Watchers: 7
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# rhine: a C++ compiler middle-end for a typed Ruby
[![Build Status](https://travis-ci.org/artagnon/rhine.svg?branch=master)](https://travis-ci.org/artagnon/rhine)
rhine is designed to be a fast language that uses the LLVM JIT and features N-d
tensors, first-class functions, and type inference: specifying argument types
is enough. It has a full-blown AST into which it embeds a UseDef graph.

rhine started off as [rhine-ml](https://github.com/artagnon/rhine-ml), which
was itself originally called rhine.

- Effort put into rhine-ml: 2 months
- Effort put into rhine: 1 year, 1 month

## Language Features
```elixir
def bar(arithFn Function(Int -> Int -> Int)) do
  println $ arithFn 2 4
end

def addCandidate(alpha Int, beta Int) do
  ret $ alpha + beta
end

def subCandidate(gamma Int, delta Int) do
  ret $ gamma - delta
end

def main() do
  if false do
    bar addCandidate
  else
    bar subCandidate
  end
  mu = {{2}, {3}}
  println mu[1][0]
end
```

`Int` is a type annotation. Only argument types need to be annotated; the
return type is inferred. `Function(Int -> Int -> Int)` is the type of a function
that takes two integers and returns one integer, borrowing Haskell's arrow
syntax. `$` also comes from Haskell: it is like wrapping everything to its right
in parentheses, so `println $ arithFn 2 4` means `println (arithFn 2 4)`.

rhine-ml, in contrast, has arrays, first-class functions, closures, variadic
arguments, and macros. It's also much less buggy.

## The recursive-descent parser
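This section is about a hand-written parser that needs one token of lookahead and has to handle newlines explicitly. As a self-contained illustration of both ideas, here is a toy recursive-descent parser for statements like `1 + 2`, terminated by `;` or a newline. The `Lexer`, `Parser`, and token names are hypothetical, not rhine's classes; only `getTok` is modelled on the shape of rhine's `Parser::getTok`, and error handling is reduced to an `assert`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds for a toy grammar: Stmt := Int ('+' Int)*
enum Tok { INTEGER, PLUS, SEMI, NEWLINE, END };

// Hand-written lexer; a newline is a real token, not skipped whitespace.
class Lexer {
  std::string Src;
  std::size_t Pos = 0;

public:
  explicit Lexer(std::string S) : Src(std::move(S)) {}

  Tok lex(int *Sema) {
    while (Pos < Src.size() && Src[Pos] == ' ')
      ++Pos;
    if (Pos == Src.size())
      return END;
    char C = Src[Pos++];
    if (C == '\n')
      return NEWLINE;
    if (C == '+')
      return PLUS;
    if (C == ';')
      return SEMI;
    // Anything else is assumed to start a decimal integer literal.
    *Sema = C - '0';
    while (Pos < Src.size() && std::isdigit(static_cast<unsigned char>(Src[Pos])))
      *Sema = *Sema * 10 + (Src[Pos++] - '0');
    return INTEGER;
  }
};

class Parser {
  Lexer Lexx;
  Tok CurTok = END;
  int CurSema = 0;
  bool LastTokWasNewlineTerminated = false;

  // Same shape as rhine's getTok: skip NEWLINE tokens but remember them.
  void getTok() {
    CurTok = Lexx.lex(&CurSema);
    LastTokWasNewlineTerminated = false;
    while (CurTok == NEWLINE) {
      LastTokWasNewlineTerminated = true;
      CurTok = Lexx.lex(&CurSema);
    }
  }

  // Called with an INTEGER already in CurTok: use that token, don't re-lex.
  int parseStatement() {
    int Sum = CurSema;
    getTok();
    // A '+' continues the statement only if no newline came before it.
    while (CurTok == PLUS && !LastTokWasNewlineTerminated) {
      getTok();
      assert(CurTok == INTEGER && "expected an integer after '+'");
      Sum += CurSema;
      getTok();
    }
    if (CurTok == SEMI) // ';' and a newline both end a statement
      getTok();
    return Sum;
  }

public:
  explicit Parser(std::string Src) : Lexx(std::move(Src)) { getTok(); }

  // Program := Stmt*; each statement evaluates to the sum of its operands.
  std::vector<int> parseProgram() {
    std::vector<int> Results;
    while (CurTok == INTEGER)
      Results.push_back(parseStatement());
    return Results;
  }
};
```

For example, `Parser("1 + 2\n3 + 4; 5").parseProgram()` returns `{3, 7, 5}`: the newline ends the first statement just as `;` ends the second.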
rhine uses a handwritten recursive-descent parser, which is faster and reports
better errors than the former Bison-generated one. To keep the code simple, you
will need at least one token of lookahead. This gives you one level of
backtracking:

```cpp
parseSymbol();      // Oops, the lexed token indicates that we're not in the
                    // right function
parseInstruction(); // Ask it to use the existing token, not lex a new one
```

Another minor consideration is that newlines must be handled explicitly if you
want a newline to be able to stand in for `;` in the language:

```cpp
void Parser::getTok() {
  LastTok = CurTok;
  CurTok = Driver->Lexx->lex(&CurSema, &CurLoc);
  LastTokWasNewlineTerminated = false;
  // Skip NEWLINE tokens, but remember that the previous token ended a line
  while (CurTok == NEWLINE) {
    LastTokWasNewlineTerminated = true;
    CurTok = Driver->Lexx->lex(&CurSema, &CurLoc);
  }
}
```

## The AST

The AST is heavily inspired by LLVM IR, although it has some higher-level
concepts like `Tensor`. It is in SSA form and has a UseDef graph embedded in it,
which makes analysis and transformation easy.

The main classes are `Type` and `Value`. All types, such as `IntType` and
`FloatType`, inherit from `Type`; most other classes inherit from `Value`. A
`BasicBlock` is a `Value`, and so is `ConstantInt`.

A `BasicBlock` is a vector of `Instruction`s, and this is how the AST stays in
SSA form: assignments are handled as a `StoreInst`, so there is no real LHS,
only references to RHS values.

```cpp
StoreInst::StoreInst(Value *MallocedValue, Value *NewValue);
```

## UseDef in AST
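The layout this section describes, with operand `Use` slots co-allocated next to their `User` and `DistToUser` leading from a `Use` back to its `User`, can be tried out in isolation. The following is a minimal sketch that assumes the `Use` slots sit immediately before the `User`, which is the order rhine's `User::operator new` produces. The `getOperand`, `getUser`, and `destroy` helpers are hypothetical, and the `Prev`/`Next` links of the real `Use` are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Value {}; // stand-in for rhine's Value hierarchy
struct User;

// One operand slot. DistToUser counts Use-sized steps forward to the User.
struct Use {
  Value *Val = nullptr;
  unsigned DistToUser;

  explicit Use(unsigned Dist) : DistToUser(Dist) {}
  void set(Value *V) { Val = V; }
  User *getUser();
};

struct User : Value {
  unsigned NumOperands;

  explicit User(unsigned N) : NumOperands(N) {}

  // Allocate [Use0][Use1]...[Use(Us-1)][User] in one block and return the
  // address of the User slot, mirroring rhine's User::operator new.
  static void *operator new(std::size_t Size, unsigned Us) {
    void *Storage = ::operator new(Us * sizeof(Use) + Size);
    Use *Start = static_cast<Use *>(Storage);
    for (unsigned I = 0; I < Us; ++I)
      new (Start + I) Use(Us - I);
    return Start + Us;
  }

  // The operands live just before the User object in memory.
  Use *op_begin() { return reinterpret_cast<Use *>(this) - NumOperands; }
  Use &getOperand(unsigned I) { return op_begin()[I]; }

  // Free the whole block from its real start (the first Use).
  static void destroy(User *U) {
    Use *Start = U->op_begin();
    U->~User();
    ::operator delete(Start);
  }
};

User *Use::getUser() { return reinterpret_cast<User *>(this + DistToUser); }
```

Because the block starts at the first `Use`, a plain `delete` on the `User` pointer would hand the wrong address to the deallocator; a custom allocation scheme like this needs a matching custom deallocation, which is why the sketch routes it through `destroy`.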
`Value` instances are uniqued using LLVM's `FoldingSet`, and a `Use` wraps a
`Value`, so one `Value` can be replaced with another underneath the `Use`:

```cpp
/// A Use is basically a linked list of Value wrappers
class Use {
  Value *Val;
  Use *Prev;
  Use *Next;
  // Laid out in memory as [Use1] - [Use2] - [User], so Use1 has DistToUser 2
  // and Use2 has DistToUser 1 (this is what User::operator new sets up)
  unsigned DistToUser;
};
```

An `Instruction` is a `User`. A `User` and its `Use` values are laid out
contiguously in memory, so every `Use` can be reached from the `User`, and the
`User` can be reached from any `Use` via `DistToUser`.

```cpp
class User : public Value {
protected:
  unsigned NumOperands;
};

class Instruction : public User {
  // ...
};
```

`User` has a custom `operator new` that also allocates memory for its `Use`
instances:

```cpp
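// Allocate Us Use slots followed by the User itself in a single block:
// [Use0][Use1]...[Use(Us-1)][User]
void *User::operator new(size_t Size, unsigned Us) {
  void *Storage = ::operator new(Us * sizeof(Use) + Size);
  auto Start = static_cast<Use *>(Storage);
  auto End = Start + Us;
  // Each Use records how many Use-sized steps separate it from the User
  for (unsigned Iter = 0; Iter < Us; Iter++) {
    new (Start + Iter) Use(Us - Iter);
  }
  // The User object lives right after the last Use
  auto Obj = reinterpret_cast<User *>(End);
  return Obj;
}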
```

## The Context
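One of the Context's jobs is to hand out uniqued instances, so that asking for the same type or constant twice returns the same pointer and identity comparison is enough. A minimal sketch of that idea follows; it swaps LLVM's `FoldingSet` for `std::map`, and `IntType::get`, `ConstantInt::get`, and the map members are simplified, hypothetical stand-ins rather than rhine's declarations.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

class Context;

struct IntType {
  unsigned Bitwidth;
  static IntType *get(unsigned Bitwidth, Context *K);
};

struct ConstantInt {
  IntType *Ty;
  int Val;
  static ConstantInt *get(int Val, unsigned Bitwidth, Context *K);
};

// Owns every uniqued object; get() returns the existing instance if present.
class Context {
public:
  std::map<unsigned, std::unique_ptr<IntType>> IntTypes;
  std::map<std::pair<unsigned, int>, std::unique_ptr<ConstantInt>> IntConstants;
};

IntType *IntType::get(unsigned Bitwidth, Context *K) {
  std::unique_ptr<IntType> &Slot = K->IntTypes[Bitwidth];
  if (!Slot)
    Slot.reset(new IntType{Bitwidth});
  return Slot.get();
}

ConstantInt *ConstantInt::get(int Val, unsigned Bitwidth, Context *K) {
  std::unique_ptr<ConstantInt> &Slot = K->IntConstants[{Bitwidth, Val}];
  if (!Slot)
    Slot.reset(new ConstantInt{IntType::get(Bitwidth, K), Val});
  return Slot.get();
}
```

With this scheme, two calls to `ConstantInt::get(2, 32, &K)` return the same object, and the Context owns everything it hands out, so nothing needs to be freed individually.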
The Context is a fairly large object that keeps the uniqued `Type` and `Value`
instances. It also keeps track of `Externals`, the external C functions that are
provided as part of a "standard library". Unique `llvm::Builder` and
`llvm::Context` objects, as well as the `DiagnosticPrinter`, are exposed as
member variables. Finally, it is needed for symbol resolution, since it keeps
the `ResolutionMap`.

## Symbol resolution
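Stripped of rhine's class hierarchy, the resolution step amounts to this: operands are `Use` slots, unresolved names are placeholder values, and resolving a name looks it up for the current block and re-points the `Use`, so the instruction holding it never changes. This is a sketch under simplifying assumptions; the flattened `Value` with an `Unresolved` flag, the `(block, name)` key, and `resolve` are hypothetical, whereas rhine queries its map as `K->Map.get(V, Block)`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, flattened stand-in for rhine's Value/UnresolvedValue.
struct Value {
  std::string Name;
  bool Unresolved; // true for a placeholder created while parsing
};

// An operand slot: re-pointing it is visible to whoever holds the Use.
struct Use {
  Value *Val;
  void set(Value *V) { Val = V; }
};

struct Instruction {
  std::vector<Use> Operands;
};

struct BasicBlock {
  std::vector<Instruction> Instructions;
};

// Hypothetical map from (block, name) to the definition visible there.
using ResolutionMap =
    std::map<std::pair<const BasicBlock *, std::string>, Value *>;

// Re-point every unresolved operand that has a definition in this block.
// Returns how many operands are still unresolved afterwards.
unsigned resolve(BasicBlock &Block, const ResolutionMap &Map) {
  unsigned Missing = 0;
  for (Instruction &I : Block.Instructions) {
    for (Use &U : I.Operands) {
      if (!U.Val->Unresolved)
        continue;
      auto It = Map.find({&Block, U.Val->Name});
      if (It != Map.end())
        U.set(It->second);
      else
        ++Missing;
    }
  }
  return Missing;
}
```

Because the instruction stores `Use` slots rather than raw `Value` pointers, the pass only ever writes through `Use::set`; nothing else has to be patched afterwards.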
`src/Transform/Resolve` is an example of a pass that uses the UseDef graph
embedded in the AST. The statement

```elixir
B = A + 2
```

creates an `UnresolvedValue` for `A`, an `AddInst`, and a `MallocInst` that
takes the string "B" and the `AddInst` as operands.

The transform goes over every `Instruction` in the `BasicBlock`, resolves
`UnresolvedValue` instances, and sets each `Use` to the resolved value. It thus
replaces the `Value` underneath the `Use`, and since the `Instruction` references
`Use` instances rather than values directly, no dangling references are left
behind.

```cpp
if (auto S = K->Map.get(V, Block)) {
  /// %S = 2;
  /// ^
  /// Came from here (MallocInst, Argument, or Prototype)
  ///
  /// Foo(%S);
  ///     ^
  /// UnresolvedValue; replace with %Replacement
  if (auto M = dyn_cast(S)) {
    if (dyn_cast(U->getUser()))
      U.set(M);
  }
}
```

## Type Inference
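The overloaded-`visit` scheme can be sketched with a handful of node kinds. Here an entry point taking a plain `Value *` dispatches with `dynamic_cast` to the specific overloads. The node classes, the `Type` enum, and the rule that both operands of an add must have the same type are illustrative assumptions, not rhine's actual type rules.

```cpp
#include <cassert>
#include <stdexcept>

enum class Type { Unknown, Void, Int, Float };

// Hypothetical node kinds; literal payloads are omitted.
struct Value {
  Type Ty = Type::Unknown;
  virtual ~Value() = default;
};
struct ConstantInt : Value {};
struct ConstantFloat : Value {};
struct AddInst : Value {
  Value *LHS, *RHS;
  AddInst(Value *L, Value *R) : LHS(L), RHS(R) {}
};
struct MallocInst : Value { // binds a name to a value, as in `B = A + 2`
  Value *Val;
  explicit MallocInst(Value *V) : Val(V) {}
};

struct TypeInfer {
  Type visit(ConstantInt *V) { return V->Ty = Type::Int; }
  Type visit(ConstantFloat *V) { return V->Ty = Type::Float; }
  Type visit(AddInst *V) {
    Type L = visit(V->LHS), R = visit(V->RHS);
    if (L != R)
      throw std::runtime_error("AddInst operands have different types");
    return V->Ty = L;
  }
  // Like rhine's MallocInst overload: the binding takes its value's type,
  // while the statement itself is Void.
  Type visit(MallocInst *V) {
    V->Ty = visit(V->Val);
    assert(V->Ty != Type::Unknown && "unable to type infer MallocInst");
    return Type::Void;
  }
  // Entry point for a plain Value *: dispatch to the most specific overload.
  Type visit(Value *V) {
    if (auto *C = dynamic_cast<ConstantInt *>(V))
      return visit(C);
    if (auto *F = dynamic_cast<ConstantFloat *>(V))
      return visit(F);
    if (auto *A = dynamic_cast<AddInst *>(V))
      return visit(A);
    if (auto *M = dynamic_cast<MallocInst *>(V))
      return visit(M);
    throw std::runtime_error("no visit overload for this Value");
  }
};
```

Calling `visit` on a `MallocInst` whose value is an `AddInst` of two integer constants types the whole tree bottom-up: the constants and the add become `Int`, the binding becomes `Int`, and the statement itself yields `Void`.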
Type inference is simple, perhaps too simple: one `visit` function is
overloaded for every possible `Value` class.

```cpp
Type *TypeInfer::visit(MallocInst *V) {
  V->setType(visit(V->getVal()));
  assert(!V->isUnTyped() && "unable to type infer MallocInst");
  return VoidType::get(K);
}
```

## Building
The desired directory structure is:
```
bin/                ; if you downloaded the tarball for this
  cmake
  ninja
  flex
src/
  rhine/
    README.md
    llvm/           ; git submodule update --init to get the sources
    llvm-build/
      bin/
        llvm-config ; you need to call this to build
    rhine-build/
      rhine         ; the executable
```

On OS X, if you already have everything else:
```sh
$ brew install flex
$ brew link --force flex
$ git submodule update --init
$ cd llvm-build
# rhine is buggy; without debugging symbols, you can't report a useful bug
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ../llvm
$ ninja
$ export PATH=`pwd`/bin:$PATH
$ cd ../rhine-build
$ cmake -GNinja ..
# this runs the package's unit tests, which should all pass
$ ninja check
```

On Linux, where you have nothing installed (no root privileges are required),
get [git-lfs](https://git-lfs.github.com/) and fetch `cmake-ninja-flex.tar.bz2`:
```sh
$ git lfs pull
```

Untar it and set up the environment variables:
```sh
$ tar xf cmake-ninja-flex.tar.bz2
$ cd cmake-ninja-flex
# for bash/zsh
$ export TOOLS_ROOT=`pwd`
$ export PATH=$TOOLS_ROOT:$PATH
# for csh
$ setenv TOOLS_ROOT `pwd`
$ setenv PATH $TOOLS_ROOT:$PATH
```

Then:
```sh
$ git submodule update --init
$ cd llvm-build
# rhine is buggy; without debugging symbols, you can't report a useful bug
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ../llvm
$ ninja
$ export PATH=`pwd`/bin:$PATH
$ cd ../rhine-build
# flex isn't picked up from $PATH
$ cmake -GNinja -DTOOLS_ROOT=$TOOLS_ROOT -DFLEX_EXECUTABLE=$TOOLS_ROOT/flex ..
# if there are build (usually link) errors, please open an issue
# tests are currently failing on Linux, need to look into it
$ ninja check
```

## Commentary

An inefficient untyped language is easy to implement. Making `println` accept
both 23 and "twenty three" is a simple matter of switching on the type of the
value once it is unboxed. There's no need to rewrite the value in IR, and
certainly no need to come up with an overloading scheme.

[Crystal](http://crystal-lang.org/) made a good decision in writing its first
compiler in Ruby. If your goal is to self-host, the efficiency of the original
language does not matter; all you need is good generated assembly (which LLVM
makes easy).
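To make the `println` point concrete: in an untyped implementation each runtime value can carry a tag, and a single `println` switches on that tag once the value is unboxed, with no overloading and no per-type IR. A minimal sketch, with hypothetical `Boxed`, `box`, and `println` names:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical boxed runtime value: a tag plus storage for each case.
struct Boxed {
  enum class Tag { Int, String };
  Tag Kind;
  long long IntVal;
  std::string StrVal;
};

Boxed box(long long I) { return Boxed{Boxed::Tag::Int, I, {}}; }
Boxed box(std::string S) { return Boxed{Boxed::Tag::String, 0, std::move(S)}; }

// One println for every value: switch on the tag instead of overloading.
void println(const Boxed &V, std::ostream &OS = std::cout) {
  switch (V.Kind) {
  case Boxed::Tag::Int:
    OS << V.IntVal << '\n';
    break;
  case Boxed::Tag::String:
    OS << V.StrVal << '\n';
    break;
  }
}
```

The price is that every value pays for the tag and the branch at run time, which is the inefficiency the paragraph above refers to; a typed language can instead pick the right printing routine at compile time.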