https://github.com/azarattum/carmustcompiler
C to ARM64 compiler written in Rust
- Host: GitHub
- URL: https://github.com/azarattum/carmustcompiler
- Owner: Azarattum
- Created: 2024-06-14T15:57:58.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-04-25T12:13:21.000Z (about 1 year ago)
- Last Synced: 2025-04-25T13:26:55.062Z (about 1 year ago)
- Topics: arm64, c-compiler, compilers, rust
- Language: Rust
- Homepage:
- Size: 73.2 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Carmust Compiler
Carmust is a C to ARM64 compiler written in Rust. This project is a prototype whose primary goal was learning Rust. As it stands, the compiler supports only a subset of C and performs no real optimizations.
## Showcase
An example program that it can compile:
```c
typedef int point[2];
typedef int i32;
typedef float f32;
i32 globalData = 42;
f32 floatEncoding = 32.0;
point globalPoint = {4, 2};
int main() {
  // Expressions
  i32 unaryOperators = 43 + -globalData - 11;
  i32 wildExpressions = 1 << !!!!!!-2 > (6 & 1 ^ (3 % 4)) & 255 == 1;
  i32 booleanAndBinaryOps = 7 || (1 + !5 == 2 - 3) && 1;
  f32 floatingPointMath = (2.5 * 2 + floatEncoding) / globalPoint[1];

  // Arrays with initializers
  int numbers[3] = {1, 2, 3};
  point point = {1, 2};
  short shortsAreAlsoAllowed = 1;
  long longsAreSupportedAsWell = 123456;

  // Assignments
  unaryOperators = unaryOperators + wildExpressions + ' ';
  point[1] = point[1] + numbers[2];
  globalPoint[0] = 0;

  // Empty statements
  ;
  ;

  // For loops
  for (int i = 0 + 0; i < 5 + 1; i = i + 1) {
    unaryOperators = unaryOperators + 1 + globalPoint[0];
  }

  // Expressions in the return statement
  return unaryOperators + floatingPointMath + booleanAndBinaryOps - point[1];
}
```
Let's look more closely at the compilation process using a simpler program:
```c
float global = 42;
int main() {
  int local = 1337;
  return local % 255 + global;
}
```
First, the following tokens are extracted from the source code:
Tokens: [Keyword("float"), Identifier("global"), Symbol("="), Data(Integer(42), "42"), Symbol(";"), Keyword("int"), Identifier("main"), Symbol("("), Symbol(")"), Symbol("{"), Keyword("int"), Identifier("local"), Symbol("="), Data(Integer(1337), "1337"), Symbol(";"), Keyword("return"), Identifier("local"), Symbol("%"), Data(Integer(255), "255"), Symbol("+"), Identifier("global"), Symbol(";"), Symbol("}")]
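To make the tokenization step concrete, here is a minimal sketch of a tokenizer that produces a similar token stream. It is not Carmust's actual lexer: the `Token` enum, the `KEYWORDS` list, and the `tokenize` function are hypothetical names chosen for illustration, and the sketch handles only integers, identifiers, keywords, and single-character symbols.

```rust
// Hypothetical token type, loosely modeled on the debug output above.
// Carmust's real definitions may differ (for example, its `Data` variant
// also keeps the original source text).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Keyword(String),
    Identifier(String),
    Symbol(String),
    Integer(i64),
}

// Keywords recognized by this sketch only (an assumed, incomplete list).
const KEYWORDS: [&str; 4] = ["int", "float", "return", "for"];

fn tokenize(source: &str) -> Vec<Token> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            // Consume a run of digits as one integer literal.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            // Panics on literals that do not fit in i64; acceptable for a sketch.
            tokens.push(Token::Integer(text.parse().expect("integer literal")));
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Consume an identifier, then check whether it is a keyword.
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            if KEYWORDS.contains(&word.as_str()) {
                tokens.push(Token::Keyword(word));
            } else {
                tokens.push(Token::Identifier(word));
            }
        } else {
            // Everything else is treated as a single-character symbol.
            tokens.push(Token::Symbol(c.to_string()));
            i += 1;
        }
    }
    tokens
}
```

Calling `tokenize("int local = 1337;")` yields a keyword, an identifier, the `=` symbol, the integer `1337`, and the `;` symbol. Multi-character operators such as `<<` or `==` would need lookahead, which this sketch omits.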
Then an abstract syntax tree is generated:
AST: [Variable(Variable { datatype: Type(Compound(Float, 1)), name: "global", assignment: Some(Assignment { identifier: ("global", 0), value: Expression(Value(Data(Integer(42)))) }) }), Function(Function { datatype: Type(Compound(Int, 1)), name: "main", body: [Variable(Variable { datatype: Type(Compound(Int, 1)), name: "local", assignment: Some(Assignment { identifier: ("local", 0), value: Expression(Value(Data(Integer(1337)))) }) }), Return(Binary { op: Addition, lhs: Binary { op: Remainder, lhs: Value(Pointer("local", 0)), rhs: Value(Data(Integer(255))) }, rhs: Value(Pointer("global", 0)) })] })]
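The `Return` node above mixes an `int` operand (`local % 255`) with a `float` operand (`global`), so the compiler has to decide the type of each subexpression. The following is a simplified sketch of how such a tree could be represented and typed. The `Type`, `Op`, and `Expr` definitions and the `infer` function are assumptions made for illustration, not Carmust's real data structures, and the promotion rule is a simplification of C's usual arithmetic conversions.

```rust
// Hypothetical, simplified AST; names echo the debug output above but are
// assumptions rather than Carmust's actual definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Int,
    Float,
}

#[derive(Debug)]
enum Op {
    Addition,
    Remainder,
}

#[derive(Debug)]
enum Expr {
    Integer(i64),
    // A variable reference, carrying its declared type for simplicity.
    Variable(&'static str, Type),
    Binary { op: Op, lhs: Box<Expr>, rhs: Box<Expr> },
}

// Infers the type of an expression: a binary expression is Float if either
// operand is Float, otherwise Int.
fn infer(expr: &Expr) -> Type {
    match expr {
        Expr::Integer(_) => Type::Int,
        Expr::Variable(_, ty) => *ty,
        Expr::Binary { lhs, rhs, .. } => {
            if infer(lhs) == Type::Float || infer(rhs) == Type::Float {
                Type::Float
            } else {
                Type::Int
            }
        }
    }
}

// Builds the tree for `local % 255 + global` from the example program.
fn example_return_expr() -> Expr {
    Expr::Binary {
        op: Op::Addition,
        lhs: Box::new(Expr::Binary {
            op: Op::Remainder,
            lhs: Box::new(Expr::Variable("local", Type::Int)),
            rhs: Box::new(Expr::Integer(255)),
        }),
        rhs: Box::new(Expr::Variable("global", Type::Float)),
    }
}
```

With these definitions, `infer(&example_return_expr())` is `Type::Float` while the inner remainder is `Type::Int`, which is why an integer-to-float conversion is needed before the addition.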
The AST is then lowered into an intermediate representation:
IR:
globals:
global_0 = 42
main:
0) Mov @ 1337
1) Str 'local_1' @0
2) Ldr @ 'local_1'
3) Mov @ 255
4) Div @2 @3
5) Mul @3 @4
6) Sub @2 @5
7) Ldg @ 'global_0'
8) SCvtF @6
9) Add @8 @7
10) FCvtZS @9
11) Ret @10
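Note that the IR has no remainder instruction: steps 4 to 6 compute `local % 255` as a division, a multiplication, and a subtraction, using the identity `a % b == a - (a / b) * b`. The sketch below mirrors those three steps in Rust; the function name `lowered_remainder` is made up for this example.

```rust
// Mirrors IR steps 4-6: a % b == a - (a / b) * b.
// Assumes b != 0 (Rust panics on division by zero, unlike ARM64 `sdiv`).
fn lowered_remainder(a: i32, b: i32) -> i32 {
    let quotient = a / b;       // 4) Div @2 @3
    let product = b * quotient; // 5) Mul @3 @4
    a - product                 // 6) Sub @2 @5
}
```

For the example program, `lowered_remainder(1337, 255)` evaluates to `62`, the same value as `1337 % 255`. Because integer division truncates toward zero, the result also matches C's `%` for negative operands.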
Finally, the IR is compiled to ARM64 assembly:
ASM:
.section __DATA,__data
global_0:
.word 1109917696
.section __TEXT,__text
.global main
main:
sub sp, sp, 16
mov w0, 1337
str w0, [sp, 12]
ldr w0, [sp, 12]
mov w1, 255
sdiv w2, w0, w1
mul w1, w1, w2
sub w0, w0, w1
adrp x3, global_0@GOTPAGE
ldr x3, [x3, global_0@GOTPAGEOFF]
ldr s1, [x3, 0]
scvtf s0, w0
fadd s0, s0, s1
fcvtzs w0, s0
add sp, sp, 16
ret
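The `.word 1109917696` directive in the data section is the IEEE 754 single-precision bit pattern of `42.0`, stored as an integer. The following sketch checks that encoding and replays the arithmetic of the generated code in Rust. The helper names `data_word` and `run_example` are illustrative only, and the Rust casts are used as stand-ins for `scvtf` and `fcvtzs`.

```rust
// Returns the 32-bit word that encodes a float, as emitted in the data section.
fn data_word(value: f32) -> u32 {
    value.to_bits()
}

// Replays the example program's arithmetic step by step.
fn run_example() -> i32 {
    let global = f32::from_bits(1109917696); // ldr s1, [x3, 0]
    let local: i32 = 1337;                   // mov w0, 1337
    let remainder = local % 255;             // sdiv / mul / sub
    let sum = remainder as f32 + global;     // scvtf + fadd
    sum as i32                               // fcvtzs (truncates toward zero)
}
```

Here `data_word(42.0)` equals `1109917696` (hexadecimal `0x42280000`), and `run_example()` returns `104`.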
As you can see, the compiler supports type inference (note the `scvtf` and `fcvtzs` conversions inserted around the floating-point addition), global and local variables, simple `for` loops, arbitrary expressions (with bitwise and boolean operators), and the `return` statement, which lets us observe the result of the program:
Execution Result: 104