Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sysprog21/shecc
A self-hosting and educational C optimizing compiler
https://github.com/sysprog21/shecc
arm armv7 c compiler compiler-optimization cross-compiler elf linux qemu risc-v riscv rv32i rv32im self-hosting ssa-form
Last synced: about 1 month ago
JSON representation
A self-hosting and educational C optimizing compiler
- Host: GitHub
- URL: https://github.com/sysprog21/shecc
- Owner: sysprog21
- License: bsd-2-clause
- Created: 2020-08-30T20:06:30.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2024-09-25T06:10:58.000Z (about 2 months ago)
- Last Synced: 2024-09-26T23:41:49.941Z (about 1 month ago)
- Topics: arm, armv7, c, compiler, compiler-optimization, cross-compiler, elf, linux, qemu, risc-v, riscv, rv32i, rv32im, self-hosting, ssa-form
- Language: C
- Homepage:
- Size: 1.39 MB
- Stars: 1,112
- Watchers: 24
- Forks: 118
- Open Issues: 7
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Authors: AUTHORS
Awesome Lists containing this project
README
# shecc : self-hosting and educational C optimizing compiler
## Introduction
`shecc` is built from scratch, targeting both 32-bit Arm and RISC-V architectures,
as a self-compiling compiler for a subset of the C language.
Despite its simplistic nature, it is capable of performing basic optimization strategies as a standalone optimizing compiler.### Features
* Generate executable Linux ELF binaries for ARMv7-A and RV32IM.
* Provide a minimal C standard library for basic I/O on GNU/Linux.
* The cross-compiler is written in ANSI C, making it compatible with most platforms.
* Include a self-contained C front-end with an integrated machine code generator; no external assembler or linker needed.
* Utilize a two-pass compilation process: the first pass checks syntax and breaks down complex statements into basic operations,
while the second pass translates these operations into Arm/RISC-V machine code.
* Develop a register allocation system that is compatible with RISC-style architectures.
* Implement an architecture-independent, [static single assignment](https://en.wikipedia.org/wiki/Static_single-assignment_form) (SSA)-based middle-end for enhanced optimizations.## Compatibility
`shecc` is capable of compiling C source files written in the following
syntax:
* data types: char, int, struct, and pointer
* condition statements: if, while, for, switch, case, break, return, and
general expressions
* compound assignments: `+=`, `-=`, `*=`
* global/local variable initializations for supported data types
- e.g. `int i = [expr]`
* limited support for preprocessor directives: `#define`, `#ifdef`, `#elif`, `#endif`, `#undef`, and `#error`
* non-nested variadic macros with `__VA_ARGS__` identifierThe backend targets armv7hf with Linux ABI, verified on Raspberry Pi 3,
and also supports RISC-V 32-bit architecture, verified with QEMU.## Bootstrapping
The steps to validate `shecc` bootstrapping:
1. `stage0`: `shecc` source code is initially compiled using an ordinary compiler
which generates a native executable. The generated compiler can be used as a
cross-compiler.
2. `stage1`: The built binary reads its own source code as input and generates an
ARMv7-A/RV32IM binary.
3. `stage2`: The generated ARMv7-A/RV32IM binary is invoked (via QEMU or running on
Arm and RISC-V devices) with its own source code as input and generates another
ARMv7-A/RV32IM binary.
4. `bootstrap`: Build the `stage1` and `stage2` compilers, and verify that they are
byte-wise identical. If so, `shecc` can compile its own source code and produce
new versions of that same program.## Prerequisites
Code generator in `shecc` does not rely on external utilities. You only need
ordinary C compilers such as `gcc` and `clang`. However, `shecc` would bootstrap
itself, and Arm/RISC-V ISA emulation is required. Install QEMU for Arm/RISC-V user
emulation on GNU/Linux:
```shell
$ sudo apt-get install qemu-user
```It is still possible to build `shecc` on macOS or Microsoft Windows. However,
the second stage bootstrapping would fail due to `qemu-arm` absence.To execute the snapshot test, install the packages below:
```shell
$ sudo apt-get install graphviz jq
```## Build and Verify
Configure which backend you want, `shecc` supports ARMv7-A and RV32IM backend:
```
$ make config ARCH=arm
# Target machine code switch to Arm$ make config ARCH=riscv
# Target machine code switch to RISC-V
```Run `make` and you should see this:
```
CC+LD out/inliner
GEN out/libc.inc
CC out/src/main.o
LD out/shecc
SHECC out/shecc-stage1.elf
SHECC out/shecc-stage2.elf
```File `out/shecc` is the first stage compiler. Its usage:
```shell
$ shecc [-o output] [+m] [--no-libc] [--dump-ir]
```Compiler options:
- `-o` : Specify output file name (default: `out.elf`)
- `+m` : Use hardware multiplication/division instructions (default: disabled)
- `--no-libc` : Exclude embedded C library (default: embedded)
- `--dump-ir` : Dump intermediate representation (IR)Example:
```shell
$ out/shecc -o fib tests/fib.c
$ chmod +x fib
$ qemu-arm fib
```Verify that the emitted IRs are identical to the snapshots by specifying `check-snapshots` target when invoking `make`:
```shell
$ make check-snapshots
````shecc` comes with unit tests. To run the tests, give `check` as an argument:
```shell
$ make check
```Reference output:
```
...
int main(int argc, int argv) { exit(sizeof(char)); } => 1
int main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; case 3: a = 10; break; case 1: return 0; } exit(a); } => 10
int main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; default: a = 10; break; } exit(a); } => 10
OK
```To clean up the generated compiler files, execute the command `make clean`.
For resetting architecture configurations, use the command `make distclean`.## Intermediate Representation
Once the option `--dump-ir` is passed to `shecc`, the intermediate representation (IR)
will be generated. Take the file `tests/fib.c` for example. It consists of a recursive
Fibonacci sequence function.
```c
int fib(int n)
{
if (n == 0)
return 0;
else if (n == 1)
return 1;
return fib(n - 1) + fib(n - 2);
}
```Execute the following to generate IR:
```shell
$ out/shecc --dump-ir -o fib tests/fib.c
```Line-by-line explanation between C source and IR:
```c
C Source IR Explanation
------------------+---------------------------------------+----------------------------------------------------------------------------------int fib(int n) def int @fib(int %n) Indicate a function definition
{ {
if (n == 0) const %.t1001, $0 Load constant 0 into a temporary variable ".t1001"
%.t1002 = eq %n, %.t1001 Test "n" equals ".t1001" or not, and write the result in temporary variable ".t1002"
br %.t1002, .label.1177, .label.1178 If ".t1002" equals zero, goto false label ".label.1178", otherwise,
goto true label ".label.1177"
.label.1177
return 0; const %.t1003, $0 Load constant 0 into a temporary variable ".t1003"
ret %.t1003 Return ".t1003"
j .label.1184 Jump to endif label ".label.1184"
.label.1178
else if (n == 1) const %.t1004, $1 Load constant 1 into a temporary variable ".t1004"
%.t1005 = eq %n, %.t1004 Test "n" equals ".t1004" or not, and write the result in temporary variable ".t1005"
br %.t1005, .label.1183, .label.1184 If ".t1005" equals zero, goto false label ".label.1184". Otherwise,
goto true label ".label.1183"
.label.1183
return 1; const %.t1006, $1 Load constant 1 into a temporary variable ".t1006"
ret %.t1006 Return ".t1006"
.label.1184
return
fib(n - 1) const %.t1007, $1 Load constant 1 into a temporary variable ".t1007"
%.t1008 = sub %n, %.t1007 Subtract ".t1007" from "n", and store the result in temporary variable ".t1008"
push %.t1008 Prepare parameter for function call
call @fib, 1 Call function "fib" with one parameter
+ retval %.t1009 Store return value in temporary variable ".t1009"
fib(n - 2); const %.t1010, $2 Load constant 2 into a temporary variable ".t1010"
%.t1011 = sub %n, %.t1010 Subtract ".t1010" from "n", and store the result in temporary variable ".t1011"
push %.t1011 Prepare parameter for function call
call @fib, 1 Call function "fib" with one parameter
retval %.t1012 Store return value in temporary variable ".t1012"
%.t1013 = add %.t1009, %.t1012 Add ".t1009" and ".t1012", and store the result in temporary variable ".t1013"
ret %.t1013 Return ".t1013"
} }
```## Known Issues
1. The generated ELF lacks of .bss and .rodata section
2. The support of varying number of function arguments is incomplete. No `` can be used.
Alternatively, check the implementation `printf` in source `lib/c.c` for `var_arg`.
3. The C front-end is a bit dirty because there is no effective AST.## License
`shecc` is freely redistributable under the BSD 2 clause license.
Use of this source code is governed by a BSD-style license that can be found in the `LICENSE` file.