{"id":18889665,"url":"https://github.com/mausimus/rvcc","last_synced_at":"2025-04-14T23:26:41.512Z","repository":{"id":107070698,"uuid":"264419160","full_name":"mausimus/rvcc","owner":"mausimus","description":"Standalone C compiler for RISC-V and ARM","archived":false,"fork":false,"pushed_at":"2024-05-01T12:08:07.000Z","size":418,"stargazers_count":83,"open_issues_count":0,"forks_count":14,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-28T11:21:40.652Z","etag":null,"topics":["arm","c","compiler","risc-v"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mausimus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-16T11:11:06.000Z","updated_at":"2025-03-12T18:36:01.000Z","dependencies_parsed_at":"2024-05-01T13:36:20.258Z","dependency_job_id":"cb348d2f-0971-4326-9fb6-1c361cbaba49","html_url":"https://github.com/mausimus/rvcc","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mausimus%2Frvcc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mausimus%2Frvcc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mausimus%2Frvcc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mausimus%2Frvcc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mausimus","download_url":"https://codeload.github.com/mausimus/rvcc/tar.gz/refs/h
eads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248976862,"owners_count":21192478,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arm","c","compiler","risc-v"],"created_at":"2024-11-08T07:50:05.125Z","updated_at":"2025-04-14T23:26:41.486Z","avatar_url":"https://github.com/mausimus.png","language":"C","funding_links":[],"categories":["Compilers"],"sub_categories":["Embeddable Scripts and Languages"],"readme":"## rvcc\n\nBootstrapped C compiler for 32-bit RISC-V and ARM ISAs\n\n### Features\n\n* implements a subset of the C language sufficient to compile itself\n* generates executable Linux ELF binaries for RV32IM and ARMv7\n* includes an embedded minimal Linux C standard library for basic I/O\n* single binary - a microcompiler for environments without a GCC toolchain\n* written in ANSI C, so it can be cross-compiled from any platform\n* simple two-pass compilation process: source -\u003e IL -\u003e binary\n* lexer, parser and code generator all implemented by hand\n\n### Bootstrapping\n\nA bootstrapped compiler is one that can compile its own source code, which is a key verification milestone. Since rvcc is written in ANSI C, we can use any compiler on any platform for the initial compilation before bootstrapping with the help of RISC-V and/or ARM emulators.\n\n![diagram](bootstrap.png)\n\nThe steps to validate rvcc's bootstrap on RISC-V are:\n\n1. The compiler's source code is initially compiled using gcc, which generates an x86 binary.\n\n2. 
The built binary is invoked with its own source code as input and generates a RISC-V binary.\n\n3. The RISC-V binary is invoked (via an emulator) with its own source code as input and generates another RISC-V binary.\n\n4. If the outputs generated in steps 2 and 3 are identical, the bootstrap has been successful - the compiler regenerated itself from its own source code in two consecutive generations.\n\nThe bootstrap can be tested by running ```make bootstrap-riscv``` (requires gcc and the rv8 RISC-V emulator to be installed):\n\n```sh\nuser@ubuntu:~/rvcc$ make bootstrap-riscv\ngcc -ansi -pedantic -m32 -Wall -Wextra -g -Llib -o bin/rvcc src/rvcc.c\nbin/rvcc -o bin/rvcc_1.elf -march=riscv -Llib src/rvcc.c\nrv-jit -- bin/rvcc_1.elf -o bin/rvcc_2.elf -march=riscv -Llib src/rvcc.c\ndiff -q bin/rvcc_1.elf bin/rvcc_2.elf\nFiles are the same - bootstrap successful!\n```\n\nBootstrapping on ARM follows the same process and the test can be run via ```make bootstrap-arm``` (requires qemu-arm-static to be installed). To run both bootstraps, use ```make bootstrap```.\n\n### Usage\n\n`rvcc [-o outfile] [-noclib] [-march=riscv|arm] \u003cinfile.c\u003e`\n\n- -o - output file name (default: out.elf)\n- -noclib - exclude embedded C library (default: include)\n- -march=riscv|arm - output architecture (default: riscv)\n\n### Output\n\nThe compiler generates an executable binary file without going through explicit linking and assembly steps;\nit directly encodes all RISC-V/ARM instructions and packages them in an ELF file.\nThe generated executable includes a symbol table, so a disassembler can be used to\npeek into the machine code for introspection. The compiler also generates a listing of its internal\nIL representation for debugging purposes.\n\nTo run a RISC-V ELF file on an x86 machine you can use the excellent [rv8](https://github.com/rv8-io/rv8)\nemulator, which allows running and debugging RISC-V binaries as if they were native ones,\nincluding proxying system calls for I/O. 
The usage is:\n\n`rv-jit \u003celf_file\u003e`\n\nTo disassemble a RISC-V binary:\n\n`rv-bin dump -d \u003celf_file\u003e`\n\nTo run ARM on x86 you can use [qemu user emulation](https://wiki.debian.org/QemuUserEmulation) through qemu-arm-static:\n\n`qemu-arm-static \u003celf_file\u003e`\n\nTo disassemble an ARM binary you can use the GNU toolchain:\n\n`arm-linux-gnueabi-objdump -d \u003celf_file\u003e`\n\n### Code\n\nSupported language constructs fall within ANSI C and initial external compilation can be done using gcc or any other\nC compiler. While common language constructs (structs, pointers, indirection etc.) are implemented,\nthe simplicity of the compiler (no preprocessor, AST or formalised parser) means it doesn't implement full ANSI C syntax.\n\nThe compiler sequentially translates C source code into IL and then into binary, meaning instruction order is preserved from the C source\nall the way to the binary, with only jumps and glue logic being inserted. This makes some language constructs\n(for example for loops or some comparisons) quite inefficient due to excessive jumps, but the compiler's design stays very simple.\n\n#### RISC-V\n\n[RISC-V](https://en.wikipedia.org/wiki/RISC-V) is an open source Instruction Set Architecture,\nan alternative to the likes of x86 and ARM, which can be used freely without any licenses required. It's a RISC\narchitecture, meaning the only instructions that operate on memory are simple loads and stores, with all\nother instructions operating only on registers (32 of them) and immediate values.\n\n#### ARM\n\nARM output is code compatible with the ARMv7 ISA, which is owned by [Arm Holdings](https://www.arm.com/). Similarly to RISC-V,\nARM is a RISC ISA but it offers a wider variety of operations by adding conditional execution codes, automatic index increments,\nvalue shifts etc. 
to many instructions resulting in potentially denser code (fewer instructions required), at the expense of reducing\nthe number of bits available for immediate values.\n\nThe compiler generates Linux ARMv7 EABI-compliant binaries which can be run directly on a Raspberry Pi (verified on Pi 4 Model B running armv7l).\n\n### Example RISC-V output\n\nSample RISC-V binary code generated for a recursive Fibonacci sequence function.\n\n```asm\n C Source            Internal IL     Binary     Disassembly               Comment\n-------------------+---------------+----------+-------------------------+--------------------------------------\n int fib(int n) {    fib(int n)      fe010113   addi  sp, sp, -32;        reserve stack space for function\n                                     00812e23   sw    s0, 28(sp);           store previous frame\n                                     00112c23   sw    ra, 24(sp);           store return address\n                                     01010413   addi  s0, sp, 16;           set new frame location\n                                     fea42e23   sw    a0, -4(s0);           store parameter on stack\n   if (n == 0)       x10 := \u0026n       00040513   addi  a0, s0, 0;          get address of variable n\n                                     ffc50513   addi  a0, a0, -4;                    \n                     x10 := *x10     00052503   lw    a0, 0(a0);          read value from address into a0\n                     x11 := 0        00000593   addi  a1, zero, 0;        set a1 to zero\n                     x10 ?= x11      00b50663   beq   a0, a1, pc + 12;    compare a0 with a1, if equal jump +3\n                                     00000513   addi  a0, zero, 0;          set a0 to zero\n                                     0080006f   jal   zero, pc + 8;         skip next instruction\n                                     00100513   addi  a0, zero, 1;          set a0 to one\n                                     00000013   addi  zero, zero, 0;     
           \n                     if 0 -\u003e +4      00050863   beq   a0, zero, pc + 16;  if a0 is zero, jump forward\n     return 0;       x10 := 0        00000513   addi  a0, zero, 0;        else set return value to zero \n                     return fib      0880006f   jal   zero, pc + 136;       jump to function exit\n                                     0840006f   jal   zero, pc + 132;            \n   else if (n == 1)  x10 := \u0026n       00040513   addi  a0, s0, 0;          get address of variable n\n                                     ffc50513   addi  a0, a0, -4;                   \n                     x10 := *x10     00052503   lw    a0, 0(a0);          read value from address into a0\n                     x11 := 1        00100593   addi  a1, zero, 1;        set a1 to one\n                     x10 ?= x11      00b50663   beq   a0, a1, pc + 12;    compare a0 with a1, if equal jump +3\n                                     00000513   addi  a0, zero, 0;          set a0 to zero\n                                     0080006f   jal   zero, pc + 8;         skip next instruction\n                                     00100513   addi  a0, zero, 1;          set a0 to one\n                                     00000013   addi  zero, zero, 0;                       \n                     if 0 -\u003e +4      00050863   beq   a0, zero, pc + 16;  if a0 is zero, jump forward\n     return 1;       x10 := 1        00100513   addi  a0, zero, 1;        else set return value to one\n                     return fib      0540006f   jal   zero, pc + 84;        jump to function exit\n   else return                       0500006f   jal   zero, pc + 80;                  \n     fib(n - 1)      x10 := \u0026n       00040513   addi  a0, s0, 0;          get address of variable n\n                                     ffc50513   addi  a0, a0, -4;                       \n                     x10 := *x10     00052503   lw    a0, 0(a0);          read value from address into a0\n              
       x11 := 1        00100593   addi  a1, zero, 1;        set a1 to one\n                     x10 -= x11      40b50533   sub   a0, a0, a1;         subtract a1 from a0\n                     x10 := fib()    f71ff0ef   jal   ra, pc - 144;       call function fib() into a0\n     +               push x10        ff010113   addi  sp, sp, -16;        store result on stack\n                                     00a12023   sw    a0, 0(sp);                     \n     fib(n - 2)      x10 := \u0026n       00040513   addi  a0, s0, 0;          get address of variable n\n                                     ffc50513   addi  a0, a0, -4;                 \n                     x10 := *x10     00052503   lw    a0, 0(a0);          read value from address into a0\n                     x11 := 2        00200593   addi  a1, zero, 2;        set a1 to two\n                     x10 -= x11      40b50533   sub   a0, a0, a1;         subtract a1 from a0\n                     x11 := fib()    f51ff0ef   jal   ra, pc - 176;       call function fib() into a1\n                                     00050593   addi  a1, a0, 0;                              \n                     pop x10         00012503   lw    a0, 0(sp);          retrieve result off stack into a0\n                                     01010113   addi  sp, sp, 16;                       \n                     x10 += x11      00b50533   add   a0, a0, a1;         add a1 to a0\n     ;               return fib      0040006f   jal   zero, pc + 4;       jump to function exit\n }                   exit fib        01040113   addi  sp, s0, 16;         trim stack space\n                                     ff812083   lw    ra, -8(sp);         recover return address\n                                     ffc12403   lw    s0, -4(sp);         recover previous frame\n                                     00008067   jalr  zero, ra, 0;        return from function\n```\n\n### Thanks\n\nAim of the project is purely recreational/educational and thanks are due 
to:\n\n* [RISC-V International](https://riscv.org/) for the RISC-V initiative\n\n* K\u0026R for giving the world the C language (here we are 40 years later...)\n\n* the [rv8](https://github.com/rv8-io/rv8) and [qemu](https://www.qemu.org/) teams for great emulation environments\n\n* everyone else who supports RISC architectures!\n \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmausimus%2Frvcc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmausimus%2Frvcc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmausimus%2Frvcc/lists"}