{"id":13628767,"url":"https://github.com/sysprog21/shecc","last_synced_at":"2025-05-14T16:12:26.594Z","repository":{"id":40615031,"uuid":"291543462","full_name":"sysprog21/shecc","owner":"sysprog21","description":"A self-hosting and educational C optimizing compiler","archived":false,"fork":false,"pushed_at":"2025-05-10T16:30:16.000Z","size":2333,"stargazers_count":1220,"open_issues_count":9,"forks_count":129,"subscribers_count":24,"default_branch":"master","last_synced_at":"2025-05-10T16:35:04.085Z","etag":null,"topics":["arm","armv7","c","compiler","compiler-optimization","cross-compiler","elf","linux","qemu","risc-v","riscv","rv32i","rv32im","self-hosting","ssa-form"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sysprog21.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-08-30T20:06:30.000Z","updated_at":"2025-05-10T16:30:18.000Z","dependencies_parsed_at":"2024-08-14T20:54:10.706Z","dependency_job_id":"caddbdac-cd02-4292-938f-da30166a9ded","html_url":"https://github.com/sysprog21/shecc","commit_stats":{"total_commits":223,"total_committers":23,"mean_commits":9.695652173913043,"dds":0.5874439461883407,"last_synced_commit":"07adb41c08482b57f9e2266d0b22af1e8929e1ea"},"previous_names":["jserv/shecc"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sysprog21%2Fshecc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sysprog21%2F
shecc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sysprog21%2Fshecc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sysprog21%2Fshecc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sysprog21","download_url":"https://codeload.github.com/sysprog21/shecc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254179905,"owners_count":22027884,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arm","armv7","c","compiler","compiler-optimization","cross-compiler","elf","linux","qemu","risc-v","riscv","rv32i","rv32im","self-hosting","ssa-form"],"created_at":"2024-08-01T22:00:57.163Z","updated_at":"2025-05-14T16:12:26.586Z","avatar_url":"https://github.com/sysprog21.png","language":"C","readme":"# shecc : self-hosting and educational C optimizing compiler\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/18013815/91671374-b2f0db00-eb58-11ea-8d55-858e9fb160c0.png\" alt=\"logo image\" width=40%\u003e\u003c/p\u003e\n\n## Introduction\n\n`shecc` is built from scratch, targeting both 32-bit Arm and RISC-V architectures,\nas a self-compiling compiler for a subset of the C language.\nDespite its simplistic nature, it is capable of performing basic optimization strategies as a standalone optimizing compiler.\n\n### Features\n\n* Generate executable Linux ELF binaries for ARMv7-A and RV32IM.\n* Provide a minimal C standard library for basic I/O on GNU/Linux.\n* The cross-compiler is written 
in ANSI C, making it compatible with most platforms.\n* Include a self-contained C front-end with an integrated machine code generator; no external assembler or linker needed.\n* Utilize a two-pass compilation process: the first pass checks syntax and breaks down complex statements into basic operations,\n  while the second pass translates these operations into Arm/RISC-V machine code.\n* Develop a register allocation system that is compatible with RISC-style architectures.\n* Implement an architecture-independent, [static single assignment](https://en.wikipedia.org/wiki/Static_single-assignment_form) (SSA)-based middle-end for enhanced optimizations.\n\n## Compatibility\n\n`shecc` is capable of compiling C source files written in the following\nsyntax:\n* data types: char, int, struct, and pointer\n* condition statements: if, while, for, switch, case, break, return, and\n                        general expressions\n* compound assignments: `+=`, `-=`, `*=`\n* global/local variable initializations for supported data types\n    - e.g. `int i = [expr]`\n* limited support for preprocessor directives: `#define`, `#ifdef`, `#elif`, `#endif`, `#undef`, and `#error`\n* non-nested variadic macros with `__VA_ARGS__` identifier\n\nThe backend targets armv7hf with Linux ABI, verified on Raspberry Pi 3,\nand also supports RISC-V 32-bit architecture, verified with QEMU.\n\n## Bootstrapping\n\nThe steps to validate `shecc` bootstrapping:\n1. `stage0`: `shecc` source code is initially compiled using an ordinary compiler\n   which generates a native executable. The generated compiler can be used as a\n   cross-compiler.\n2. `stage1`: The built binary reads its own source code as input and generates an\n   ARMv7-A/RV32IM  binary.\n3. `stage2`: The generated ARMv7-A/RV32IM binary is invoked (via QEMU or running on\n   Arm and RISC-V devices) with its own source code as input and generates another\n   ARMv7-A/RV32IM binary.\n4. 
`bootstrap`: Build the `stage1` and `stage2` compilers, and verify that they are\n   byte-wise identical. If so, `shecc` can compile its own source code and produce\n   new versions of that same program.\n\n## Prerequisites\n\nThe code generator in `shecc` does not rely on external utilities; you only need\nan ordinary C compiler such as `gcc` or `clang`. However, because `shecc` bootstraps\nitself, Arm/RISC-V ISA emulation is required. Install QEMU for Arm/RISC-V user\nemulation on GNU/Linux:\n```shell\n$ sudo apt-get install qemu-user\n```\n\nIt is still possible to build `shecc` on macOS or Microsoft Windows. However,\nthe second-stage bootstrapping will fail because `qemu-arm` is unavailable.\n\nTo execute the snapshot test, install the packages below:\n```shell\n$ sudo apt-get install graphviz jq\n```\n\n## Build and Verify\n\nSelect the backend you want; `shecc` supports ARMv7-A and RV32IM backends:\n```\n$ make config ARCH=arm\n# Target machine code switch to Arm\n\n$ make config ARCH=riscv\n# Target machine code switch to RISC-V\n```\n\nRun `make` and you should see this:\n```\n  CC+LD\tout/inliner\n  GEN\tout/libc.inc\n  CC\tout/src/main.o\n  LD\tout/shecc\n  SHECC\tout/shecc-stage1.elf\n  SHECC\tout/shecc-stage2.elf\n```\n\nThe file `out/shecc` is the first-stage compiler. 
Its usage:\n```shell\n$ shecc [-o output] [+m] [--no-libc] [--dump-ir] \u003cinfile.c\u003e\n```\n\nCompiler options:\n- `-o` : Specify output file name (default: `out.elf`)\n- `+m` : Use hardware multiplication/division instructions (default: disabled)\n- `--no-libc` : Exclude embedded C library (default: embedded)\n- `--dump-ir` : Dump intermediate representation (IR)\n\nExample:\n```shell\n$ out/shecc -o fib tests/fib.c\n$ chmod +x fib\n$ qemu-arm fib\n```\n\n### IR Regression Tests\n\nSnapshot tests keep the frontend (lexer, parser) behavior consistent while it is being modified.\nA snapshot test dumps the IR from the executable and compares it structurally with the provided snapshots.\n\nVerify the emitted IR by specifying the `check-snapshots` target when invoking `make`:\n```shell\n$ make check-snapshots\n```\n\nIf the compiler frontend is updated, the emitted IR may change.\nIn that case, update the snapshots by specifying the `update-snapshots` target when invoking `make`:\n```shell\n$ make update-snapshots\n```\n\nNote that the two targets above check or update the snapshots of all backends at once. To check or update only the current backend's snapshot,\nuse `check-snapshot` / `update-snapshot` instead.\n\n### Unit Tests\n\n`shecc` comes with unit tests. 
To run the tests, give `check` as an argument:\n```shell\n$ make check\n```\n\nReference output:\n```\n  TEST STAGE 0\n...\nint main(int argc, int argv) { exit(sizeof(char)); } =\u003e 1\nint main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; case 3: a = 10; break; case 1: return 0; } exit(a); } =\u003e 10\nint main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; default: a = 10; break; } exit(a); } =\u003e 10\nOK\n  TEST STAGE 2\n...\nint main(int argc, int argv) { exit(sizeof(char*)); }\nexit code =\u003e 4\noutput =\u003e \nint main(int argc, int argv) { exit(sizeof(int*)); }\nexit code =\u003e 4\noutput =\u003e \nOK\n```\n\nTo clean up the generated compiler files, execute the command `make clean`.\nFor resetting architecture configurations, use the command `make distclean`.\n\n## Intermediate Representation\n\nOnce the option `--dump-ir` is passed to `shecc`, the intermediate representation (IR)\nwill be generated. Take the file `tests/fib.c` for example. 
It consists of a recursive\nFibonacci sequence function.\n```c\nint fib(int n)\n{\n    if (n == 0)\n        return 0;\n    else if (n == 1)\n        return 1;\n    return fib(n - 1) + fib(n - 2);\n}\n```\n\nExecute the following to generate IR:\n```shell\n$ out/shecc --dump-ir -o fib tests/fib.c\n```\n\nLine-by-line explanation between C source and IR (variable and label numbering may differ):\n```c\nC Source                  IR                                         Explanation\n-------------------+--------------------------------------+--------------------------------------------------------------------------------------\nint fib(int n)       def int @fib(int %n)\n{                    {\n  if (n == 0)          const %.t871, 0                      Load constant 0 into a temporary variable \".t871\"\n                       %.t872 = eq %n, %.t871               Test if \"n\" is equal to \".t871\", store result in \".t872\"\n                       br %.t872, .label.1430, .label.1431  If \".t872\" is non-zero, branch to label \".label.1430\", otherwise to \".label.1431\"\n                     .label.1430:\n    return 0;          const %.t873, 0                      Load constant 0 into a temporary variable \".t873\"\n                       ret %.t873                           Return \".t873\"\n                     .label.1431:\n  else if (n == 1)     const %.t874, 1                      Load constant 1 into a temporary variable \".t874\"\n                       %.t875 = eq %n, %.t874               Test if \"n\" is equal to \".t874\", store result in \".t875\"\n                       br %.t875, .label.1434, .label.1435  If \".t875\" is true, branch to \".label.1434\", otherwise to \".label.1435\"\n                     .label.1434:\n    return 1;          const %.t876, 1                      Load constant 1 into a temporary variable \".t876\"\n                       ret %.t876                           Return \".t876\"\n                     .label.1435:\n  return fib(n 
- 1)    const %.t877, 1                      Load constant 1 into \".t877\"\n                       %.t878 = sub %n, %.t877              Subtract \".t877\" from \"n\", store in \".t878\"\n                       push %.t878                          Prepare argument \".t878\" for function call\n                       call @fib, 1                         Call function \"@fib\" with 1 argument\n         +             retval %.t879                        Store the return value in \".t879\"\n         fib(n - 2);   const %.t880, 2                      Load constant 2 into \".t880\"\n                       %.t881 = sub %n, %.t880              Subtract \".t880\" from \"n\", store in \".t881\"\n                       push %.t881                          Prepare argument \".t881\" for function call\n                       call @fib, 1                         Call function \"@fib\" with 1 argument\n                       retval %.t882                        Store the return value in \".t882\"\n                       %.t883 = add %.t879, %.t882          Add \".t879\" and \".t882\", store in \".t883\"\n                       ret %.t883                           Return \".t883\"\n}                    }\n```\n\n## Known Issues\n\n1. The generated ELF lacks `.bss` and `.rodata` sections.\n2. Support for functions with a variable number of arguments is incomplete; `\u003cstdarg.h\u003e` cannot be used.\n   As an alternative, see how `printf` in `lib/c.c` handles `var_arg`.\n3. 
The C front-end is a bit dirty because there is no effective AST.\n\n## License\n\n`shecc` is freely redistributable under the BSD 2-Clause license.\nUse of this source code is governed by a BSD-style license that can be found in the `LICENSE` file.\n","funding_links":[],"categories":["C"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsysprog21%2Fshecc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsysprog21%2Fshecc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsysprog21%2Fshecc/lists"}