https://github.com/revng/udb-to-qemu
- Host: GitHub
- URL: https://github.com/revng/udb-to-qemu
- Owner: revng
- License: gpl-2.0
- Created: 2025-09-09T15:03:16.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2025-09-09T21:24:56.000Z (10 months ago)
- Last Synced: 2025-09-10T00:59:45.804Z (10 months ago)
- Language: Python
- Size: 77.1 KB
- Stars: 2
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# udb-to-qemu
This project translates RISC-V extensions defined in the [RISC-V Unified Database (UDB)](https://github.com/riscv-software-src/riscv-unified-db/) into fully functional QEMU frontends, along with per-instruction edge-case tests. The end goal is to allow rapid prototyping and early bug-catching for RISC-V extensions currently in development.
**NOTE**: The scripts currently assume that the Xqci/Xqccmp extensions are used as input. These assumptions will be relaxed over time.
## Usage
Start with
```
$ git submodule update --init
```
to fetch the submodules: `helper-to-tcg`, the current QEMU version with the `xqci/xqccmp` extensions, `riscv-unified-db`, and the test suites (`embench`, `picolibc`).
Next,
```
$ ./build-all-artifacts.sh ${path_to_clang++_for_klee} \
${path_to_klee} \
${path_to_llvm_config}
```
will produce all build artifacts in the `build/` directory. A separate `clang++` is specified for use with KLEE, which requires an older version of clang (tested with versions 13 and 14). `llvm-config` is forwarded to the build of the LLVM-based `helper-to-tcg` tool, which currently supports LLVM versions 10 through 14 inclusive.
Build artifacts are copied into the current QEMU version (`submodules/xqci`) via
```
$ ./install-qemu.sh
```
which overwrites all generated files.
QEMU can be built by running
```
$ ./build-qemu.sh
```
which produces builds of `qemu-riscv32` and `qemu-system-riscv32` in `build/qemu`.
All auto-generated tests can be run via
```
$ ./build-and-run-qemu-tests.sh ${path_to_toolchain_clang}
```
where a toolchain `clang` is required to build the inline-assembly `C` tests.
## Overview of Generated Artifacts
### Instruction Definitions
QEMU-compatible instruction definitions for the Tiny Code Generator (TCG) are produced by:
1. Generating `C++` code from instruction definitions in the UDB (`scripts/udb-to-cpp.py`); extra `C++` types and operators are defined in `cpp-templates/`;
2. Producing `LLVM IR` using `clang` (version 10-14), from the `C++` code;
3. Producing TCG using `helper-to-tcg` from the `LLVM IR`.
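As a rough illustration of step 2, the sketch below builds a `clang++` command line that emits textual `LLVM IR` from a generated `C++` file. The function name and file paths are hypothetical, and the project's build scripts may pass additional flags; only `-S`, `-emit-llvm`, and `-o` are standard clang options. Step 3 is left out because the `helper-to-tcg` command line is not described here.

```python
def llvm_ir_command(clang, cpp_source, ir_output):
    """Return an argv list asking clang to emit textual LLVM IR.

    `clang` should point at a clang++ from one of the supported
    releases (10-14). Paths are placeholders, not the project's layout.
    """
    return [
        clang,
        "-S",           # stop after compilation, produce text output
        "-emit-llvm",   # emit LLVM IR instead of native assembly
        cpp_source,
        "-o",
        ir_output,
    ]


# Hypothetical paths, for illustration only.
cmd = llvm_ir_command("clang++-14", "build/insn.cpp", "build/insn.ll")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths exist.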
### Instruction Decoding
QEMU can already generate C code for decoding instructions from its own `decodetree` format. Mapping UDB instruction encodings to QEMU's `decodetree` format is straightforward and is carried out by the `scripts/udb-to-decodetree.py` script.
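To give a feel for the mapping, here is a minimal sketch that is not taken from `scripts/udb-to-decodetree.py`. It assumes an encoding is given as an MSB-first match string of `0`, `1`, and `-`, plus a list of contiguous operand fields as `(name, msb, lsb)` tuples; the function name and this input shape are assumptions. The output uses `decodetree`'s inline `name:length` field syntax, and split (non-contiguous) fields are not handled.

```python
def udb_to_decodetree(name, match, variables):
    """Render one decodetree pattern line from a UDB-style encoding.

    match:     MSB-first string of '0', '1' and '-' (don't care).
    variables: list of (field_name, msb, lsb) for contiguous fields.
    """
    width = len(match)
    by_msb = {msb: (field, lsb) for field, msb, lsb in variables}
    parts, fixed = [], ""
    bit = width - 1
    while bit >= 0:
        if bit in by_msb:
            if fixed:                      # flush pending fixed bits
                parts.append(fixed)
                fixed = ""
            field, lsb = by_msb[bit]
            parts.append(f"{field}:{bit - lsb + 1}")
            bit = lsb - 1
        else:
            ch = match[width - 1 - bit]
            fixed += "." if ch == "-" else ch   # '.' is don't-care
            bit -= 1
    if fixed:
        parts.append(fixed)
    return f"{name} " + " ".join(parts)


# The base RV32I `add` encoding, used here only as a familiar example.
print(udb_to_decodetree(
    "add",
    "0000000----------000-----0110011",
    [("rs2", 24, 20), ("rs1", 19, 15), ("rd", 11, 7)],
))
```

Bits that are not covered by any listed field stay as `.` in the pattern, so an incomplete field list still yields a syntactically valid line.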
In QEMU, decoding for instruction execution and decoding for disassembly differ slightly, so two separate functions must be provided per instruction. These extra functions are generated with `scripts/udb-to-trans.py`.
Lastly, some glue code is needed to interface with the existing disassembler and fill out formatting information. This is generated by `scripts/udb-to-disas.py`.
### Control and Status Registers (CSRs)
Mapping from UDB CSRs to QEMU CSRs is done by `scripts/udb-to-csr.py` and produces code for defining/accessing CSRs along with extension and privilege mode checks.
### Instruction Tests
The main idea is to rely on the [KLEE](https://klee-se.org/) symbolic execution engine to collect tests that achieve code coverage for each instruction. If dummy branches are inserted to check for over-/underflow in overloaded operators (`cpp-templates/base-operators.h` with `KLEE_INPUT` and `OP_CHECK_OVERFLOW` defined), KLEE will produce tests covering these branches as well. This is the main procedure used to create edge-case tests for arithmetic, load, store, and branching operations.
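The dummy-branch idea can be illustrated outside of `C++`. The sketch below is a Python analogue, not the contents of `cpp-templates/base-operators.h`: a 32-bit signed addition gains explicit overflow and underflow branches, so a coverage-driven tool has to find inputs that reach each one. The names `wrap32` and `checked_add` are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap32(value):
    """Reduce an integer to 32-bit two's-complement range."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def checked_add(a, b):
    """Add two 32-bit signed values, reporting which branch was taken.

    The branches do not change the wrapped result; they exist only so
    that covering every branch forces edge-case inputs.
    """
    exact = a + b
    if exact > INT32_MAX:
        branch = "overflow"
    elif exact < INT32_MIN:
        branch = "underflow"
    else:
        branch = "in-range"
    return wrap32(exact), branch


for a, b in [(1, 2), (INT32_MAX, 1), (INT32_MIN, -1)]:
    print(a, b, checked_add(a, b))
```

Covering all three branches requires at least one input pair per case, which is how the boundary values end up in the generated tests.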
KLEE requires `LLVM IR` as input. `scripts/udb-to-klee.py` produces `C++`, which `clang++` then compiles to `LLVM IR`. Running KLEE on the `LLVM IR` produces tests for coverage, and running these tests produces a `YAML` file of expected inputs/outputs per instruction. These are later used to produce raw binary tests using `scripts/assemble.py` and `C` inline assembly tests using `scripts/c.py`; the latter require a toolchain with assembly support to actually use.
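As an illustration of the last step, the following sketch turns one expected input/output record into a `C` inline-assembly test. It is not the logic of `scripts/c.py`: the function name is hypothetical, the record is passed as plain arguments rather than read from the `YAML` file (whose schema is not described here), and it assumes an instruction with one destination and two source registers.

```python
def c_inline_asm_test(mnemonic, a, b, expected):
    """Return C source that runs one instruction and checks its result.

    The program exits with 0 when the destination register matches
    `expected`, and with 1 otherwise.
    """
    lines = [
        "#include <stdint.h>",
        "",
        "int main(void) {",
        f"    uint32_t a = {a:#010x}u, b = {b:#010x}u, out;",
        f'    __asm__ volatile("{mnemonic} %0, %1, %2"'
        ' : "=r"(out) : "r"(a), "r"(b));',
        f"    return out == {expected:#010x}u ? 0 : 1;",
        "}",
    ]
    return "\n".join(lines) + "\n"


# An overflow edge case for a plain `add`, used only as an example.
print(c_inline_asm_test("add", 0x7FFFFFFF, 0x1, 0x80000000))
```

Using the process exit code as the pass/fail signal keeps the generated test free of any I/O dependencies, which suits user-mode emulation.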