https://github.com/revng/udb-to-qemu



# udb-to-qemu

This project turns RISC-V extensions defined in the [RISC-V Unified Database (UDB)](https://github.com/riscv-software-src/riscv-unified-db/) into fully functional QEMU frontends, along with per-instruction edge-case tests. The end goal is to enable rapid prototyping and early bug-catching for RISC-V extensions currently in development.

**NOTE**: The tooling currently assumes that the Xqci/Xqccmp extensions are used as input; this assumption will be relaxed over time.

## Usage

Start with
```
$ git submodule update --init
```
to fetch the submodules: `helper-to-tcg`, the current version of QEMU with the `xqci/xqccmp` extensions, `riscv-unified-db`, and the tests (`embench`, `picolibc`).

Next,
```
$ ./build-all-artifacts.sh ${path_to_clang++_for_klee} \
${path_to_klee} \
${path_to_llvm_config}
```
will produce all build artifacts in the `build/` directory. A separate `clang++` is specified for use with KLEE, which requires an older version of clang (tested with versions 13 and 14). `llvm-config` is forwarded for building the LLVM-based `helper-to-tcg` tool, which currently supports LLVM versions 10 to 14 inclusive.
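Since the build depends on an LLVM version in the supported range, a small pre-flight check can catch a mismatched toolchain early. The following is a minimal sketch, not part of this repository: the helper names `llvm_major` and `is_supported_llvm` are hypothetical, and the input is assumed to be the text printed by `llvm-config --version` (for example `14.0.6`).

```python
import re

# LLVM major versions 10 through 14 inclusive, as stated above for helper-to-tcg.
SUPPORTED_LLVM_MAJORS = range(10, 15)


def llvm_major(version_output):
    """Extract the major version from text such as '14.0.6'."""
    match = re.match(r"\s*(\d+)\.", version_output)
    if match is None:
        raise ValueError(f"unrecognised LLVM version string: {version_output!r}")
    return int(match.group(1))


def is_supported_llvm(version_output):
    """Return True if the major version lies in the supported range."""
    return llvm_major(version_output) in SUPPORTED_LLVM_MAJORS
```

The string to check can be obtained by running the `llvm-config` binary passed to `build-all-artifacts.sh` with `--version`.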

Build artifacts are copied into the current QEMU version (`submodules/xqci`) via
```
$ ./install-qemu.sh
```
which overwrites all generated files.

QEMU can be built via running
```
$ ./build-qemu.sh
```
which builds `qemu-riscv32` and `qemu-system-riscv32` into `build/qemu`.

All auto-generated tests can be run via
```
$ ./build-and-run-qemu-tests.sh ${path_to_toolchain_clang}
```
where `${path_to_toolchain_clang}` points to the toolchain's clang, which is required for the inline-assembly `C` tests.

## Overview of Generated Artifacts

### Instruction Definitions

QEMU-compatible instruction definitions in Tiny Code Generator (TCG) form are produced by:
1. Generating `C++` code from instruction definitions in the UDB (`scripts/udb-to-cpp.py`), with extra `C++` types and operators defined in `cpp-templates/`;
2. Producing `LLVM IR` from the `C++` code using `clang` (version 10-14);
3. Producing TCG using `helper-to-tcg` from the `LLVM IR`.

### Instruction Decoding

QEMU can already generate C code for decoding instructions from its own `decodetree` format. Mapping UDB instruction encodings to QEMU's `decodetree` format is straightforward and is carried out by the `scripts/udb-to-decodetree.py` script.
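To give a feel for why this mapping is straightforward, here is a minimal sketch of the idea. It is not the code in `scripts/udb-to-decodetree.py`. It assumes a simplified UDB-style encoding with a `match` bit string (using `-` for variable bits) and a list of variables with `msb-lsb` locations, and the function names are hypothetical. Sign extension, formats, and argument sets are ignored.

```python
def location_to_field(name, location):
    """Turn a UDB-style location such as '24-20' or '31|7|30-25|11-8'
    into a decodetree field definition using 'pos:len' pieces."""
    parts = []
    for piece in location.split("|"):
        if "-" in piece:
            msb, lsb = (int(x) for x in piece.split("-"))
        else:
            msb = lsb = int(piece)  # single-bit piece
        parts.append(f"{lsb}:{msb - lsb + 1}")
    return f"%{name} " + " ".join(parts)


def match_to_pattern(mnemonic, match, variables):
    """Build a decodetree pattern line from a match string and its variables."""
    # decodetree identifiers cannot contain '.', so 'qc.foo' becomes 'qc_foo'.
    ident = mnemonic.replace(".", "_")
    # Bits covered by a field are written as '.' in decodetree patterns.
    bits = match.replace("-", ".")
    refs = " ".join(f"%{v['name']}" for v in variables)
    return f"{ident} {bits} {refs}"


# Example input: the standard RV32I `add` encoding.
add_vars = [
    {"name": "rs2", "location": "24-20"},
    {"name": "rs1", "location": "19-15"},
    {"name": "rd", "location": "11-7"},
]
fields = [location_to_field(v["name"], v["location"]) for v in add_vars]
pattern = match_to_pattern("add", "0000000----------000-----0110011", add_vars)
```

Here `fields` holds one `%name pos:len` definition per variable and `pattern` holds the matching pattern line that references them.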

In QEMU, decoding for instruction execution and decoding for disassembly differ slightly, so two separate functions must be provided per instruction. These extra functions are generated with `scripts/udb-to-trans.py`.

Lastly, some glue code is needed to interface with the existing disassembler and fill out formatting information. This is generated by `scripts/udb-to-disas.py`.

### Control and Status Registers (CSRs)

Mapping from UDB CSRs to QEMU CSRs is done by `scripts/udb-to-csr.py` and produces code for defining/accessing CSRs along with extension and privilege mode checks.

### Instruction Tests

The main idea is to rely on the [KLEE](https://klee-se.org/) symbolic execution engine to collect tests for per-instruction code coverage. If dummy branches are inserted to check for over-/underflow in overloaded operators (`cpp-templates/base-operators.h` with `KLEE_INPUT` and `OP_CHECK_OVERFLOW` defined), KLEE will produce tests covering these branches as well. This is the main procedure used to create edge-case tests for arithmetic, load, store, and branching operations.
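The effect of such dummy branches can be illustrated with a small model. The sketch below is written in Python purely for illustration; the real operators are `C++` in `cpp-templates/base-operators.h` and are not reproduced here. All names are hypothetical. It also picks covering inputs from a fixed candidate list, whereas KLEE finds them by solving path constraints symbolically.

```python
MASK32 = 0xFFFFFFFF
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def to_signed32(x):
    """Wrap an arbitrary integer to a signed 32-bit value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def checked_add32(a, b):
    """Signed 32-bit add with explicit branches for over-/underflow.

    The branches do not change the result; they only exist so that a
    coverage-driven tool has to produce inputs reaching each of them.
    """
    exact = a + b
    if exact > INT32_MAX:
        branch = "overflow"
    elif exact < INT32_MIN:
        branch = "underflow"
    else:
        branch = "in-range"
    return to_signed32(exact), branch


def covering_inputs(candidates):
    """Keep the first input pair that reaches each branch."""
    seen = {}
    for a, b in candidates:
        _, branch = checked_add32(a, b)
        seen.setdefault(branch, (a, b))
    return seen
```

Without the extra branches, a single input pair would already achieve full coverage of the add, and the wrap-around cases would never be exercised.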

KLEE requires `LLVM IR` as input. `scripts/udb-to-klee.py` generates `C++`, which `clang++` then compiles to `LLVM IR`. Running KLEE on the `LLVM IR` produces tests for coverage, and running these tests produces a `YAML` file of expected inputs/outputs per instruction. This file is later used to produce raw binary tests with `scripts/assemble.py` and `C` inline-assembly tests with `scripts/c.py`; the latter require a toolchain with assembly support to actually use.
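As a rough illustration of the last step, the sketch below renders one expected input/output record as a `C` inline-assembly test. It is not the code in `scripts/c.py`. The record layout (`instruction`, `inputs`, `expected`), the function name `render_c_test`, and the restriction to three-register instructions are all assumptions made for this example, and the record is passed as a plain dictionary rather than parsed from `YAML`.

```python
def render_c_test(record, index):
    """Render a hypothetical test record as a C function using inline assembly.

    Only handles instructions of the form `op rd, rs1, rs2` on a 32-bit target.
    """
    name = record["instruction"]
    rs1 = record["inputs"]["rs1"] & 0xFFFFFFFF
    rs2 = record["inputs"]["rs2"] & 0xFFFFFFFF
    want = record["expected"]["rd"] & 0xFFFFFFFF
    # C identifiers cannot contain '.', which appears in some mnemonics.
    func = f"test_{name.replace('.', '_')}_{index}"
    return "\n".join([
        f"int {func}(void) {{",
        "    unsigned int rd;",
        f'    asm volatile("{name} %0, %1, %2"',
        '                 : "=r"(rd)',
        f'                 : "r"({rs1:#x}u), "r"({rs2:#x}u));',
        f"    return rd == {want:#x}u; /* 1 on success */",
        "}",
    ])


# Hypothetical record for the standard `add` instruction.
example = {"instruction": "add", "inputs": {"rs1": 1, "rs2": 2}, "expected": {"rd": 3}}
source = render_c_test(example, 0)
```

The resulting `source` string is a single `C` function that can only be compiled by a toolchain whose assembler knows the instruction, which is why the inline-assembly tests need toolchain support.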