This project takes RISC-V extensions defined in the RISC-V Unified Database (UDB) and turns them into fully functional QEMU frontends, along with per-instruction edge-case tests. The end goal is to enable rapid prototyping and early bug-catching for RISC-V extensions that are still in development.
NOTE: The tooling currently assumes that the Xqci/Xqccmp extensions are used as input; these assumptions will be relaxed over time.
Start with
$ git submodule update --init
to fetch the submodules: helper-to-tcg, the current version of QEMU with the xqci/xqccmp extensions, riscv-unified-db, and the test suites (embench, picolibc).
Next,
$ ./build-all-artifacts.sh ${path_to_clang++_for_klee} \
${path_to_klee} \
${path_to_llvm_config}
will produce all build artifacts in the build/ directory. Note that a separate clang++ is specified for use with KLEE, which requires an older version of clang (tested with versions 13 and 14). llvm-config is forwarded for building the LLVM-based helper-to-tcg tool, which currently supports LLVM versions 10 to 14 inclusive.
Build artifacts are copied into the current QEMU version (submodules/xqci) via
$ ./install-qemu.sh
which overwrites all generated files.
QEMU can be built by running
$ ./build-qemu.sh
which builds qemu-riscv32 and qemu-system-riscv32 into build/qemu.
All auto-generated tests can be run via
$ ./build-and-run-qemu-tests.sh ${path_to_toolchain_clang}
where a toolchain clang is required for the inline-assembly C tests.
QEMU-compatible instruction definitions for the Tiny Code Generator (TCG) are produced by:
1. Generating C++ code from the instruction definitions in the UDB (scripts/udb-to-cpp.py); extra C++ types and operators are defined in cpp-templates/;
2. Producing LLVM IR from the C++ code using clang (versions 10-14);
3. Producing TCG from the LLVM IR using helper-to-tcg.
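As a rough illustration of step 1, the snippet below is a minimal sketch of the kind of plain C++ instruction semantics that could be emitted before lowering to LLVM IR. It is not the output of scripts/udb-to-cpp.py: CpuState, sext, write_xreg, exec_add and exec_addi are hypothetical names, and the real generated code and the types in cpp-templates/ will differ. The idea it shows is that each instruction becomes a small function over explicit state, which clang can lower to LLVM IR for helper-to-tcg to consume.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical CPU state: only the 32 integer registers are modelled.
struct CpuState {
    std::array<uint32_t, 32> x{};
};

// Hypothetical "extra type/operator" helper: sign-extend the low Bits bits of v.
template <unsigned Bits>
inline int32_t sext(uint32_t v) {
    static_assert(Bits > 0 && Bits < 32, "width must be 1..31");
    const uint32_t sign = 1u << (Bits - 1);
    v &= (1u << Bits) - 1;                           // keep only the field bits
    return static_cast<int32_t>((v ^ sign) - sign);  // two's-complement sign extension
}

// Register write that keeps x0 hardwired to zero.
inline void write_xreg(CpuState &s, unsigned rd, uint32_t value) {
    assert(rd < 32);
    if (rd != 0) s.x[rd] = value;
}

// add rd, rs1, rs2: 32-bit wrap-around addition.
inline void exec_add(CpuState &s, unsigned rd, unsigned rs1, unsigned rs2) {
    write_xreg(s, rd, s.x[rs1] + s.x[rs2]);
}

// addi rd, rs1, imm12: the 12-bit immediate is sign-extended first.
inline void exec_addi(CpuState &s, unsigned rd, unsigned rs1, uint32_t imm12) {
    write_xreg(s, rd, s.x[rs1] + static_cast<uint32_t>(sext<12>(imm12)));
}
```

For example, with x1 = 5, exec_addi(s, 2, 1, 0xFFF) leaves x2 = 4, since 0xFFF sign-extends to -1.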
QEMU can already generate C code for decoding instructions from its own decodetree format. Mapping UDB instruction encodings to QEMU's decodetree format is straightforward and is carried out by the scripts/udb-to-decodetree.py script.
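To make the mapping concrete, here is a minimal sketch, assuming the UDB encoding is available as a 32-character match string (MSB first) in which '0'/'1' are fixed bits and '-' marks operand bits. It turns such a string into a decodetree pattern line, using '.' for the bits that fields extract, and also derives the mask/value pair a decoder effectively tests. fixed_bits and to_decodetree are hypothetical helpers, not code from scripts/udb-to-decodetree.py, which also has to emit field, argument-set and format definitions; the R-type field boundaries and the @r format name are hard-coded assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

struct FixedBits {
    uint32_t mask;   // 1 where the bit is fixed by the encoding
    uint32_t value;  // required value of the fixed bits
};

// Derive mask/value from a match string such as "0000000----------000-----0110011".
inline FixedBits fixed_bits(const std::string &match) {
    assert(match.size() == 32);
    FixedBits fb{0, 0};
    for (char c : match) {
        fb.mask <<= 1;
        fb.value <<= 1;
        if (c == '0' || c == '1') {
            fb.mask |= 1u;
            fb.value |= (c == '1') ? 1u : 0u;
        }
    }
    return fb;
}

// Emit a decodetree pattern line, splitting the bits at R-type field boundaries.
inline std::string to_decodetree(const std::string &name,
                                 const std::string &match,
                                 const std::string &format) {
    assert(match.size() == 32);
    static const std::size_t widths[] = {7, 5, 5, 3, 5, 7};  // funct7 rs2 rs1 funct3 rd opcode
    std::string line = name;
    std::size_t pos = 0;
    for (std::size_t w : widths) {
        line += ' ';
        for (std::size_t i = 0; i < w; ++i, ++pos)
            line += (match[pos] == '-') ? '.' : match[pos];  // operand bits become '.'
    }
    line += ' ';
    line += format;
    return line;
}
```

For the base-ISA add encoding this yields `add 0000000 ..... ..... 000 ..... 0110011 @r`, and the encoding of add a0, a0, a1 (0x00b50533) satisfies mask 0xfe00707f / value 0x33.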
In QEMU, decoding for instruction execution and decoding for disassembly differ slightly, so two separate functions must be provided per instruction. These extra functions are generated by scripts/udb-to-trans.py.
Lastly, some glue code is needed to interface with the existing disassembler and fill out formatting information; it is generated by scripts/udb-to-disas.py.
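The split between the two decoding paths can be pictured with a minimal sketch in which one decoded argument struct feeds both an execution-side function and a disassembly-side formatter. All names here (ArgR, decode_r, abi_name, trans_add_sketch, disas_add_sketch) are hypothetical. In QEMU the execution side emits TCG ops rather than computing results directly; the sketch computes the result in place only to stay self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical decoded operands of an R-type instruction.
struct ArgR {
    uint32_t rd, rs1, rs2;
};

// Shared decode step: extract the register fields from the 32-bit word.
inline ArgR decode_r(uint32_t insn) {
    return ArgR{(insn >> 7) & 0x1Fu, (insn >> 15) & 0x1Fu, (insn >> 20) & 0x1Fu};
}

// ABI register names used for disassembly output.
inline const char *abi_name(uint32_t r) {
    static const char *const names[32] = {
        "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
        "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
        "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
        "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6"};
    assert(r < 32);
    return names[r];
}

// Execution-side consumer (QEMU would emit TCG ops here instead).
inline bool trans_add_sketch(uint32_t regs[32], const ArgR &a) {
    if (a.rd != 0) regs[a.rd] = regs[a.rs1] + regs[a.rs2];
    return true;  // instruction handled
}

// Disassembly-side consumer: format the same operands as text.
inline std::string disas_add_sketch(const ArgR &a) {
    return std::string("add ") + abi_name(a.rd) + ", " + abi_name(a.rs1) + ", " +
           abi_name(a.rs2);
}
```

Decoding 0x00b50533 once gives rd = 10, rs1 = 10, rs2 = 11, which the formatter renders as `add a0, a0, a1`.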
Mapping from UDB CSRs to QEMU CSRs is done by scripts/udb-to-csr.py, which produces code for defining and accessing CSRs along with extension and privilege-mode checks.
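The privilege and read-only checks can be illustrated with a minimal sketch that relies on the CSR address convention from the RISC-V privileged specification: address bits [9:8] give the lowest privilege level allowed to access the CSR, and bits [11:10] equal to 0b11 mark it read-only. The extension check is modelled as a single hypothetical ext_required flag; CsrDesc and csr_access_ok are illustrative names, not what scripts/udb-to-csr.py emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical descriptor for one generated CSR.
struct CsrDesc {
    uint16_t addr;      // 12-bit CSR address
    bool ext_required;  // true if an extension must be enabled to use it
};

// Privileged-spec convention: bits [9:8] hold the lowest privilege level.
inline unsigned csr_min_priv(uint16_t addr) { return (addr >> 8) & 0x3u; }

// Privileged-spec convention: bits [11:10] == 0b11 means read-only.
inline bool csr_read_only(uint16_t addr) { return ((addr >> 10) & 0x3u) == 0x3u; }

// cur_priv uses the spec encoding: 0 = U, 1 = S, 3 = M.
inline bool csr_access_ok(const CsrDesc &csr, unsigned cur_priv,
                          bool ext_enabled, bool is_write) {
    assert(csr.addr < 0x1000 && cur_priv <= 3);
    if (csr.ext_required && !ext_enabled) return false;     // extension check
    if (cur_priv < csr_min_priv(csr.addr)) return false;    // privilege check
    if (is_write && csr_read_only(csr.addr)) return false;  // read-only check
    return true;
}
```

For instance, mstatus (0x300) is accessible only from M-mode, mhartid (0xF14) can be read but never written, and 0x7C0, which lies in the custom machine-level read/write range, is additionally gated on the extension being enabled.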
The main idea is to rely on the KLEE symbolic execution engine to generate per-instruction tests that maximize code coverage. If dummy branches checking for over-/underflow are inserted into the overloaded operators (cpp-templates/base-operators.h with KLEE_INPUT and OP_CHECK_OVERFLOW defined), KLEE will also produce tests covering these branches. This is the main procedure used to create edge-case tests for arithmetic, load, store, and branch operations.
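The over-/underflow branches described above can be sketched as follows. This is a minimal illustration assuming a wrapper type around int32_t; SInt32 and overflow_hits are hypothetical and are not the contents of cpp-templates/base-operators.h. The addition always wraps around, but when OP_CHECK_OVERFLOW is defined an extra branch tests whether the exact sum left the int32_t range. When KLEE explores this code with symbolic operands, each side of those comparisons is feasible, so it generates inputs for the overflowing, underflowing, and in-range cases. The counter only gives the branch an observable effect so that an optimizer cannot fold it away.

```cpp
#include <cassert>
#include <cstdint>

#ifndef OP_CHECK_OVERFLOW
#define OP_CHECK_OVERFLOW  // enabled here for the sketch; normally chosen by the build
#endif

// Hypothetical counter: gives the dummy branch an observable effect.
inline unsigned overflow_hits = 0;

// Hypothetical wrapper type whose operators carry the extra checks.
struct SInt32 {
    int32_t v;
};

inline SInt32 operator+(SInt32 a, SInt32 b) {
    const int64_t wide = int64_t{a.v} + int64_t{b.v};  // exact sum, cannot overflow
#ifdef OP_CHECK_OVERFLOW
    // Dummy branch: symbolic execution forks here, yielding over-/underflow inputs.
    if (wide > INT32_MAX || wide < INT32_MIN) {
        ++overflow_hits;
    }
#endif
    // Wrap-around result, as a 32-bit RISC-V add would produce.
    return SInt32{static_cast<int32_t>(static_cast<uint32_t>(wide))};
}
```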
KLEE requires LLVM IR as input, which is obtained by generating C++ with scripts/udb-to-klee.py and compiling it with clang++. Running KLEE on the LLVM IR produces coverage tests, and running these tests produces a YAML file of expected inputs/outputs per instruction. These are later used to produce raw binary tests with scripts/assemble.py and C inline-assembly tests with scripts/c.py; the latter require a toolchain with assembly support to actually use.