- Overview
- Installation
- Getting the Ghost Threading compiler
- Benchmarking
- Usage
- Results
- Contributing
- License
This repository contains a microbenchmark for evaluating the performance of a compiler that performs automatic Ghost Threading (GT) on modern processors. Ghost Threading is a hardware and software mechanism that runs additional threads in parallel with the main threads without interfering with the primary computation. Our automated Ghost Threading compiler extracts the relevant regions of code and prefetches their data from a helper thread. The benchmark evaluates how well this mechanism performs on both single-core and multi-core architectures, with an emphasis on the compiler's automated extraction of code from the source.
The benchmark is designed to run on systems that support multi-threading and provides performance metrics such as throughput, latency, and resource utilization.
- Automatically create helper threads with prefetch instructions: The compiler automatically generates helper threads with prefetch instructions to enhance data locality.
- Algorithm to copy the original loop into the prefetching thread: The benchmark includes an algorithm that copies the original computational loop into the helper thread, minimizing it by removing all unnecessary code, ensuring that only relevant data is prefetched.
- Hyper-parameters passed as pragma: Users can customize the behavior of the compiler with hyper-parameters passed directly to the pragmas, controlling the granularity and execution model of Ghost Threading.
- Scripts to compile and run kernels: The repository includes scripts to compile and execute kernels with automatic ghost threading, manual ghost threading, and baseline execution modes, facilitating easy comparison between different strategies.
- Result collection: Scripts automatically collect performance metrics for easy analysis.
To run the microbenchmark, you will need the following tools and dependencies:
- x86-64 machine: a physical system with an x86-64 Intel processor that supports Hyper-Threading and the serialize instruction. At least 94 GB of memory is needed to build the input graphs for some benchmarks, and around 290 GB of disk space is needed to store the input graphs.
- Ghost Threading Compiler (see Getting the Ghost Threading compiler)
- CMake (version 3.20+) for the compiler
- Python (version 3.10+)
- libpthread (for multi-threaded execution)
- OpenMP runtime to compare against parallel processing
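Before installing anything, it can help to confirm that the processor reports the features listed above. The following is a minimal sketch, assuming a Linux system whose kernel is recent enough to list the serialize flag in /proc/cpuinfo; the helper names cpu_flags and missing_flags are hypothetical and not part of this repository. Note that the ht flag only shows that the processor advertises Hyper-Threading, not that it is enabled in firmware.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags of the first processor listed in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, required=("ht", "serialize")) -> list:
    """List the required flags that the processor does not report."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in present]


if __name__ == "__main__":
    # Assumption: /proc/cpuinfo exists (Linux only).
    path = Path("/proc/cpuinfo")
    if path.exists():
        missing = missing_flags(path.read_text())
        print("missing flags:", ", ".join(missing) if missing else "none")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```

If serialize is reported as missing, the kernels that rely on that instruction are unlikely to run on the machine.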
You can install the required dependencies using the following commands:
sudo apt-get update
sudo apt-get install python3 llvm clang lld libpthread-stubs0-dev ninja-build cmake
python3 -m ensurepip --upgrade
pip install -r requirements.txt
Alternatively, with Homebrew:
brew install llvm lld python libpthread-stubs ninja cmake
python3 -m ensurepip --upgrade
pip install -r requirements.txt
To get the Ghost Threading compiler, clone it and build it from source:
git clone https://github.com/CompArchCam/GhostThreadingCompiler.git
cd GhostThreadingCompiler
mkdir build && cd build
export LLVMDIR="/llvm/install/path"
cmake -G Ninja \
-DCMAKE_INSTALL_PREFIX="${LLVMDIR}" \
-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_OPTIMIZED_TABLEGEN=On \
-DLLVM_ENABLE_PROJECTS="clang;lld;openmp" \
-DLLVM_TARGETS_TO_BUILD="X86" \
-DLLVM_PARALLEL_COMPILE_JOBS=6 \
-DLLVM_PARALLEL_LINK_JOBS=4 \
-DLLVM_USE_LINKER=lld \
../llvm
ninja install
export PATH="${LLVMDIR}/bin:$PATH"
Once installed, you can verify the installation by running the following commands:
opt --help-hidden | grep ghostthreading
clang --version
This should print the version of the compiler that you have installed.
Clone the repository:
git clone https://github.com/CompArchCam/ghost-threading-bmk.git
First, set the proper CPU IDs (cpu_m and cpu_s) in the config.sh script.
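Choosing the two CPU IDs requires knowing which logical CPUs share a physical core. The sketch below assumes, without confirmation from the repository, that cpu_m and cpu_s are meant to be two Hyper-Threading siblings on the same core, and that the machine runs Linux and exposes its topology under /sys. The helper names parse_cpu_list and sibling_pair are hypothetical.

```python
from pathlib import Path
from typing import Optional


def parse_cpu_list(text: str) -> list:
    """Expand a kernel CPU list such as "0,64" or "2-3" into sorted CPU IDs."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def sibling_pair(cpu: int = 0) -> Optional[tuple]:
    """Return two logical CPUs sharing a core with `cpu`, or None if unavailable."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    if not path.exists():
        return None
    siblings = parse_cpu_list(path.read_text())
    return (siblings[0], siblings[1]) if len(siblings) >= 2 else None


if __name__ == "__main__":
    pair = sibling_pair(0)
    if pair is None:
        print("no hyper-thread sibling found for CPU 0")
    else:
        # Assumption: cpu_m is the main thread's CPU and cpu_s its sibling.
        print(f"candidate values: cpu_m={pair[0]} cpu_s={pair[1]}")
```

The printed pair is only a candidate; check it against the comments in config.sh before using it.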
export LLVMDIR="/llvm/install/path"
cd ghost-threading-bmk/workdir
vim config.sh
cd htpf
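./build_htpf.sh -t gt -a exec -k camel -n 3
To run a single GAP kernel instead (cc on the kron graph), start again from the directory that contains the clone:
export LLVMDIR="/llvm/install/path"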
cd ghost-threading-bmk/workdir/gap
./build_gap.sh -t gt -a exec -k cc -g kron -n 3
To run the htpf kernels without selecting a technique or kernel, and then collect the results:
cd ghost-threading-bmk/workdir/htpf
./build_htpf.sh -a exec -n 3
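python results.py output
To run the GAP cc kernel on the kron graph without selecting a technique, and then collect the results:
cd ghost-threading-bmk/workdir/gap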
./build_gap.sh -a exec -k cc -g kron -n 3
python results.py output
To run all the kernels, do not pass the -k and -g flags. This takes a considerable amount of time, because every kernel is run with all three techniques: gt (automatic ghost threading), tpf (manual ghost threading), and baseline (compiled with -O3).
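When comparing techniques on one kernel, it can be convenient to generate the command lines rather than type them. The sketch below only builds and prints the commands. It reuses the flags shown above (-t, -a, -k, -g, -n), and it assumes that tpf and baseline are accepted as values of -t in the same way as gt, which the text above does not confirm. The function build_command is hypothetical.

```python
import shlex

# Technique names come from the text above; using tpf and baseline
# as -t values is an assumption.
TECHNIQUES = ("gt", "tpf", "baseline")


def build_command(script, technique, kernel=None, graph=None, n=3):
    """Build the argument list for one build script invocation."""
    argv = [script, "-t", technique, "-a", "exec"]
    if kernel:
        argv += ["-k", kernel]
    if graph:
        argv += ["-g", graph]
    argv += ["-n", str(n)]
    return argv


if __name__ == "__main__":
    for technique in TECHNIQUES:
        # Print only; run the commands by hand from workdir/gap.
        print(shlex.join(build_command("./build_gap.sh", technique, kernel="cc", graph="kron")))
```

Leaving kernel and graph unset omits -k and -g, which corresponds to running all the kernels.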
We welcome contributions from the community! If you want to improve the benchmark or add new features, follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature-name).
- Implement your feature or bug fix.
- Run the benchmarks and ensure all tests pass.
- Commit your changes and push them to your fork.
- Create a pull request describing your changes.