This is a simple microbenchmark that measures atomic read-modify-write (RMW) throughput. It lets you configure parameters such as thread contention, padding, RMW iterations, and the number of workgroups. You can also generate a heatmap across combinations of contention and padding with the provided scripts. The program outputs the atomic throughput (in atomic operations per microsecond), the duration (in microseconds), and a count of any kernel computation errors (see the kernel validation function).
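As a rough illustration of the reported metric, throughput is the number of atomic operations divided by the elapsed time in microseconds. The sketch below assumes the total operation count is workgroups × threads per workgroup × RMW iterations. The function name, the `workgroup_size` parameter, and that formula are assumptions for illustration and are not taken from the benchmark source.

```python
def atomic_throughput(workgroups: int, workgroup_size: int,
                      rmw_iterations: int, duration_us: float) -> float:
    """Return atomic operations per microsecond.

    Assumes every thread performs `rmw_iterations` atomic RMWs
    (an assumption; the real kernel may count operations differently).
    """
    if duration_us <= 0:
        raise ValueError("duration_us must be positive")
    total_ops = workgroups * workgroup_size * rmw_iterations
    return total_ops / duration_us


if __name__ == "__main__":
    # Hypothetical inputs, not measured results.
    print(atomic_throughput(workgroups=4, workgroup_size=256,
                            rmw_iterations=128, duration_us=1024.0))
```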
Before you begin, ensure you have set up the following requirements:
- Vulkan:
- Vulkan SDK
- clspv
- Android ADB / NDK
- make utility for building
- Python
To install the necessary dependencies, follow these steps:
- Clone the repository:
  git clone https://github.com/ucsc-chpl/gpu-atomic-rmw-microbenchmark.git
- Build easyvk:
  cd gpu-atomic-rmw-microbenchmark/easyvk/
  git submodule update --init --recursive
  make
To run the microbenchmark on your device, follow this series of commands:
- Navigate to the source directory:
  cd src/
- Compile the project:
  make
To test a single configuration of the microbenchmark:
- Run the microbenchmark from the src directory:
  ./atomic_rmw_test -w <workgroups> -d <device> -c <contention> -p <padding> -i <rmw_iterations>
  -w <workgroups>: (Required) The number of workgroups to use. Defaults to 1.
  -d <device>: (Optional) The index of the device to use. Defaults to 0.
  -c <contention>: (Optional) The number of threads contending on the same machine word. Defaults to 1.
  -p <padding>: (Optional) The number of machine words between those accessed. Defaults to 1.
  -i <rmw_iterations>: (Optional) The number of RMW iterations. Defaults to 128.
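One way to picture the `-c` and `-p` parameters: threads are grouped so that `contention` threads share one machine word, and consecutive groups target words `padding` apart. The sketch below is an assumed mapping for illustration only. The function name and the indexing formula are hypothetical, and the actual kernel may index memory differently.

```python
def target_word_index(thread_id: int, contention: int, padding: int) -> int:
    """Assumed mapping from a thread id to the machine word it updates.

    `contention` consecutive threads share a word; successive groups
    are spaced `padding` words apart. This is a hypothetical model,
    not the benchmark's kernel code.
    """
    if contention < 1 or padding < 1:
        raise ValueError("contention and padding must be at least 1")
    return (thread_id // contention) * padding


if __name__ == "__main__":
    # With contention=2 and padding=4, threads 0-1 share word 0,
    # threads 2-3 share word 4, and so on.
    print([target_word_index(t, contention=2, padding=4) for t in range(8)])
```

Under this model, raising `contention` increases the number of threads competing for each word, while raising `padding` spreads the accessed words further apart in memory.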
To test multiple configurations of thread contention and padding and produce a heatmap:
- Run the bash script from the src directory:
  ./heatmap_results.sh <workgroups> <device> <rmw_iterations>
- Generate a heatmap displaying results from the src directory:
  python3 heatmap_generator.py
- Example heatmap generation:
  For an NVIDIA GeForce RTX 4070, the following parameters work well:
  workgroups: 46
  rmw_iterations: 4096
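A sweep over contention and padding can be sketched as a loop that builds one `atomic_rmw_test` command line per combination. This is not the contents of `heatmap_results.sh`: the power-of-two ranges and the helper names are assumptions, and only the flags documented above are used. The sketch prints the commands rather than running them.

```python
import itertools
import shlex
from typing import Iterator, List


def powers_of_two(limit: int) -> Iterator[int]:
    """Yield 1, 2, 4, ... up to and including `limit`."""
    value = 1
    while value <= limit:
        yield value
        value *= 2


def build_command(workgroups: int, device: int, contention: int,
                  padding: int, rmw_iterations: int) -> List[str]:
    """Build the argument list for one benchmark configuration."""
    return ["./atomic_rmw_test",
            "-w", str(workgroups), "-d", str(device),
            "-c", str(contention), "-p", str(padding),
            "-i", str(rmw_iterations)]


def sweep_commands(workgroups: int, device: int, rmw_iterations: int,
                   max_contention: int = 8,
                   max_padding: int = 8) -> List[List[str]]:
    """One command per (contention, padding) pair; the ranges are assumed."""
    pairs = itertools.product(powers_of_two(max_contention),
                              powers_of_two(max_padding))
    return [build_command(workgroups, device, c, p, rmw_iterations)
            for c, p in pairs]


if __name__ == "__main__":
    for command in sweep_commands(46, 0, 4096, max_contention=4, max_padding=2):
        print(shlex.join(command))
```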
To build and run the microbenchmark on an Android device, follow this series of commands:
- Navigate to the source directory:
  cd src/
- Compile the project for Android:
  make android
- Get the serial number of the connected Android device:
  adb devices
- Get supported CPU ABIs:
  adb -s [SERIAL_NUMBER] shell getprop ro.product.cpu.abilist
  # If the Android version is pre-Lollipop, use:
  adb -s [SERIAL_NUMBER] shell getprop ro.product.cpu.abi
- Copy necessary files:
  cp *.cinit *.sh build/android/obj/local/[SUPPORTED_CPU]
- Push files to the Android device:
  adb -s [SERIAL_NUMBER] push build/android/obj/local/[SUPPORTED_CPU]/ /data/local/tmp/rmw
- Navigate to the microbenchmark on the Android device:
  adb -s [SERIAL_NUMBER] shell
  cd /data/local/tmp/rmw/[SUPPORTED_CPU]
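The serial number and ABI lookups in the steps above can also be parsed programmatically. The sketch below works on the text that `adb devices` and `getprop ro.product.cpu.abilist` print. The function names and sample strings are hypothetical, and it assumes the usual formats: a header line followed by whitespace-separated `serial state` rows, and a comma-separated ABI list with the preferred ABI first.

```python
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Return serial numbers of devices whose state is 'device'."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Skips the header, blank lines, and offline/unauthorized entries.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def preferred_abi(abilist: str) -> str:
    """Return the first ABI from a comma-separated abilist string."""
    abis = [abi.strip() for abi in abilist.split(",") if abi.strip()]
    if not abis:
        raise ValueError("empty ABI list")
    return abis[0]


if __name__ == "__main__":
    # Hypothetical sample output, not captured from a real device.
    sample = "List of devices attached\nABC123\tdevice\nXYZ789\tunauthorized\n"
    print(parse_adb_devices(sample))
    print(preferred_abi("arm64-v8a,armeabi-v7a,armeabi"))
```

The value returned by `preferred_abi` is what would stand in for `[SUPPORTED_CPU]` in the copy and push commands.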
To test a single configuration of the microbenchmark:
- Run the microbenchmark from the [SUPPORTED_CPU] directory:
  ./atomic_rmw_test -w <workgroups> -d <device> -c <contention> -p <padding> -i <rmw_iterations>
  -w <workgroups>: (Required) The number of workgroups to use. Defaults to 1.
  -d <device>: (Optional) The index of the device to use. Defaults to 0.
  -c <contention>: (Optional) The number of threads contending on the same machine word. Defaults to 1.
  -p <padding>: (Optional) The number of machine words between those accessed. Defaults to 1.
  -i <rmw_iterations>: (Optional) The number of RMW iterations. Defaults to 128.
To test multiple configurations of thread contention and padding and produce a heatmap:
- Run the bash script from the [SUPPORTED_CPU] directory:
  sh heatmap_results.sh <workgroups> <device> <rmw_iterations>
- Exit the shell and pull the results file from the Android device:
  exit
  adb -s [SERIAL_NUMBER] pull /data/local/tmp/rmw/[SUPPORTED_CPU]/result.txt .
- Generate a heatmap displaying results from the src directory:
  python3 heatmap_generator.py
- Example heatmap generation:
  For a Samsung Xclipse 920, the following parameters work well:
  workgroups: 3
  rmw_iterations: 32768