
Commit 10c9baa

[CI][Benchmarks] Benches OS setup guide
1 parent 46b04db commit 10c9baa


2 files changed, +126 -0 lines changed

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
# System Performance Tuning Guide

This guide provides recommendations for optimizing system performance when running SYCL and Unified Runtime benchmarks.
For framework-specific information, see [README.md](README.md) and [CONTRIB.md](CONTRIB.md).

## Table of Contents

- [Overview](#overview)
- [System Configuration](#system-configuration)
- [CPU Tuning](#cpu-tuning)
- [GPU Configuration](#gpu-configuration)
- [Driver Version](#driver-version)
- [Environment Variables](#environment-variables)

## Overview

Performance benchmarking requires a stable, well-tuned system to produce reliable and reproducible results. This guide covers the essential tuning steps for reducing run-to-run variance, so that results stay consistent across runs.

## System Configuration

### Kernel Parameters

Add the following parameters to `GRUB_CMDLINE_LINUX` in `/etc/default/grub`:

```bash
# intel_pstate=disable: disable the intel_pstate driver; the kernel then
#   typically falls back to acpi-cpufreq and its standard governors.
# isolcpus=2-7: isolate CPUs for benchmark workloads (example: reserve
#   cores 2-7), preventing other processes from being scheduled on them.

# Example complete line:
GRUB_CMDLINE_LINUX="intel_pstate=disable isolcpus=2-7"
```
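If you prefer to script this edit, the following is a minimal sketch of a hypothetical helper, `add_grub_params` (not part of the benchmark scripts), that adds parameters to `GRUB_CMDLINE_LINUX` only when they are not already present. It assumes the variable is defined on a single line in double quotes, as in the example above, and takes the file path as an argument so it can be tried on a copy first.

```bash
# Hypothetical helper (not part of the benchmark scripts): add kernel
# parameters to GRUB_CMDLINE_LINUX, skipping any that are already present.
# Assumes a single line of the form GRUB_CMDLINE_LINUX="..." in FILE.
# Usage: add_grub_params FILE PARAM...
add_grub_params() {
  local file=$1 line current param l tmp
  shift
  line=$(grep -m1 '^GRUB_CMDLINE_LINUX=' "$file") || {
    echo "GRUB_CMDLINE_LINUX not found in $file" >&2
    return 1
  }
  # Strip the variable name and the surrounding double quotes.
  current=${line#GRUB_CMDLINE_LINUX=}
  current=${current#\"}
  current=${current%\"}
  for param in "$@"; do
    # Whole-word match, so isolcpus=2-7 does not match isolcpus=2-79.
    case " $current " in
      *" $param "*) ;;
      *) current="${current:+$current }$param" ;;
    esac
  done
  # Rewrite only the GRUB_CMDLINE_LINUX line (not GRUB_CMDLINE_LINUX_DEFAULT).
  tmp=$(mktemp) || return 1
  while IFS= read -r l || [ -n "$l" ]; do
    case $l in
      GRUB_CMDLINE_LINUX=*) printf 'GRUB_CMDLINE_LINUX="%s"\n' "$current" ;;
      *) printf '%s\n' "$l" ;;
    esac
  done < "$file" > "$tmp"
  # cat (rather than mv) keeps the original file's owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, run it on a copy (`cp /etc/default/grub /tmp/grub`), review the result with `diff /etc/default/grub /tmp/grub`, and install it with `sudo cp /tmp/grub /etc/default/grub`.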

Update GRUB and reboot:

```bash
sudo update-grub
sudo reboot
```
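After the reboot, it is worth confirming that the parameters were applied. Below is a minimal sketch using the standard `/proc/cmdline` and `/sys/devices/system/cpu/isolated` interfaces; the `check_kernel_params` name and the `PROC_ROOT`/`SYS_ROOT` overrides are hypothetical, added only so the check can be tried against a fake directory tree.

```bash
# Hypothetical check: confirm that the booted kernel received the expected
# parameters and that the expected CPUs are isolated.
# PROC_ROOT/SYS_ROOT default to the real /proc and /sys.
# Usage: check_kernel_params EXPECTED_ISOLATED PARAM...
check_kernel_params() {
  local expected_isolated=$1
  shift
  local proc=${PROC_ROOT:-/proc} sys=${SYS_ROOT:-/sys}
  local cmdline isolated param status=0
  cmdline=$(cat "$proc/cmdline") || return 1
  for param in "$@"; do
    case " $cmdline " in
      *" $param "*) echo "ok: $param" ;;
      *) echo "missing: $param"; status=1 ;;
    esac
  done
  # The kernel lists isolated CPUs in cpulist format, e.g. "2-7".
  isolated=$(cat "$sys/devices/system/cpu/isolated") || return 1
  if [ "$isolated" = "$expected_isolated" ]; then
    echo "ok: isolated CPUs $isolated"
  else
    echo "mismatch: isolated CPUs '$isolated', expected '$expected_isolated'"
    status=1
  fi
  return $status
}
```

For the example configuration: `check_kernel_params 2-7 intel_pstate=disable isolcpus=2-7`.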

## CPU Tuning

### CPU Frequency Scaling

The `performance` governor sets each CPU to the highest frequency its policy allows (`scaling_max_freq`), instead of scaling the frequency with load.

```bash
# Set the performance governor for all CPUs
sudo cpupower frequency-set --governor performance

# Check the current governor and frequency limits
sudo cpupower frequency-info
```
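To confirm that every CPU picked up the governor, one option is to read the per-CPU cpufreq files in sysfs. This is a minimal sketch assuming the standard `scaling_governor` files are present; `check_governors` and `SYS_ROOT` are hypothetical names.

```bash
# Hypothetical check: print every CPU whose cpufreq governor is not
# "performance" and return non-zero if any was found.
# SYS_ROOT defaults to /sys.
check_governors() {
  local sys=${SYS_ROOT:-/sys} f gov cpu bad=0
  for f in "$sys"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -e "$f" ] || { echo "no cpufreq interface found" >&2; return 1; }
    gov=$(cat "$f")
    if [ "$gov" != performance ]; then
      cpu=${f%/cpufreq/scaling_governor}
      echo "${cpu##*/}: $gov"
      bad=1
    fi
  done
  return $bad
}
```

An empty output and a zero exit status mean that all CPUs use the `performance` governor.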

### CPU Affinity

Bind benchmark processes to specific CPU cores to reduce context switching and improve cache locality. Cores isolated with `isolcpus` are not used by the scheduler for ordinary tasks, so the benchmark must be pinned to them explicitly.
Make sure that the isolated CPUs are located on the same NUMA node as the GPU being used.

```bash
# Run the benchmark on the isolated cores 2-7
taskset -c 2-7 ./main.py ~/benchmarks_workdir/ --sycl ~/llvm/build/
```
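To find which cores share a NUMA node with the GPU, the PCI device behind a DRM card exposes a `numa_node` attribute, and each node lists its CPUs in `/sys/devices/system/node/nodeN/cpulist`. A minimal sketch follows; `gpu_local_cpus` and `SYS_ROOT` are hypothetical names.

```bash
# Hypothetical helper: print the CPUs on the same NUMA node as a DRM card.
# A numa_node value of -1 means the platform reports no NUMA affinity.
# SYS_ROOT defaults to /sys.
# Usage: gpu_local_cpus [CARD]   (default: card1)
gpu_local_cpus() {
  local card=${1:-card1} sys=${SYS_ROOT:-/sys} node
  node=$(cat "$sys/class/drm/$card/device/numa_node") || return 1
  if [ "$node" -lt 0 ]; then
    echo "no NUMA affinity reported for $card" >&2
    return 1
  fi
  cat "$sys/devices/system/node/node$node/cpulist"
}
```

Choose the `isolcpus` range and the `taskset -c` list from within the CPUs this prints.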

## GPU Configuration

### GPU Frequency Control

Fixing the GPU at its maximum frequency removes frequency changes as a source of run-to-run variance, which can significantly improve benchmark performance and stability.

First, find which DRM card corresponds to the GPU you want to tune (e.g., `card1`). A list of known device IDs for Intel GPUs is available at https://dgpu-docs.intel.com/devices/hardware-table.html#gpus-with-supported-drivers.

```bash
# Print the vendor and device ID of card1
cat /sys/class/drm/card1/device/vendor # Should be 0x8086 for Intel
cat /sys/class/drm/card1/device/device # Device ID
```
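On systems with several cards it can help to list them all at once. The sketch below prints each card's vendor ID, device ID, and bound kernel driver; the driver name is useful because the two frequency examples below use different sysfs layouts, which (as we understand it) correspond to different kernel drivers. `list_gpus` and `SYS_ROOT` are hypothetical names.

```bash
# Hypothetical helper: list DRM cards with vendor ID, device ID, and the
# kernel driver bound to them. SYS_ROOT defaults to /sys.
list_gpus() {
  local sys=${SYS_ROOT:-/sys} dev card driver
  for dev in "$sys"/class/drm/card[0-9]*/device; do
    card=${dev%/device}
    card=${card##*/}
    # Skip connector entries such as card1-DP-1.
    case $card in *-*) continue ;; esac
    [ -e "$dev/vendor" ] || continue
    driver=none
    if [ -L "$dev/driver" ]; then
      # "driver" is a symlink to the bound kernel driver, e.g. .../drivers/xe
      driver=$(basename "$(readlink "$dev/driver")")
    fi
    printf '%s vendor=%s device=%s driver=%s\n' \
      "$card" "$(cat "$dev/vendor")" "$(cat "$dev/device")" "$driver"
  done
}
```

Match the printed device IDs against the hardware table linked above to identify each card.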

Verify that the maximum frequency is set to the hardware maximum, which is 2850 MHz for Arc B580. Check the current value with `cat /sys/class/drm/card1/device/tile0/gt0/freq0/max_freq`. If it differs, set it to the hardware maximum, then set the minimum frequency to the same value:

```bash
# Arc B580 (Battlemage): set the max frequency to the hardware maximum
echo 2850 | sudo tee /sys/class/drm/card1/device/tile0/gt0/freq0/max_freq

# Set the min frequency to the max frequency, so that the frequency is fixed
echo 2850 | sudo tee /sys/class/drm/card1/device/tile0/gt0/freq0/min_freq
```

For GPU Max 1100 (Ponte Vecchio), the frequency controls are located directly under the card directory:

```bash
# Check the current frequency limits
cat /sys/class/drm/card1/gt_max_freq_mhz
cat /sys/class/drm/card1/gt_min_freq_mhz

# Fix the GPU frequency: set the minimum to the current maximum
max_freq=$(cat /sys/class/drm/card1/gt_max_freq_mhz)
echo "$max_freq" | sudo tee /sys/class/drm/card1/gt_min_freq_mhz
```
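Both procedures follow the same pattern, so they can be wrapped in one function. This is a minimal sketch: `pin_gpu_freq`, `SYS_ROOT`, and `SUDO` are hypothetical names, and the function only recognizes the two sysfs layouts shown above. It writes the maximum before the minimum, because a minimum above the current maximum is likely to be rejected; pinning below the current minimum would need the opposite order.

```bash
# Hypothetical helper: fix a card's GPU frequency at MHZ by writing the max
# first, then the min. Recognizes the Arc B580 and GPU Max 1100 layouts.
# SYS_ROOT defaults to /sys; set SUDO= (empty) to write without sudo.
# Usage: pin_gpu_freq CARD MHZ
pin_gpu_freq() {
  local card=$1 mhz=$2 sys=${SYS_ROOT:-/sys} maxf minf
  local dir="$sys/class/drm/$card"
  if [ -e "$dir/device/tile0/gt0/freq0/max_freq" ]; then
    # Arc B580 layout
    maxf=$dir/device/tile0/gt0/freq0/max_freq
    minf=$dir/device/tile0/gt0/freq0/min_freq
  elif [ -e "$dir/gt_max_freq_mhz" ]; then
    # GPU Max 1100 layout
    maxf=$dir/gt_max_freq_mhz
    minf=$dir/gt_min_freq_mhz
  else
    echo "no known frequency controls for $card" >&2
    return 1
  fi
  echo "$mhz" | ${SUDO-sudo} tee "$maxf" >/dev/null || return 1
  echo "$mhz" | ${SUDO-sudo} tee "$minf" >/dev/null || return 1
  # Read back both limits to confirm the driver accepted them.
  echo "$card: min=$(cat "$minf") max=$(cat "$maxf")"
}
```

For example, `pin_gpu_freq card1 2850` for Arc B580, or `pin_gpu_freq card1 "$(cat /sys/class/drm/card1/gt_max_freq_mhz)"` for GPU Max 1100.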

The result can be verified with tools such as oneprof or unitrace by tracking the GPU frequency over time while running an arbitrary benchmark (a benchmark that runs many iterations of a small problem size is recommended). The frequency should remain fixed unless thermal throttling occurs.
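For a coarse check without a tracing tool, the actual-frequency file can be sampled while a benchmark runs. This sketch assumes the `act_freq` file (Arc B580 layout, next to `max_freq`) or `gt_act_freq_mhz` (GPU Max 1100 layout) reports the actual frequency in MHz; it may read 0 while the GPU is idle, so sample during the run. `sample_freq` is a hypothetical name.

```bash
# Hypothetical helper: sample a GPU frequency file and report the lowest
# and highest value seen. A coarse sketch, not a replacement for
# unitrace or oneprof.
# Usage: sample_freq FILE [SAMPLES] [INTERVAL_SECONDS]
sample_freq() {
  local file=$1 n=${2:-50} interval=${3:-0.1} i=0 v lo= hi=
  while [ "$i" -lt "$n" ]; do
    v=$(cat "$file") || return 1
    if [ -z "$lo" ] || [ "$v" -lt "$lo" ]; then lo=$v; fi
    if [ -z "$hi" ] || [ "$v" -gt "$hi" ]; then hi=$v; fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "min=$lo max=$hi"
}
```

For example, run `sample_freq /sys/class/drm/card1/device/tile0/gt0/freq0/act_freq 100 0.05` in a second terminal while the benchmark is running; equal `min` and `max` values indicate a fixed frequency.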

## Driver Version

Make sure you are using the latest GPU driver. On Ubuntu, upgrade the installed packages:

```bash
sudo apt update && sudo apt upgrade
```

## Environment Variables

### Level Zero Environment Variables

Use GPU affinity to bind benchmarks to a specific GPU. `ZE_AFFINITY_MASK` restricts the Level Zero devices visible to the process; `0` exposes only the first enumerated device, which may not match the DRM card numbering. Use CPUs from the same NUMA node as that GPU to reduce latency.

```bash
export ZE_AFFINITY_MASK=0
```

### SYCL Runtime Variables

For consistency, restrict the SYCL runtime to GPU devices of a single backend. For Level Zero, using the v2 version of the Unified Runtime Level Zero adapter is recommended.

```bash
export ONEAPI_DEVICE_SELECTOR="level_zero:gpu"
export SYCL_UR_USE_LEVEL_ZERO_V2=1
```
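To apply these settings together with CPU pinning for a single run, a small wrapper can help. This is a minimal sketch: `run_bench`, `BENCH_CPUS`, and `BENCH_GPU` are hypothetical names, not part of the benchmark scripts. Using `env` scopes the variables to one command instead of exporting them into the shell.

```bash
# Hypothetical wrapper: run a command with the environment variables from
# this section, optionally pinned to CPUs with taskset.
# BENCH_CPUS (e.g. "2-7") enables pinning; BENCH_GPU defaults to 0.
run_bench() {
  if [ -n "${BENCH_CPUS:-}" ]; then
    set -- taskset -c "$BENCH_CPUS" "$@"
  fi
  env ZE_AFFINITY_MASK="${BENCH_GPU:-0}" \
      ONEAPI_DEVICE_SELECTOR="level_zero:gpu" \
      SYCL_UR_USE_LEVEL_ZERO_V2=1 \
      "$@"
}
```

For example, `BENCH_CPUS=2-7 run_bench ./main.py ~/benchmarks_workdir/ --sycl ~/llvm/build/`. Running `run_bench sycl-ls` (if `sycl-ls` from the DPC++ build is on `PATH`) is a quick way to check that only the intended GPU is visible.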

devops/scripts/benchmarks/README.md

Lines changed: 5 additions & 0 deletions
@@ -143,6 +143,11 @@ IGC (Ubuntu):
`$ sudo apt-get install flex bison libz-dev cmake libc6 libstdc++6 python3-pip`

## Performance Tuning

For stable benchmark results and system configuration recommendations, see the
[Performance Tuning Guide](PERFORMANCE_TUNING.md).

## Contribution

The requirements and instructions above are for building the project from source
