GandalfTea/gperf
Launching analysis:
* daemon running on the instance, catching all pytorch threads and deploying metrics to a kernel module
-- tracks core metrics with near-zero overhead, informs the user if major problems are found
* 'apx analyze <file.py> [argv]' attaches to the pid and dumps a 'data.[apx]'; 'apx report' then launches the cli report.
-- more detailed, might run kernels multiple times to benchmark; ends with 'apx report' for the full report
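The two-step flow above ('apx analyze' dumping a data file, then 'apx report' reading it) maps onto a standard subcommand parser. A minimal sketch; everything beyond the 'analyze'/'report' verbs (the '-o' flag, the plain 'data.apx' default name) is an assumption, not the real interface:

```python
# Sketch of the apx CLI dispatch. Only 'analyze' and 'report' come from
# the design notes; flag names and the default dump name are assumptions.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="apx")
    sub = parser.add_subparsers(dest="command", required=True)

    analyze = sub.add_parser("analyze", help="profile a script, dump a data file")
    analyze.add_argument("script")                           # the <file.py> to profile
    analyze.add_argument("argv", nargs=argparse.REMAINDER)   # [argv] forwarded to the script
    analyze.add_argument("-o", "--out", default="data.apx")  # assumed dump-path flag

    report = sub.add_parser("report", help="launch the cli report for a dump")
    report.add_argument("dump", nargs="?", default="data.apx")
    return parser
```

`argparse.REMAINDER` lets everything after the script name, including option-like strings such as `--epochs 3`, pass through untouched to the profiled script.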
daemon nvperfd:
* spawned either by the kernel or by '<cmd> init', with comm 'nvperfd'
* 2 open sockets: the netlink connector api and logs
* logs to /var/log/nvperfd/nvperfd.log under LOCAL3.*
* reads the tracked nvidia metrics from /home/<user>/.config/nvperf/nvperfd.conf
-- users modify the tracked metrics through the repl; the daemon reads the file when launching the trackers
* uses netlink sockets to monitor PROC_EVENT_* when new processes are launched:
-- track every python3 process (pytorch renames the main process to pt_main_thread after ~0.5s)
-- track threads and check for nvidia controllers
-- if found, deploy trackers
-- keep tracking the pid until PROC_EVENT_EXIT is received, then disable the trackers and read the data
-- dump the data to the state file folder
-- display simple tracking data, tell the user to review with 'apx report' or similar
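The discovery side of the steps above can be sketched as follows. The real daemon would hold a NETLINK_CONNECTOR socket subscribed to PROC_EVENT_FORK/EXEC/EXIT (which needs elevated privileges), so as a portable stand-in this polls /proc for matching comm values; the conf layout (a [metrics] section with a 'tracked' key) is an assumption, not the real nvperfd.conf format:

```python
# Sketch of the nvperfd discovery loop. The netlink proc connector is
# replaced by a /proc poll so the snippet runs unprivileged; the conf
# file layout below is an assumed format, not the real nvperfd.conf.
import configparser
import os

TRACKED_COMMS = {"python3", "pt_main_thread"}  # pytorch renames its main process

def read_tracked_metrics(conf_path):
    """Read the metric names the daemon should collect from its conf file."""
    cfg = configparser.ConfigParser()
    cfg.read(conf_path)
    raw = cfg.get("metrics", "tracked", fallback="")
    return [m.strip() for m in raw.split(",") if m.strip()]

def find_candidate_pids(proc_root="/proc"):
    """Return pids whose comm matches a tracked name (python3, or the
    renamed pt_main_thread)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited between listdir and open
        if comm in TRACKED_COMMS:
            pids.append(int(entry))
    return pids
```

Taking `proc_root` as a parameter keeps the scan testable against a fake /proc tree; the daemon would call it with the default and then attach trackers to each returned pid.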
nvctl: nvidia CUPTI API interface
Things to track:
i. Memory movement from DRAM to VRAM:
* check if memory is page-locked (presence of cudaMallocHost)
- prompt the user to use torch.Tensor.pin_memory, pin_memory=True (DataLoader) or the mlock syscall.
-- https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/
* latency of the intermediate pinned buffer if memory is pageable
* check data spills and the subsequent movements back into VRAM
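One coarse host-side signal for the page-locking check above: mlock()-style locked memory appears as the VmLck field in /proc/<pid>/status. This is only a sketch and a partial signal, since pages pinned by the CUDA driver for cudaMallocHost are not necessarily counted in VmLck; the reliable detection is a CUPTI runtime-API callback on the cudaMallocHost call itself. The procfs parsing is standard:

```python
# Sketch: read VmLck (KiB of mlock()-ed memory) for a pid from procfs.
# Caveat: cudaMallocHost pins pages through the driver, which may not
# be reflected in VmLck, so this only catches explicit mlock usage.
def locked_kib(pid="self", proc_root="/proc"):
    """Return the VmLck value from /proc/<pid>/status, or 0 if absent."""
    with open(f"{proc_root}/{pid}/status") as f:
        for line in f:
            if line.startswith("VmLck:"):
                return int(line.split()[1])  # line format: 'VmLck:  <n> kB'
    return 0
```

As with the discovery sketch, `proc_root` is a parameter so the parser can be exercised against a fake status file.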
ii. VRAM and Cache Management:
(
* The L2 cache can be segmented to set aside space for persistent data. Verify whether this is used and what the size is. (disabled in multi-GPU setups)
-- https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#l2-cache-set-aside-for-persisting-accesses
* The user can set up a custom L2 data persistence policy.
-- https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#l2-policy-for-persisting-accesses
)
* VRAM to Shared Cache movements, evictions and stalls
* Track cache misses and update latency
iii. Core utilization and Idle Time:
iv. Application level parallelisation
v. etc.
About
gpu performance monitoring