
Task05 Denis Sokolov ITMO (Try faster) #1072

Closed
DenChika wants to merge 3 commits into GPGPUCourse:task05 from DenChika:task05


@DenChika

Local output

$ ./main_radix_sort
Found 2 GPUs in 0.14853 sec (OpenCL: 0.100794 sec, Vulkan: 0.0473934 sec)     
Available devices:                                                            
  Device #0: API: OpenCL. GPU. AMD Radeon(TM) Graphics (gfx902). Free memory: 3069/3137 Mb.                                 
  Device #1: API: OpenCL. CPU. AMD Ryzen 5 5500U with Radeon Graphics         . Intel(R) Corporation. Total memory: 7514 Mb.
Using device #0: API: OpenCL. GPU. AMD Radeon(TM) Graphics (gfx902). Free memory: 3069/3137 Mb.                             
Using OpenCL API...
n=10000000 max_value=2147483647
sorting on CPU...
CPU std::sort finished in 1.08789 sec
CPU std::sort effective RAM bandwidth: 0.0684444 GB/s (9.182 uint millions/s)
Kernels compilation done in 0.0354784 seconds
Kernels compilation done in 0.0377003 seconds
Kernels compilation done in 0.0483551 seconds
Kernels compilation done in 0.0398296 seconds
Kernels compilation done in 0.0392528 seconds
GPU radix-sort times (in seconds) - 10 values (min=3.27277 10%=3.28092 median=3.45185 90%=4.44449 max=4.44449)
GPU radix-sort median effective VRAM bandwidth: 0.0215843 GB/s (2.89699 uint millions/s)
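
For reference, the two summary lines in the log above appear to follow from `n` and the measured time. The sketch below reproduces them under an assumption inferred from the printed numbers, not taken from the benchmark source: each element is counted as read once and written once (2 × n × 4 bytes), and "GB/s" is reported in units of 1024³ bytes. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers: the formulas are inferred from the printed log
// values and are not copied from the course benchmark code.

// Sorted elements per second, in millions.
inline double millionsPerSecond(std::size_t n, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(n) / seconds / 1e6;
}

// Assumed definition of "effective bandwidth": every element is read once
// and written once, and the result is expressed in units of 1024^3 bytes.
inline double effectiveBandwidthGiB(std::size_t n, double seconds) {
    assert(seconds > 0.0);
    const double bytes = 2.0 * static_cast<double>(n) * sizeof(std::uint32_t);
    return bytes / seconds / (1024.0 * 1024.0 * 1024.0);
}
```

With `n = 10000000` and the GPU median of 3.45185 s, these helpers give values that agree with the log to the printed precision (about 2.897 million uint/s and 0.0216 GB/s).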

GitHub CI output

$ ./main_radix_sort
Found 2 GPUs in 0.0560437 sec (CUDA: 7.7654e-05 sec, OpenCL: 0.0240834 sec, Vulkan: 0.0318348 sec)
Available devices:
  Device #0: API: OpenCL. CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15990 Mb.
  Device #1: API: Vulkan. CPU. llvmpipe (LLVM 20.1.2, 256 bits). Free memory: 15990/15990 Mb.
Using device #0: API: OpenCL. CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15990 Mb.
Using OpenCL API...
n=100000000 max_value=2147483647
sorting on CPU...
CPU std::sort finished in 8.42925 sec
CPU std::sort effective RAM bandwidth: 0.0883892 GB/s (11.8634 uint millions/s)
Kernels compilation done in 0.121 seconds
Kernels compilation done in 0.0330141 seconds
Kernels compilation done in 0.0323024 seconds
Kernels compilation done in 0.0347719 seconds
Kernels compilation done in 0.0354877 seconds
GPU radix-sort times (in seconds) - 10 values (min=61.9657 10%=61.9824 median=62.0202 90%=62.2569 max=62.2569)
GPU radix-sort median effective VRAM bandwidth: 0.0120132 GB/s (1.61238 uint millions/s)

@GPUcourseBOT
Collaborator

Test results for PR #1072

Test logs
=== STATUS: Programs executed successfully: main_radix_sort ===
=== main_radix_sort stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.49163 sec (CUDA: 0.113472 sec, OpenCL: 0.709364 sec, Vulkan: 7.66873 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
n=100000000 max_value=2147483647
sorting on CPU...
CPU std::sort finished in 11.4678 sec
CPU std::sort effective RAM bandwidth: 0.0649697 GB/s (8.72008 uint millions/s)
Kernels compilation done in 3.23027 seconds
Kernels compilation done in 0.0449591 seconds
Kernels compilation done in 0.0419443 seconds
Kernels compilation done in 0.0409702 seconds
Kernels compilation done in 0.686643 seconds
GPU radix-sort times (in seconds) - 10 values (min=2.4887 10%=2.48872 median=2.48953 90%=6.59311 max=6.59311)
GPU radix-sort median effective VRAM bandwidth: 0.299277 GB/s (40.1682 uint millions/s)


@DenChika closed this on Mar 12, 2026