Task07 Denis Sokolov ITMO #1070

Closed
DenChika wants to merge 1 commit into GPGPUCourse:task07 from DenChika:task07

@DenChika DenChika commented Mar 11, 2026

Local output

$ ./main_sparse_matrix_multiply
Found 2 GPUs in 0.211759 sec (OpenCL: 0.138175 sec, Vulkan: 0.0727802 sec)    
Available devices:                                                            
  Device #0: API: OpenCL. GPU. AMD Radeon(TM) Graphics (gfx902). Free memory: 3069/3137 Mb.                                 
  Device #1: API: OpenCL. CPU. AMD Ryzen 5 5500U with Radeon Graphics         . Intel(R) Corporation. Total memory: 7514 Mb.
Using device #0: API: OpenCL. GPU. AMD Radeon(TM) Graphics (gfx902). Free memory: 3069/3137 Mb.                             
Using OpenCL API...                                                                                                         
Evaluating CSR matrix nrows x ncols=1000000x1000000 with values in range [0; 1000]          
____________________________________________________________________________________________
Evaluating with NNZ per row in range [32; 32], median NNZ per row=32, total NNZ=32000000...
CPU (multi-threaded via OpenMP) finished in 0.0572832 sec      
CPU effective bandwidth: 2.19343 GB/s (551.586 uint millions/s)
Kernels compilation done in 0.0779748 seconds
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.222917 10%=0.223147 median=0.224041 90%=0.306784 max=0.306784)
GPU SpMV median effective VRAM bandwidth: 0.565343 GB/s (142.831 uint millions/s)                                                                  
____________________________________________________________________________________________
Evaluating with NNZ per row in range [128; 128], median NNZ per row=128, total NNZ=128000000...
CPU (multi-threaded via OpenMP) finished in 0.506888 sec
CPU effective bandwidth: 0.874346 GB/s (230.939 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.420493 10%=0.422547 median=0.432062 90%=0.523749 max=0.523749)
GPU SpMV median effective VRAM bandwidth: 1.12088 GB/s (296.254 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [1; 32], median NNZ per row=17, total NNZ=16499998...
CPU (multi-threaded via OpenMP) finished in 0.036304 sec
CPU effective bandwidth: 1.85481 GB/s (424.948 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.222726 10%=0.222782 median=0.223535 90%=0.224793 max=0.224793)
GPU SpMV median effective VRAM bandwidth: 0.308309 GB/s (71.5771 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [1; 128], median NNZ per row=64, total NNZ=64499934...
CPU (multi-threaded via OpenMP) finished in 0.142101 sec
CPU effective bandwidth: 1.73679 GB/s (445.96 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.229239 10%=0.229381 median=0.234751 90%=0.267188 max=0.267188)
GPU SpMV median effective VRAM bandwidth: 1.0553 GB/s (272.63 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [32; 128], median NNZ per row=80, total NNZ=79933808...
CPU (multi-threaded via OpenMP) finished in 0.159288 sec
CPU effective bandwidth: 1.9101 GB/s (493.638 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.271417 10%=0.274579 median=0.315918 90%=0.34568 max=0.34568)
GPU SpMV median effective VRAM bandwidth: 0.96616 GB/s (250.065 uint millions/s)

GitHub CI output

$ ./main_sparse_matrix_multiply
Found 2 GPUs in 0.0578978 sec (CUDA: 8.7642e-05 sec, OpenCL: 0.0282925 sec, Vulkan: 0.0294639 sec)
Available devices:
  Device #0: API: OpenCL. CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15994 Mb.
  Device #1: API: Vulkan. CPU. llvmpipe (LLVM 20.1.2, 256 bits). Free memory: 15994/15994 Mb.
Using device #0: API: OpenCL. CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15994 Mb.
Using OpenCL API...
Evaluating CSR matrix nrows x ncols=1000000x1000000 with values in range [0; 1000]
____________________________________________________________________________________________
Evaluating with NNZ per row in range [32; 32], median NNZ per row=32, total NNZ=32000000...
CPU (multi-threaded via OpenMP) finished in 0.0298864 sec
CPU effective bandwidth: 4.23377 GB/s (1069.6 uint millions/s)
Kernels compilation done in 0.129737 seconds
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.289951 10%=0.290186 median=0.291467 90%=0.467056 max=0.467056)
GPU SpMV median effective VRAM bandwidth: 0.43456 GB/s (109.789 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [128; 128], median NNZ per row=128, total NNZ=128000000...
CPU (multi-threaded via OpenMP) finished in 0.0684338 sec
CPU effective bandwidth: 7.07314 GB/s (1869.45 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.373149 10%=0.374616 median=0.380655 90%=0.438381 max=0.438381)
GPU SpMV median effective VRAM bandwidth: 1.27225 GB/s (336.263 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [1; 32], median NNZ per row=17, total NNZ=16499998...
CPU (multi-threaded via OpenMP) finished in 0.0143547 sec
CPU effective bandwidth: 4.79126 GB/s (1112.27 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.282293 10%=0.282771 median=0.28363 90%=0.298394 max=0.298394)
GPU SpMV median effective VRAM bandwidth: 0.242985 GB/s (56.4116 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [1; 128], median NNZ per row=64, total NNZ=64499934...
CPU (multi-threaded via OpenMP) finished in 0.0375956 sec
CPU effective bandwidth: 6.58347 GB/s (1700.76 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.328957 10%=0.329019 median=0.330702 90%=0.333874 max=0.333874)
GPU SpMV median effective VRAM bandwidth: 0.749107 GB/s (193.528 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [32; 128], median NNZ per row=80, total NNZ=80011495...
CPU (multi-threaded via OpenMP) finished in 0.0470027 sec
CPU effective bandwidth: 6.49465 GB/s (1700.6 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.34371 10%=0.344303 median=0.345997 90%=0.349866 max=0.349866)
GPU SpMV median effective VRAM bandwidth: 0.883003 GB/s (231.216 uint millions/s)

@GPUcourseBOT
Collaborator

Test results for PR #1070

Test logs (click to expand)
=== STATUS: Programs executed successfully: main_sparse_matrix_multiply ===
=== main_sparse_matrix_multiply stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.57898 sec (CUDA: 0.115557 sec, OpenCL: 0.706491 sec, Vulkan: 7.75687 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
Evaluating CSR matrix nrows x ncols=1000000x1000000 with values in range [0; 1000]
____________________________________________________________________________________________
Evaluating with NNZ per row in range [32; 32], median NNZ per row=32, total NNZ=32000000...
CPU (multi-threaded via OpenMP) finished in 0.0432619 sec
CPU effective bandwidth: 2.92585 GB/s (739.182 uint millions/s)
Kernels compilation done in 3.49236 seconds
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0130587 10%=0.0130666 median=0.0292415 90%=3.52165 max=3.52165)
GPU SpMV median effective VRAM bandwidth: 4.33151 GB/s (1094.34 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [128; 128], median NNZ per row=128, total NNZ=128000000...
CPU (multi-threaded via OpenMP) finished in 0.167461 sec
CPU effective bandwidth: 2.89139 GB/s (764.205 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0189664 10%=0.0189897 median=0.0192241 90%=0.0325111 max=0.0325111)
GPU SpMV median effective VRAM bandwidth: 25.1916 GB/s (6658.3 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [1; 32], median NNZ per row=17, total NNZ=16499998...
CPU (multi-threaded via OpenMP) finished in 0.0237571 sec
CPU effective bandwidth: 2.89746 GB/s (672.649 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.01152 10%=0.0115225 median=0.0115239 90%=0.0115559 max=0.0115559)
GPU SpMV median effective VRAM bandwidth: 5.98045 GB/s (1388.42 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [1; 128], median NNZ per row=64, total NNZ=64499934...
CPU (multi-threaded via OpenMP) finished in 0.0830694 sec
CPU effective bandwidth: 2.98108 GB/s (770.134 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0144483 10%=0.0145255 median=0.030362 90%=0.030388 max=0.030388)
GPU SpMV median effective VRAM bandwidth: 8.15927 GB/s (2107.9 uint millions/s)
____________________________________________________________________________________________
Evaluating with NNZ per row in range [32; 128], median NNZ per row=80, total NNZ=80011495...
CPU (multi-threaded via OpenMP) finished in 0.104959 sec
CPU effective bandwidth: 2.90984 GB/s (761.936 uint millions/s)
GPU SpMV (sparse matrix-vector multiplication) times (in seconds) - 10 values (min=0.0154801 10%=0.0156685 median=0.0308816 90%=0.0309165 max=0.0309165)
GPU SpMV median effective VRAM bandwidth: 9.89317 GB/s (2590.54 uint millions/s)
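
For readers unfamiliar with the operation benchmarked in the logs above, here is a minimal single-threaded sketch of SpMV over a CSR matrix. It is not the kernel from this PR (the PR's source is not shown in this conversation). The function name `spmv_csr` and the array names `row_offsets`, `columns`, and `values` are hypothetical and assume the conventional CSR layout, in which `row_offsets[r]` and `row_offsets[r + 1]` delimit the nonzeros of row `r`.

```python
def spmv_csr(row_offsets, columns, values, x):
    """Multiply a CSR matrix by a dense vector x and return the dense result.

    Assumed (conventional) CSR layout, names are illustrative only:
      row_offsets: length n_rows + 1, nonzeros of row r are
                   values[row_offsets[r]:row_offsets[r + 1]]
      columns:     column index of each nonzero
      values:      value of each nonzero
    """
    n_rows = len(row_offsets) - 1
    y = [0] * n_rows
    for row in range(n_rows):
        acc = 0
        # Walk only the stored nonzeros of this row.
        for k in range(row_offsets[row], row_offsets[row + 1]):
            acc += values[k] * x[columns[k]]
        y[row] = acc
    return y


# 3x3 example matrix:
# [[1, 0, 2],
#  [0, 0, 0],
#  [0, 3, 4]]
row_offsets = [0, 2, 2, 4]
columns = [0, 2, 1, 2]
values = [1, 2, 3, 4]
x = [10, 20, 30]

result = spmv_csr(row_offsets, columns, values, x)
print(result)
```

Each row's inner loop length equals its NNZ count, which is why the benchmark varies the "NNZ per row" range: uneven row lengths make per-row work uneven when rows are distributed across threads.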

View full logs

@PolarNick239
Member

4/5 points 👍
