Status: Open
Labels: good first issue, performance
Description
Describe the bug
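Calling stack on a vector of CuArrays is slow. In the benchmark below it takes roughly 220 times as long as stacking the equivalent CPU arrays, and roughly 50 times as long as copying one array to the host, stacking there, and uploading the result.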
To reproduce
The Minimal Working Example (MWE) for this bug:
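using BenchmarkTools, CUDA

N = 100          # length of each vector
M = 1000         # number of vectors to stack
x = randn(N)     # CPU vector (Float64)
x_cu = cu(x)     # GPU copy (cu converts to Float32)

@btime stack(fill($x, M));                    # CPU baseline
@btime stack(fill($x_cu, M));                 # stack directly on CuArrays (the slow case)
@btime cu(stack(fill(collect($x_cu), M)));    # copy to host, stack there, upload the result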
These are the timings I get, in the same order as the three @btime calls:
70.800 μs (3 allocations: 789.23 KiB)
15.774 ms (8 allocations: 8.19 KiB)
318.900 μs (12 allocations: 399.83 KiB)
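One direction to explore while this is open is to preallocate the result and copy each vector into its column. The following is only a minimal sketch: `stack_columns` is a hypothetical helper, not part of Base or CUDA.jl, and it assumes every input is a vector of the same length. It is written generically so it runs on plain CPU arrays. On CuArrays, `similar` should return a `CuMatrix`, and each `copyto!` should become one device-to-device copy.

```julia
# Hypothetical helper (not part of Base or CUDA.jl): stack equal-length
# vectors as the columns of one preallocated matrix.
function stack_columns(xs::AbstractVector{<:AbstractVector})
    isempty(xs) && throw(ArgumentError("need at least one vector"))
    n = length(first(xs))
    # `similar` keeps the array type of the inputs (Array stays Array).
    out = similar(first(xs), n, length(xs))
    for (j, x) in enumerate(xs)
        length(x) == n || throw(DimensionMismatch("all vectors must have the same length"))
        # Copy the n elements of x into column j, which starts at
        # linear index (j - 1) * n + 1 in column-major storage.
        copyto!(out, (j - 1) * n + 1, x, 1, n)
    end
    return out
end

# CPU check: the result matches Base.stack (available since Julia 1.9).
xs = fill(randn(100), 1000)
@assert stack_columns(xs) == stack(xs)
```

With the variables from the MWE, the GPU call would be `stack_columns(fill(x_cu, M))`. This still issues M separate copies, and it has not been benchmarked against the timings above, so it may or may not be faster than the host round trip.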
Manifest.toml
CUDA v5.1.2
Version info
Details on Julia: 1.10
Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 8 × Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 1 on 8 virtual cores
Details on CUDA:
CUDA runtime 12.3, artifact installation
CUDA driver 12.0
Unknown NVIDIA driver
CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: missing
Julia packages:
- CUDA: 5.1.2
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.10.1+0
Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7
1 device:
0: NVIDIA GeForce MX150 (sm_61, 1.491 GiB / 2.000 GiB available)
pcarlip