Test GPUArrays reverse
#648
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:
diff --git a/lib/mtl/capture.jl b/lib/mtl/capture.jl
index c2c1a77a..c101c5b7 100644
--- a/lib/mtl/capture.jl
+++ b/lib/mtl/capture.jl
@@ -59,7 +59,8 @@ function MTLCaptureDescriptor()
end
# TODO: Add capture state
-function MTLCaptureDescriptor(obj::Union{MTLDevice,MTLCommandQueue,MTLCaptureScope},
+function MTLCaptureDescriptor(
+ obj::Union{MTLDevice, MTLCommandQueue, MTLCaptureScope},
destination::MTLCaptureDestination;
folder::String=nothing)
desc = MTLCaptureDescriptor()
@@ -110,7 +111,8 @@ end
Start GPU frame capture using the default capture object and specifying capture descriptor parameters directly.
"""
-function startCapture(obj::Union{MTLDevice,MTLCommandQueue,MTLCaptureScope},
+function startCapture(
+ obj::Union{MTLDevice, MTLCommandQueue, MTLCaptureScope},
destination::MTLCaptureDestination=MTLCaptureDestinationGPUTraceDocument;
folder::String=nothing)
if destination == MTLCaptureDestinationGPUTraceDocument && folder === nothing
diff --git a/perf/array.jl b/perf/array.jl
index 008ab4d6..b86a675e 100644
--- a/perf/array.jl
+++ b/perf/array.jl
@@ -63,12 +63,12 @@ gpu_vec_ints = reshape(gpu_mat_ints, length(gpu_mat_ints))
let group = addgroup!(group, "reverse")
group["1d"] = @benchmarkable Metal.@sync reverse($gpu_vec)
group["1dL"] = @benchmarkable Metal.@sync reverse($gpu_vec_long)
- group["2d"] = @benchmarkable Metal.@sync reverse($gpu_mat; dims=1)
- group["2dL"] = @benchmarkable Metal.@sync reverse($gpu_mat_long; dims=1)
+ group["2d"] = @benchmarkable Metal.@sync reverse($gpu_mat; dims = 1)
+ group["2dL"] = @benchmarkable Metal.@sync reverse($gpu_mat_long; dims = 1)
group["1d_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_vec)
group["1dL_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_vec_long)
- group["2d_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_mat; dims=1)
- group["2dL_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_mat_long; dims=2)
+ group["2d_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_mat; dims = 1)
+ group["2dL_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_mat_long; dims = 2)
end
# 'evals=1' added to prevent hang when running benchmarks of CI
diff --git a/perf/runbenchmarks.jl b/perf/runbenchmarks.jl
index 17bf4ea0..98aa3153 100644
--- a/perf/runbenchmarks.jl
+++ b/perf/runbenchmarks.jl
@@ -1,7 +1,7 @@
# benchmark suite execution and codespeed submission
using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="reverse")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "reverse")
using Metal
diff --git a/test/runtests.jl b/test/runtests.jl
index 081fc280..42f00908 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -1,5 +1,5 @@
using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="reverse")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "reverse")
using Distributed
using Dates
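For reference, the `reverse` benchmarks in the diff above exercise the out-of-place and in-place forms, with and without a `dims` keyword. The sketch below shows the same calls on ordinary CPU arrays using only Base, so it runs without a GPU or Metal.jl. The variable names `v`, `m`, `w`, and `m2` are illustrative and not taken from the PR, and the in-place `reverse!` with `dims` assumes Julia 1.6 or later.

```julia
# CPU reference for the operations benchmarked above (Base only, no Metal.jl).
v = collect(1:5)                      # 5-element vector: 1, 2, 3, 4, 5
m = reshape(collect(1:6), 2, 3)       # 2x3 matrix with columns (1,2), (3,4), (5,6)

r1 = reverse(v)                       # new vector, elements in opposite order
r2 = reverse(m; dims = 1)             # flips along the first dimension (rows swap)
r3 = reverse(m; dims = 2)             # flips along the second dimension (columns swap)

w = copy(v)
reverse!(w)                           # in-place variant: mutates w

m2 = copy(m)
reverse!(m2; dims = 2)                # in-place with dims (assumes Julia >= 1.6)
```

A GPU implementation is expected to agree with these CPU results element for element, which is the usual way such methods are checked.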
Force-pushed from 4c15cc1 to 108f6d1.
Metal Benchmarks
| Benchmark suite | Current: 36bd61e | Previous: cd8846e | Ratio |
|---|---|---|---|
| latency/precompile | 28688656917 ns | 25041664500 ns | 1.15 |
| latency/ttfp | 2291530833 ns | 2123110083 ns | 1.08 |
| latency/import | 1381234520.5 ns | 1219920625 ns | 1.13 |
| integration/metaldevrt | 840667 ns | 833750 ns | 1.01 |
| integration/byval/slices=1 | 1555083 ns | 1545000 ns | 1.01 |
| integration/byval/slices=3 | 8878646 ns | 9534208 ns | 0.93 |
| integration/byval/reference | 1538083 ns | 1538458 ns | 1.00 |
| integration/byval/slices=2 | 2615499.5 ns | 2567417 ns | 1.02 |
| kernel/indexing | 627458 ns | 570625 ns | 1.10 |
| kernel/indexing_checked | 633229 ns | 587875 ns | 1.08 |
| kernel/launch | 12375 ns | 12250 ns | 1.01 |
| kernel/rand | 568167 ns | 559417 ns | 1.02 |
| array/reverse/1d | 632875 ns | | |
| array/reverse/2dL_inplace | 2500979 ns | | |
| array/reverse/1dL | 2114333.5 ns | | |
| array/reverse/2d | 1346708 ns | | |
| array/reverse/1d_inplace | 577000 ns | | |
| array/reverse/2d_inplace | 809208 ns | | |
| array/reverse/2dL | 6548417 ns | | |
| array/reverse/1dL_inplace | 863000 ns | | |
| array/construct | 6333 ns | 6250 ns | 1.01 |
| array/broadcast | 595792 ns | 568853.5 ns | 1.05 |
| array/accumulate/Int64/1d | 1321375 ns | 1252541.5 ns | 1.05 |
| array/accumulate/Int64/dims=1 | 1916729.5 ns | 1812125 ns | 1.06 |
| array/accumulate/Int64/dims=2 | 2272167 ns | 2154541.5 ns | 1.05 |
| array/accumulate/Int64/dims=1L | 11932167 ns | 11676250 ns | 1.02 |
| array/accumulate/Int64/dims=2L | 10049604 ns | 9788583 ns | 1.03 |
| array/accumulate/Float32/1d | 1162333.5 ns | 1110125 ns | 1.05 |
| array/accumulate/Float32/dims=1 | 1661541.5 ns | 1542584 ns | 1.08 |
| array/accumulate/Float32/dims=2 | 2007729 ns | 1844333 ns | 1.09 |
| array/accumulate/Float32/dims=1L | 10013979.5 ns | 9855833 ns | 1.02 |
| array/accumulate/Float32/dims=2L | 8145167 ns | 7549292 ns | 1.08 |
| array/random/randn/Float32 | 810667 ns | 806041 ns | 1.01 |
| array/random/randn!/Float32 | 609708 ns | 604084 ns | 1.01 |
| array/random/rand!/Int64 | 548125 ns | 548625 ns | 1.00 |
| array/random/rand!/Float32 | 582125 ns | 569834 ns | 1.02 |
| array/random/rand/Int64 | 768041 ns | 813479.5 ns | 0.94 |
| array/random/rand/Float32 | 668062.5 ns | 598917 ns | 1.12 |
| array/reductions/reduce/Int64/1d | 1354396 ns | 1246458 ns | 1.09 |
| array/reductions/reduce/Int64/dims=1 | 1113667 ns | 1066250 ns | 1.04 |
| array/reductions/reduce/Int64/dims=2 | 1312083 ns | 1159375 ns | 1.13 |
| array/reductions/reduce/Int64/dims=1L | 2023000 ns | 2051083 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 4377083 ns | 3424666.5 ns | 1.28 |
| array/reductions/reduce/Float32/1d | 986750 ns | 854125 ns | 1.16 |
| array/reductions/reduce/Float32/dims=1 | 837625 ns | 811584 ns | 1.03 |
| array/reductions/reduce/Float32/dims=2 | 881750 ns | 743292 ns | 1.19 |
| array/reductions/reduce/Float32/dims=1L | 1321750.5 ns | 1329354.5 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 1866500 ns | 1745667 ns | 1.07 |
| array/reductions/mapreduce/Int64/1d | 1343500 ns | 1416563 ns | 0.95 |
| array/reductions/mapreduce/Int64/dims=1 | 1074125 ns | 1069687.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 1299833 ns | 1172437.5 ns | 1.11 |
| array/reductions/mapreduce/Int64/dims=1L | 2026250 ns | 1986541.5 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=2L | 4311458.5 ns | 3283958 ns | 1.31 |
| array/reductions/mapreduce/Float32/1d | 1049770.5 ns | 992000 ns | 1.06 |
| array/reductions/mapreduce/Float32/dims=1 | 823792 ns | 813708 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 879292 ns | 746375 ns | 1.18 |
| array/reductions/mapreduce/Float32/dims=1L | 1331417 ns | 1326333 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 1853709 ns | 1752958 ns | 1.06 |
| array/private/copyto!/gpu_to_gpu | 634167 ns | 618000 ns | 1.03 |
| array/private/copyto!/cpu_to_gpu | 803604.5 ns | 784542 ns | 1.02 |
| array/private/copyto!/gpu_to_cpu | 782833 ns | 785458 ns | 1.00 |
| array/private/iteration/findall/int | 1718667 ns | 1561958 ns | 1.10 |
| array/private/iteration/findall/bool | 1503666.5 ns | 1421958 ns | 1.06 |
| array/private/iteration/findfirst/int | 2103042 ns | 1808166 ns | 1.16 |
| array/private/iteration/findfirst/bool | 2080229.5 ns | 1675041 ns | 1.24 |
| array/private/iteration/scalar | 5453000 ns | 4652479 ns | 1.17 |
| array/private/iteration/logical | 2645958 ns | 2505708.5 ns | 1.06 |
| array/private/iteration/findmin/1d | 2271250 ns | 1902125 ns | 1.19 |
| array/private/iteration/findmin/2d | 1565500 ns | 1510458 ns | 1.04 |
| array/private/copy | 600250 ns | 554729 ns | 1.08 |
| array/shared/copyto!/gpu_to_gpu | 83875 ns | 83750 ns | 1.00 |
| array/shared/copyto!/cpu_to_gpu | 82875 ns | 81542 ns | 1.02 |
| array/shared/copyto!/gpu_to_cpu | 83000 ns | 82459 ns | 1.01 |
| array/shared/iteration/findall/int | 1669250 ns | 1577417 ns | 1.06 |
| array/shared/iteration/findall/bool | 1515666.5 ns | 1437209 ns | 1.05 |
| array/shared/iteration/findfirst/int | 1708250 ns | 1321541.5 ns | 1.29 |
| array/shared/iteration/findfirst/bool | 1680583.5 ns | 1308542 ns | 1.28 |
| array/shared/iteration/scalar | 208000 ns | 199708 ns | 1.04 |
| array/shared/iteration/logical | 2610854 ns | 2227625 ns | 1.17 |
| array/shared/iteration/findmin/1d | 1887083 ns | 1410625 ns | 1.34 |
| array/shared/iteration/findmin/2d | 1569666 ns | 1511604 ns | 1.04 |
| array/shared/copy | 241541 ns | 250333 ns | 0.96 |
| array/permutedims/4d | 2638959 ns | 2361500 ns | 1.12 |
| array/permutedims/2d | 1135083.5 ns | 1143583 ns | 0.99 |
| array/permutedims/3d | 1660334 ns | 1654771 ns | 1.00 |
| metal/synchronization/stream | 19667 ns | 18667 ns | 1.05 |
| metal/synchronization/context | 19458 ns | 20000 ns | 0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
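The Ratio column in the table above appears to be the current time divided by the previous time, rounded to two decimals, so a value above 1 would mean the current commit was slower on that benchmark. The sketch below checks that reading against two rows copied from the table. The helper name `ratio` and the `rows` layout are illustrative assumptions and not part of github-action-benchmark.

```julia
# Assumed definition of the Ratio column: current / previous, two decimals.
# `ratio` is a hypothetical helper written for this illustration only.
ratio(current_ns, previous_ns) = round(current_ns / previous_ns; digits = 2)

# Two rows copied from the benchmark table (times in nanoseconds).
rows = [
    ("latency/precompile", 28688656917, 25041664500),
    ("integration/byval/slices=3", 8878646, 9534208),
]

for (name, current_ns, previous_ns) in rows
    # Print the benchmark name padded to a fixed width, then the ratio.
    println(rpad(name, 30), ratio(current_ns, previous_ns))
end
```

Under this assumption the two rows give 1.15 and 0.93, matching the table. The `array/reverse/*` rows have no ratio because the previous commit has no measurement for them.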
Force-pushed from 8cde64c to 53a0c88.
Force-pushed from c1d78e5 to 543c8ee.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #648 +/- ##
==========================================
- Coverage 80.92% 80.83% -0.09%
==========================================
Files 62 62
Lines 2820 2844 +24
==========================================
+ Hits 2282 2299 +17
- Misses 538 545 +7
View full report in Codecov by Sentry.
Let's mark this as draft until it pulls from a dev branch on GPUArrays.
Force-pushed from 36bd61e to 7874058.
Force-pushed from 6405cd5 to 5fd2378.
Force-pushed from 7874058 to 68757f9.
Now depends on #688