Restore Enzyme to CI checks #2807
Open: wsmoses wants to merge 1 commit into master from wsmoses-patch-1 (+4 −2)
CUDA.jl Benchmarks
Benchmark suite | Current: 4a5ad9f | Previous: e561e7a | Ratio |
---|---|---|---|
latency/precompile | 42949872022 ns | 43393378645 ns | 0.99 |
latency/ttfp | 7007270652 ns | 7099882121 ns | 0.99 |
latency/import | 3556038541 ns | 3463869374 ns | 1.03 |
integration/volumerhs | 9623878 ns | 9623663 ns | 1.00 |
integration/byval/slices=1 | 147077 ns | 146714 ns | 1.00 |
integration/byval/slices=3 | 425971 ns | 425787 ns | 1.00 |
integration/byval/reference | 145029 ns | 144967 ns | 1.00 |
integration/byval/slices=2 | 286425 ns | 286209 ns | 1.00 |
integration/cudadevrt | 103641 ns | 103426 ns | 1.00 |
kernel/indexing | 14335 ns | 14196 ns | 1.01 |
kernel/indexing_checked | 15084 ns | 14906 ns | 1.01 |
kernel/occupancy | 679.9367088607595 ns | 759.2189781021898 ns | 0.90 |
kernel/launch | 2230.777777777778 ns | 2287.222222222222 ns | 0.98 |
kernel/rand | 17571 ns | 15792 ns | 1.11 |
array/reverse/1d | 19881 ns | 19624 ns | 1.01 |
array/reverse/2d | 24935 ns | 24928.5 ns | 1.00 |
array/reverse/1d_inplace | 10490 ns | 10448 ns | 1.00 |
array/reverse/2d_inplace | 12111 ns | 12006 ns | 1.01 |
array/copy | 20961 ns | 20990 ns | 1.00 |
array/iteration/findall/int | 157070.5 ns | 159128.5 ns | 0.99 |
array/iteration/findall/bool | 139341 ns | 139832 ns | 1.00 |
array/iteration/findfirst/int | 163593 ns | 162546 ns | 1.01 |
array/iteration/findfirst/bool | 165265.5 ns | 164393.5 ns | 1.01 |
array/iteration/scalar | 72886 ns | 72740 ns | 1.00 |
array/iteration/logical | 214856.5 ns | 216803.5 ns | 0.99 |
array/iteration/findmin/1d | 46833 ns | 45968 ns | 1.02 |
array/iteration/findmin/2d | 96796 ns | 96433 ns | 1.00 |
array/reductions/reduce/Int64/1d | 43460.5 ns | 44555 ns | 0.98 |
array/reductions/reduce/Int64/dims=1 | 46894.5 ns | 48607 ns | 0.96 |
array/reductions/reduce/Int64/dims=2 | 62535 ns | 63682.5 ns | 0.98 |
array/reductions/reduce/Int64/dims=1L | 89033 ns | 88842 ns | 1.00 |
array/reductions/reduce/Int64/dims=2L | 87389 ns | 89417.5 ns | 0.98 |
array/reductions/reduce/Float32/1d | 34666 ns | 34490 ns | 1.01 |
array/reductions/reduce/Float32/dims=1 | 52166 ns | 50554 ns | 1.03 |
array/reductions/reduce/Float32/dims=2 | 59849 ns | 59726 ns | 1.00 |
array/reductions/reduce/Float32/dims=1L | 52488 ns | 52852 ns | 0.99 |
array/reductions/reduce/Float32/dims=2L | 70560.5 ns | 70052.5 ns | 1.01 |
array/reductions/mapreduce/Int64/1d | 42556 ns | 45547 ns | 0.93 |
array/reductions/mapreduce/Int64/dims=1 | 48064 ns | 48423.5 ns | 0.99 |
array/reductions/mapreduce/Int64/dims=2 | 61700 ns | 61443 ns | 1.00 |
array/reductions/mapreduce/Int64/dims=1L | 89055 ns | 88888 ns | 1.00 |
array/reductions/mapreduce/Int64/dims=2L | 86922 ns | 87908.5 ns | 0.99 |
array/reductions/mapreduce/Float32/1d | 34465 ns | 34245.5 ns | 1.01 |
array/reductions/mapreduce/Float32/dims=1 | 45310 ns | 47287 ns | 0.96 |
array/reductions/mapreduce/Float32/dims=2 | 59818 ns | 59743 ns | 1.00 |
array/reductions/mapreduce/Float32/dims=1L | 52523 ns | 53154 ns | 0.99 |
array/reductions/mapreduce/Float32/dims=2L | 70441 ns | 70503 ns | 1.00 |
array/broadcast | 20258 ns | 20866 ns | 0.97 |
array/copyto!/gpu_to_gpu | 11249 ns | 12817 ns | 0.88 |
array/copyto!/cpu_to_gpu | 215491 ns | 213873 ns | 1.01 |
array/copyto!/gpu_to_cpu | 283978 ns | 284406 ns | 1.00 |
array/accumulate/Int64/1d | 125006.5 ns | 125170 ns | 1.00 |
array/accumulate/Int64/dims=1 | 83808 ns | 83519 ns | 1.00 |
array/accumulate/Int64/dims=2 | 158398 ns | 158002 ns | 1.00 |
array/accumulate/Int64/dims=1L | 1712912.5 ns | 1709945.5 ns | 1.00 |
array/accumulate/Int64/dims=2L | 967148 ns | 966571 ns | 1.00 |
array/accumulate/Float32/1d | 109589.5 ns | 109737 ns | 1.00 |
array/accumulate/Float32/dims=1 | 80543 ns | 80823.5 ns | 1.00 |
array/accumulate/Float32/dims=2 | 147782.5 ns | 147778 ns | 1.00 |
array/accumulate/Float32/dims=1L | 1618526 ns | 1619194 ns | 1.00 |
array/accumulate/Float32/dims=2L | 698403 ns | 698530 ns | 1.00 |
array/construct | 1272.3 ns | 1279.85 ns | 0.99 |
array/random/randn/Float32 | 43990 ns | 47253.5 ns | 0.93 |
array/random/randn!/Float32 | 25036 ns | 24573 ns | 1.02 |
array/random/rand!/Int64 | 27605.5 ns | 27294 ns | 1.01 |
array/random/rand!/Float32 | 8638.333333333334 ns | 8724.333333333334 ns | 0.99 |
array/random/rand/Int64 | 38206 ns | 29633 ns | 1.29 |
array/random/rand/Float32 | 13036 ns | 12902 ns | 1.01 |
array/permutedims/4d | 60332.5 ns | 61250.5 ns | 0.99 |
array/permutedims/2d | 54169.5 ns | 54865 ns | 0.99 |
array/permutedims/3d | 55013 ns | 55511 ns | 0.99 |
array/sorting/1d | 2758455 ns | 2757710 ns | 1.00 |
array/sorting/by | 3345066 ns | 3344132.5 ns | 1.00 |
array/sorting/2d | 1080402 ns | 1080389 ns | 1.00 |
cuda/synchronization/stream/auto | 1027.5 ns | 1015.8333333333334 ns | 1.01 |
cuda/synchronization/stream/nonblocking | 7395.4 ns | 7618.9 ns | 0.97 |
cuda/synchronization/stream/blocking | 811.233695652174 ns | 799.1530612244898 ns | 1.02 |
cuda/synchronization/context/auto | 1162.6 ns | 1164.1 ns | 1.00 |
cuda/synchronization/context/nonblocking | 7478.4 ns | 7651.4 ns | 0.98 |
cuda/synchronization/context/blocking | 894.7142857142857 ns | 895.8490566037735 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
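The Ratio column in the benchmark table is consistent with Current divided by Previous, rounded to two decimals, so a value above 1.00 means the current commit measured slower. The following is a minimal sketch that recomputes a few rows; the function names and the 1.10 threshold are hypothetical illustrations, not taken from github-action-benchmark's configuration.

```python
# Timings in ns copied from the benchmark table: name -> (current, previous).
ROWS = {
    "kernel/rand": (17571, 15792),
    "array/random/rand/Int64": (38206, 29633),
    "array/copyto!/gpu_to_gpu": (11249, 12817),
    "kernel/occupancy": (679.9367088607595, 759.2189781021898),
}


def ratio(current, previous):
    """Current / previous, rounded to two decimals (assumed definition)."""
    return round(current / previous, 2)


def flag_slowdowns(rows, threshold=1.10):
    """Return the names whose ratio exceeds a hypothetical alert threshold."""
    return [name for name, (cur, prev) in rows.items()
            if ratio(cur, prev) > threshold]


if __name__ == "__main__":
    for name, (cur, prev) in ROWS.items():
        print(f"{name}: {ratio(cur, prev):.2f}")
    print("slower than threshold:", flag_slowdowns(ROWS))
```

Under that assumed threshold only kernel/rand and array/random/rand/Int64 would be flagged; single-run differences of this size on short benchmarks can be noise, so they are not by themselves evidence of a regression.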
Enzyme CI fails.
@vchuravy looks like your fix missed the tape_type function?
79bb632 to 4a5ad9f
Restore Enzyme to the CI checks now that @vchuravy fixed the GPUCompiler compat.