Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bitonic sort implementation #1217

Merged
merged 1 commit into from
Nov 26, 2021

Conversation

xaellison
Copy link
Contributor

Implementation of sortperm! and sortperm using a bitonic sorter. The underlying bitonic sorter is general enough for any length array, and can do simple sorting as well as sortperm.

I believe it's faster for many scenarios than the existing quicksort, so maybe sort!{::AnyCuVector} should also use it. The complexity is worse though: for large enough arrays quicksort will outperform this, but that seems to be around lengths of ~16 million. Lengths like 2^n+1 perform similarly to 2^(n+1), hence the jagged performance curve. This data is from rtx 2070.

image

@codecov
Copy link

codecov bot commented Oct 23, 2021

Codecov Report

Merging #1217 (991f3ae) into master (89d2e82) will decrease coverage by 0.75%.
The diff coverage is 27.31%.

❗ Current head 991f3ae differs from pull request most recent head de44670. Consider uploading reports for the commit de44670 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1217      +/-   ##
==========================================
- Coverage   80.09%   79.34%   -0.76%     
==========================================
  Files         119      119              
  Lines        8407     8594     +187     
==========================================
+ Hits         6734     6819      +85     
- Misses       1673     1775     +102     
Impacted Files Coverage Δ
src/sorting.jl 23.91% <27.31%> (+1.64%) ⬆️
lib/cusparse/interfaces.jl 81.81% <0.00%> (-3.12%) ⬇️
src/accumulate.jl 97.05% <0.00%> (-2.95%) ⬇️
lib/cublas/wrappers.jl 92.56% <0.00%> (+0.22%) ⬆️
lib/cutensor/wrappers.jl 95.47% <0.00%> (+0.84%) ⬆️
src/linalg.jl 86.36% <0.00%> (+6.36%) ⬆️
lib/cudadrv/CUDAdrv.jl 80.00% <0.00%> (+32.00%) ⬆️
lib/cudnn/CUDNN.jl 75.80% <0.00%> (+37.71%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89d2e82...de44670. Read the comment docs.

Copy link
Member

@maleadt maleadt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Good to have an implementation that doesn't rely on dynamic parallelism, which doesn't always perform well.

It'll take a while before I can review properly, but I added some quick thoughts already.

I believe it's faster for many scenarios than the existing quicksort, so maybe sort!{::AnyCuVector} should also use it. The complexity is worse though: for large enough arrays quicksort will outperform this, but that seems to be around lengths of ~16 million.

Maybe select the best algorithm at run time then? With possible customizability using an alg::Algorithm keyword like Base.

@xaellison
Copy link
Contributor Author

Thanks for preliminary feedback @maleadt
Two quick thoughts for you:

  1. Based on Allow sorting of tuples of numbers #1196 it looks like there is interest in sortperm(a; dims) - do you have an opinion about implementing more features than what exists in Base.sortperm (which doesn't do dims)?
  2. I like the idea of selecting an algorithm based on input size but it may be tricky to consistently make a good choice across all hardware (happy to try)

@maleadt
Copy link
Member

maleadt commented Oct 30, 2021

  1. More features is OK. I wonder if we should use a different method then, like how GPUArrays.mapreducedim! supports an init keyword while Base.mapreducedim! doesn't.
  2. As long as there's a way to select the algorithm the default selection doesn't need to be particularly robust. Or maybe it's easier to just default to bitonic and document the quicksort option.

@xaellison xaellison force-pushed the ae/bitonic_sortperm branch from 1c9f627 to 991f3ae Compare November 7, 2021 20:20
@maleadt maleadt force-pushed the ae/bitonic_sortperm branch from 991f3ae to de44670 Compare November 25, 2021 16:35
@maleadt maleadt changed the title Ae/bitonic sortperm Add bitonic sort implementation Nov 25, 2021
@maleadt maleadt added cuda array Stuff about CuArray. enhancement New feature or request labels Nov 25, 2021
@maleadt
Copy link
Member

maleadt commented Nov 25, 2021

Totally forgot about this, sorry! I've squashed and rebased the commits, and did some minor NFC changes (e.g. rename CuBitonicSort algorithm selector to the unexported CUDA.BitonicSort).

@maleadt maleadt merged commit b248f85 into JuliaGPU:master Nov 26, 2021
@xaellison
Copy link
Contributor Author

Cool thanks! I just started a new job so this was off my radar too :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda array Stuff about CuArray. enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants