Improvements to Julia version of lap3dkernel #7

Merged: 4 commits merged on Sep 7, 2020

maltezfaria (Contributor)

  • Implement a vectorized Julia version of the lap3dkernel function. It relies on the SIMD.jl package to perform "explicit" vectorization. On my machine, this brings the performance close to that of the vectorized C++ code.
  • Make the Julia version work in single precision, and modify the script to print the single-precision benchmarks as well.
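Explicit vectorization imposes a particular loop structure, sketched here in C++ (the language of the comparison code). This is a minimal, hypothetical sketch, not the PR's Julia code. It assumes the kernel computes the potential u_i = sum_j q_j / |t_i - s_j| at each target (any 1/(4*pi) prefactor omitted) and stores coordinates as separate x, y, z arrays. A fixed-size `std::array` of W lanes stands in portably for a SIMD register such as a `Vec` in SIMD.jl. The names `lap3d_lanes` and `W` are illustrative. The element type `T` is a template parameter, so the same code runs in `float` or `double`, in the spirit of the second bullet.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the benchmark's actual code): potential at each
// target t_i from charges q_j at sources s_j, assuming
//   u_i = sum_j q_j / |t_i - s_j|   (any 1/(4*pi) prefactor omitted).
// Assumes no source coincides with a target (r = 0 would give inf).
// T is float or double; W is the number of lanes, playing the role of a
// SIMD register width (e.g. 4 doubles or 8 floats in a 256-bit register).
template <typename T, std::size_t W>
void lap3d_lanes(const std::vector<T>& sx, const std::vector<T>& sy,
                 const std::vector<T>& sz, const std::vector<T>& q,
                 const std::vector<T>& tx, const std::vector<T>& ty,
                 const std::vector<T>& tz, std::vector<T>& u) {
  const std::size_t ns = sx.size(), nt = tx.size();
  assert(sy.size() == ns && sz.size() == ns && q.size() == ns);
  assert(ty.size() == nt && tz.size() == nt);
  u.assign(nt, T(0));
  for (std::size_t i = 0; i < nt; ++i) {
    std::array<T, W> acc{};  // one partial sum per lane, zero-initialized
    std::size_t j = 0;
    // Main loop: W sources per step, every lane doing the same arithmetic.
    for (; j + W <= ns; j += W) {
      for (std::size_t l = 0; l < W; ++l) {
        const T dx = tx[i] - sx[j + l];
        const T dy = ty[i] - sy[j + l];
        const T dz = tz[i] - sz[j + l];
        acc[l] += q[j + l] / std::sqrt(dx * dx + dy * dy + dz * dz);
      }
    }
    // Horizontal reduction: add the W lane partial sums together.
    T s = T(0);
    for (std::size_t l = 0; l < W; ++l) s += acc[l];
    // Scalar tail for the ns % W leftover sources.
    for (; j < ns; ++j) {
      const T dx = tx[i] - sx[j];
      const T dy = ty[i] - sy[j];
      const T dz = tz[i] - sz[j];
      s += q[j] / std::sqrt(dx * dx + dy * dy + dz * dz);
    }
    u[i] = s;
  }
}
```

For example, five sources at (k, 0, 0) with charge k, k = 1..5, and a target at the origin give u = 5 exactly, since every term is k/k = 1. With W = 4, the first four sources go through the lane loop and the fifth through the scalar tail. The lane-wise partial sums add terms in a different order than a plain scalar loop, so vectorized and scalar versions need not agree bit for bit in general. Whether a compiler actually emits vector instructions for the inner lane loop is left to its auto-vectorizer. Explicit SIMD libraries such as SIMD.jl or VCL are meant to close that gap.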

ahbarnett merged commit ef568fb into ahbarnett:master on Sep 7, 2020

ahbarnett (Owner)

Dear Luiz,
Thanks for this. It looks like you are a SIMD expert in Julia, which is very useful. Indeed, in double precision this matches the VCL C++ library with the standard sqrt. In single precision the timings are about the same as in double, except for your manual SIMD version, which is twice as fast as the custom rsqrt tweak to VCL... this is a surprise! I'm confused about why the answer matches to 11 digits for a single-precision calculation, which presumably doesn't sum in exactly the same order. Any thoughts?

Here are the results on the 2017 i7 laptop:

```
Result with type Float32:
targ-vec: 100000000 src-targ pairs, ans: 92799.578125
    time 1.54 s 0.065 Gpair/s
devec: 100000000 src-targ pairs, ans: 92799.578125
    time 0.405 s 0.247 Gpair/s
devec par: 100000000 src-targ pairs, ans: 92799.578125
    time 0.0975 s 1.03 Gpair/s
devec par new: 100000000 src-targ pairs, ans: 92799.578125
    time 0.0112 s 8.9 Gpair/s
Result with type Float64:
targ-vec: 100000000 src-targ pairs, ans: 63886.595569
    time 1.78 s 0.0563 Gpair/s
devec: 100000000 src-targ pairs, ans: 63886.595569
    time 0.3 s 0.333 Gpair/s
devec par: 100000000 src-targ pairs, ans: 63886.595569
    time 0.0783 s 1.28 Gpair/s
devec par new: 100000000 src-targ pairs, ans: 63886.595569
    time 0.0372 s 2.69 Gpair/s
```
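One observation may help frame the 11-digit question, though it does not explain why different summation orders agree. If the script prints with a `%f`-style format (an assumption: 6 decimals), a single-precision result is shown via its exact decimal expansion, so the 11 printed digits of 92799.578125 are not 11 significant digits. Near that value, adjacent `Float32` numbers are 2^-7 = 0.0078125 apart. That spacing is far coarser than the 10^-6 printing resolution, so identical printouts mean every method returned the very same single-precision number. The C++ sketch below checks this. `ulp_above` and `fmt6` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <limits>
#include <string>

// Gap between x and the next larger representable value of type T
// (one unit in the last place, "ulp", for positive finite x).
template <typename T>
T ulp_above(T x) {
  assert(std::isfinite(x));
  return std::nextafter(x, std::numeric_limits<T>::infinity()) - x;
}

// Format a value with "%f" (6 decimals); assumed to match the benchmark's
// printing of "ans".
std::string fmt6(double v) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "%f", v);
  return std::string(buf);
}
```

The Float64 answer behaves differently. Near 63886.6 the spacing is 2^-37 (about 7.3e-12), much finer than the printing resolution, so there matching printouts only establish agreement to the printed digits.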

I am adding your results to the main table.

I'm also trying an AVX-512 desktop, but I don't have Julia on it yet.

Anyway, thanks for showing that manual SIMD in Julia reaches a similar level to manual SIMD in C++ with VCL!
Best, Alex

maltezfaria (Contributor, Author) commented via email on Sep 7, 2020
