Use Vector API to decode BKD docIds #14203
On an AVX-512 Linux x86 machine:
Thanks for looking into it. Were you able to confirm that the difference with the variable count is indeed due to auto-vectorization not kicking in, as opposed to something else such as different loop unrolling? I'm curious whether you can compare the generated assembly and/or trick the JVM into generating more efficient code by writing the loop a bit differently, e.g. by having a fixed-size inner loop?
Thanks for the feedback! I implemented the fixed-size inner loop and printed out the assembly for all variants. perf_asm.log
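For readers following along, the "fixed-size inner loop" idea can be sketched roughly as below. This is not the code from the PR: the class name, the `CHUNK` constant, and the 16-bit packing layout are all hypothetical, chosen only to illustrate the shape of the loop. The outer loop walks the variable-length block in constant-size chunks, so the inner loop has a compile-time trip count that the JIT may find easier to vectorize, and a scalar tail handles the remainder. Whether a given JVM actually vectorizes either version has to be checked in the generated assembly.

```java
final class FixedInnerLoopDecode {
  // Hypothetical chunk size; any small constant works for the illustration.
  static final int CHUNK = 64;

  // Baseline: a single loop whose trip count depends on the variable 'count'.
  // Assumes count is even and each packed int holds two 16-bit values.
  static void decodeSimple(int[] packed, int count, int[] docIds) {
    int half = count >>> 1;
    for (int i = 0; i < half; i++) {
      int l = packed[i];
      docIds[i] = l >>> 16;           // upper 16 bits go to the first half
      docIds[half + i] = l & 0xFFFF;  // lower 16 bits go to the second half
    }
  }

  // Same decoding, restructured so the inner loop always runs CHUNK times.
  static void decodeChunked(int[] packed, int count, int[] docIds) {
    int half = count >>> 1;
    int i = 0;
    // Full chunks: the inner trip count is a compile-time constant.
    for (int bound = half - CHUNK; i <= bound; i += CHUNK) {
      for (int j = 0; j < CHUNK; j++) {
        int l = packed[i + j];
        docIds[i + j] = l >>> 16;
        docIds[half + i + j] = l & 0xFFFF;
      }
    }
    // Scalar tail for the remaining (fewer than CHUNK) entries.
    for (; i < half; i++) {
      int l = packed[i];
      docIds[i] = l >>> 16;
      docIds[half + i] = l & 0xFFFF;
    }
  }

  public static void main(String[] args) {
    int count = 300; // a variable block size, not a multiple of 2 * CHUNK
    int[] packed = new int[count / 2];
    for (int i = 0; i < packed.length; i++) {
      packed[i] = i * 0x9E3779B9; // arbitrary bit patterns
    }
    int[] a = new int[count];
    int[] b = new int[count];
    decodeSimple(packed, count, a);
    decodeChunked(packed, count, b);
    System.out.println(java.util.Arrays.equals(a, b));
  }
}
```

Both methods produce identical output; only the loop structure differs, which makes them convenient to compare side by side in a benchmark or in printed assembly.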
MAC M2
Linux X86 AVX512 profiling disabled
Linux X86 AVX512 profiling enabled
This is an interesting observation. I wonder if a small refactoring could help it get auto-vectorized? E.g. what if we applied the
Sorry for pushing, but if we could get auto-vectorization to do the right thing, then this would automatically benefit all users, not only those who enable the vector module.
Context: #14176
I found that when running with a constant block size (512), the JIT can auto-vectorize the decoding loop. But this does not happen when the block size becomes variable, which can be the case in real BKD leaves. This PR proposes using the Vector API to decode docIds in BKD.
MAC M2
Linux X86 (AVX512 supported)