Optimized shuffle for typesize=12 #649

Open
froody opened this issue Feb 4, 2025 · 1 comment

Comments

froody commented Feb 4, 2025

Describe the bug
This is a feature request. I'm happy to contribute some code, but I don't know whether my solutions will be optimal. I compress a lot of data where typesize=12, and when using shuffle this falls back to unshuffle_generic, which is slow. It would be nice if there were 12-byte variants of all the platform-specific shuffle code. They might not be as fast as a power-of-2 typesize, but they would still be much faster than the generic path.
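
For context, here is a minimal sketch of what a generic byte shuffle and unshuffle do for an arbitrary typesize. The function names are hypothetical and this is not the library's actual unshuffle_generic. It only illustrates the scalar byte-by-byte loop that a 12-byte SIMD variant would replace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not the library's code.
   Shuffle: gather byte j of every element into contiguous stream j.
   src holds nelems elements of typesize bytes each. */
void shuffle_bytes(size_t typesize, size_t nelems,
                   const uint8_t *src, uint8_t *dest) {
    for (size_t j = 0; j < typesize; j++)
        for (size_t i = 0; i < nelems; i++)
            dest[j * nelems + i] = src[i * typesize + j];
}

/* Unshuffle: the inverse, scattering stream j back into byte j
   of every element. This strided access is what makes the scalar
   path slow compared with a vectorized one. */
void unshuffle_bytes(size_t typesize, size_t nelems,
                     const uint8_t *src, uint8_t *dest) {
    for (size_t i = 0; i < nelems; i++)
        for (size_t j = 0; j < typesize; j++)
            dest[i * typesize + j] = src[j * nelems + i];
}

/* Self-check: shuffle then unshuffle 4 elements of 12 bytes and
   verify that the original buffer is recovered. */
void check_roundtrip_typesize12(void) {
    uint8_t in[48], shuffled[48], out[48];
    for (size_t k = 0; k < sizeof in; k++) in[k] = (uint8_t)k;
    shuffle_bytes(12, 4, in, shuffled);
    unshuffle_bytes(12, 4, shuffled, out);
    assert(memcmp(in, out, sizeof in) == 0);
}
```

Nothing in these loops depends on typesize being a power of 2, which is why a generic path can handle 12-byte elements at all. The cost is that every byte is moved individually.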

To Reproduce
Decompress any data that was compressed using shuffle with typesize=12, and observe that unshuffle_generic dominates the overall time.

Expected behavior
Unshuffle for typesize=12 is approximately as fast as for typesize=8 or typesize=16.

Additional context
I think it would be nice to support all possible typesizes up to a point, since for most of them there could be quite a significant speedup compared to the generic implementation.

Here's my attempt at avx512-unshuffle: #648

@FrancescAlted
Member

I concur that this is a nice goal to pursue. Thanks for your AVX2-unshuffle for 12 bytes. Contributions for other type sizes are welcome indeed.
