
Conversation

@malik672

The previous impl, while clean, was significantly slower because it relied on multiple branches. Since inputs are likely to hit all of those branches, the CPU's branch predictor can't keep the pipeline full.

A lookup table, by contrast, replaces the branches with a single indexed read from memory, and a small table fits in one or a few cache lines, so it stays hot in cache.

Result: improved decode performance.
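A minimal sketch of the branch-free pattern described above, assuming Rust and using hex-digit decoding as the concrete case; the PR's actual decode routine, table contents, and function names may differ:

```rust
/// 256-entry table (4 cache lines), indexed directly by the input byte.
/// The hot path is one load plus one validity check instead of a chain
/// of data-dependent range checks. `0xFF` marks invalid input bytes.
const HEX_DECODE: [u8; 256] = {
    let mut table = [0xFFu8; 256];
    let mut i = 0;
    while i < 10 {
        table[b'0' as usize + i] = i as u8;
        i += 1;
    }
    let mut i = 0;
    while i < 6 {
        table[b'a' as usize + i] = 10 + i as u8;
        table[b'A' as usize + i] = 10 + i as u8;
        i += 1;
    }
    table
};

/// Decode a single hex digit; `None` for invalid bytes. The only branch
/// left is the validity check, which is highly predictable on valid input.
fn decode_hex_digit(b: u8) -> Option<u8> {
    match HEX_DECODE[b as usize] {
        0xFF => None,
        v => Some(v),
    }
}
```

The table is built in a `const` initializer, so it lives in read-only static data rather than being recomputed at runtime.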

@nipunn1313
Collaborator

Can you run the benchmarks here to show the difference? For fine-grained performance improvements like this, we need benchmark results to prove it actually makes a difference.
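For reference, a minimal Criterion benchmark sketch of the kind being asked for; the `decode` stand-in, benchmark name, and input size are all assumptions, not this repository's actual bench setup:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Hypothetical stand-in for the PR's decode path; swap in the crate's
// real decode function when wiring this into the existing benches.
fn decode(input: &[u8]) -> Vec<u8> {
    input.iter().map(|b| b.wrapping_sub(b'0')).collect()
}

fn bench_decode(c: &mut Criterion) {
    // 1 KiB of hex digits as a representative input size (an assumption).
    let input: Vec<u8> = b"a3".repeat(512);
    c.bench_function("decode_1kib", |b| {
        b.iter(|| decode(black_box(&input)))
    });
}

criterion_group!(benches, bench_decode);
criterion_main!(benches);
```

Running `cargo bench` before and after the change would give comparable numbers for both implementations.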
