@bonega bonega commented Jan 10, 2026

Summary

  • Make Entry::write branchless by always copying 3 bytes, then advancing by actual length
  • Add ASCII fast path in decode_slice_inner using const generic to skip table lookups for ASCII bytes
  • Unify decode_slice variants with const ASCII_OPT: bool generic parameter
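The branchless write in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the `Entry` layout (a 3-byte buffer plus a length) and the `write_all` helper are assumptions made for the example. The idea is that every write stores a fixed 3 bytes, and only the cursor advance depends on the real length, so there is no per-length branch in the copy.

```rust
/// Hypothetical stand-in for the crate's `Entry`: up to 3 UTF-8 bytes
/// (zero-padded) plus the number of bytes that are actually meaningful.
#[derive(Clone, Copy)]
struct Entry {
    bytes: [u8; 3],
    len: u8, // assumed to be in 1..=3
}

impl Entry {
    /// Always stores 3 bytes at `pos`, then returns `pos` advanced by the
    /// real length. The caller must guarantee `pos + 3 <= out.len()`.
    #[inline]
    fn write(self, out: &mut [u8], pos: usize) -> usize {
        out[pos..pos + 3].copy_from_slice(&self.bytes);
        pos + self.len as usize
    }
}

/// Hypothetical helper: writes all entries into a buffer sized for the
/// worst case (3 bytes per entry), then trims to the bytes really used.
fn write_all(entries: &[Entry]) -> Vec<u8> {
    let mut out = vec![0u8; entries.len() * 3];
    let mut pos = 0;
    for e in entries {
        // Padding bytes written here are overwritten by the next entry
        // or cut off by `truncate` below.
        pos = e.write(&mut out, pos);
    }
    out.truncate(pos);
    out
}
```

The trade-off is that the output buffer needs slack for the fixed-width store; in this sketch the worst-case allocation provides it.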

Benchmark Results (vs master)

decode_checked/extended (primary target):

| Size | Improvement |
|------|-------------|
| 64   | 19% faster  |
| 256  | 24% faster  |
| 512  | 25% faster  |
| 1024 | 26% faster  |
| 2048 | 28% faster  |
| 4096 | 29% faster  |

decode_checked/mostly_ascii:

| Size | Improvement |
|------|-------------|
| 64   | 8% faster   |
| 256  | 17% faster  |
| 512  | 12% faster  |
| 1024 | 9% faster   |
| 2048 | 6% faster   |

Test plan

  • All existing tests pass
  • Fuzz tests pass
  • Benchmarks show improvement

- Make Entry::write() branchless by always copying 3 bytes
- Add ASCII fast path in decode_slice to skip table lookup for ASCII bytes
- Use const generic to unify decode_slice variants

This fixes a regression in decode_checked/extended (was 25-30% slower than
v0.2.0) while improving decode_checked/mostly_ascii by 10-24% over v0.2.0.

Benchmarks vs v0.2.0:
- decode_checked/extended: now equal (was 25-30% slower)
- decode_checked/mostly_ascii: 10-24% faster
- decode_checked/ascii: equal
- decode_lossy/*: 1.7-2.3x faster (unchanged)
- encode_*: 2x+ faster (unchanged)
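The ASCII fast path and the const-generic unification described above might look something like the sketch below. It is an illustration under stated assumptions, not the merged code: the function signature, the `[char; 256]` table, the Latin-1 table builder, and the `decode` wrapper are all hypothetical, and the sketch assumes an encoding whose lower half (bytes below 0x80) is plain ASCII.

```rust
/// Hypothetical decoder for a single-byte encoding. `table` maps every
/// byte to a `char`; with `ASCII_OPT` set, bytes below 0x80 skip the table.
fn decode_slice_inner<const ASCII_OPT: bool>(
    input: &[u8],
    table: &[char; 256],
    out: &mut String,
) {
    for &b in input {
        // `ASCII_OPT` is a compile-time constant, so the `false`
        // instantiation contains no ASCII check at all.
        if ASCII_OPT && b < 0x80 {
            out.push(b as char); // ASCII maps to itself
        } else {
            out.push(table[b as usize]);
        }
    }
}

/// Example table (an assumption for this sketch): Latin-1, where every
/// byte maps to the code point with the same value.
fn latin1_table() -> [char; 256] {
    let mut t = ['\0'; 256];
    for (i, slot) in t.iter_mut().enumerate() {
        *slot = char::from(i as u8);
    }
    t
}

/// Hypothetical wrapper: one generic body serves both variants.
fn decode(input: &[u8], ascii_opt: bool) -> String {
    let table = latin1_table();
    let mut out = String::with_capacity(input.len());
    if ascii_opt {
        decode_slice_inner::<true>(input, &table, &mut out);
    } else {
        decode_slice_inner::<false>(input, &table, &mut out);
    }
    out
}
```

Both instantiations produce the same output; the flag only selects whether ASCII bytes go through the table.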
@bonega bonega enabled auto-merge January 10, 2026 16:44
@bonega bonega merged commit fea3ca4 into master Jan 10, 2026
4 checks passed