Add fast decode_lossy path for incomplete codepages #16

bonega · 2026-01-10T12:08:59Z

Summary

- Add a dual-table approach for incomplete codepages: `DECODE_TABLE` (with `Option<Entry>`) for checked decoding, and `DECODE_TABLE_LOSSY` (pre-filled with replacement chars) for lossy decoding
- `decode_lossy` now uses the pre-filled table with the fast branchless 4-byte write path from complete decoders
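
A minimal sketch of the dual-table idea, under stated assumptions: the toy codepage (`toy_map`), the `Entry` layout, and the function signatures below are hypothetical and only illustrate the technique, not the crate's actual code. The checked table keeps `None` for undefined bytes, while the lossy table is pre-filled with U+FFFD so the decode loop never has to test whether a byte is defined.

```rust
/// Hypothetical pre-encoded UTF-8 entry: bytes padded to 4, plus the real length.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Entry {
    bytes: [u8; 4],
    len: u8,
}

impl Entry {
    fn new(c: char) -> Self {
        let mut bytes = [0u8; 4];
        let len = c.encode_utf8(&mut bytes).len() as u8;
        Entry { bytes, len }
    }
}

/// Toy incomplete codepage (an assumption for this sketch):
/// ASCII maps to itself, 0x80..=0xBF map to U+00C0..=U+00FF, the rest is undefined.
fn toy_map(b: u8) -> Option<char> {
    match b {
        0x00..=0x7F => Some(b as char),
        0x80..=0xBF => char::from_u32(0xC0 + (b - 0x80) as u32),
        _ => None,
    }
}

/// Checked table: undefined bytes stay `None`.
pub fn checked_table() -> [Option<Entry>; 256] {
    std::array::from_fn(|i| toy_map(i as u8).map(Entry::new))
}

/// Lossy table: undefined bytes are pre-filled with U+FFFD.
pub fn lossy_table() -> [Entry; 256] {
    std::array::from_fn(|i| Entry::new(toy_map(i as u8).unwrap_or('\u{FFFD}')))
}

/// Lossy decode: always store 4 bytes, then advance by the entry's real length.
/// There is no branch on validity or length; safe slice indexing still
/// bounds-checks, which a real implementation might avoid differently.
pub fn decode_lossy(input: &[u8], table: &[Entry; 256]) -> String {
    // 4 bytes per input byte guarantees room for every full 4-byte store.
    let mut out = vec![0u8; input.len() * 4];
    let mut pos = 0;
    for &b in input {
        let e = table[b as usize];
        out[pos..pos + 4].copy_from_slice(&e.bytes);
        pos += e.len as usize;
    }
    out.truncate(pos);
    String::from_utf8(out).expect("table entries hold valid UTF-8")
}

/// Checked decode: returns the index of the first undefined byte on failure.
pub fn decode_checked(input: &[u8], table: &[Option<Entry>; 256]) -> Result<String, usize> {
    let mut out = Vec::with_capacity(input.len());
    for (i, &b) in input.iter().enumerate() {
        let e = table[b as usize].ok_or(i)?;
        out.extend_from_slice(&e.bytes[..e.len as usize]);
    }
    Ok(String::from_utf8(out).expect("table entries hold valid UTF-8"))
}
```

The padding bytes written past an entry's real length are overwritten by the next store or cut off by the final `truncate`, which is what lets the loop skip any per-byte length branch.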

Benchmark Results

| Benchmark | Speedup |
| --- | --- |
| `decode_lossy/all_bad/4096` | 1.99x |
| `decode_lossy/all_bad/2048` | 1.98x |
| `decode_lossy/all_bad/1024` | 1.95x |
| `decode_lossy/mostly_ascii/4096` | 1.60x |
| `decode_lossy/mostly_ascii/2048` | 1.57x |
| `decode_lossy/mostly_ascii/1024` | 1.54x |

No regression to `decode_checked` performance.

Test plan

- All existing tests pass
- Benchmarks show expected improvements
- Fuzz testing (154k validate runs, 618k invariant runs)

Adds benchmark for decode_lossy with mostly ASCII input (90% ASCII, 10% extended) to measure performance on typical mixed-content data.
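
For illustration, an input with that 90/10 mix could be generated along these lines. This is a sketch only: the function name `mostly_ascii` and the deterministic every-tenth-byte layout are assumptions, and the actual benchmark may use a different (for example randomized) distribution.

```rust
/// Hypothetical generator: every tenth byte is in the extended range
/// (>= 0x80), the rest are lowercase ASCII letters.
pub fn mostly_ascii(len: usize) -> Vec<u8> {
    (0..len)
        .map(|i| {
            if i % 10 == 9 {
                // Extended byte: high bit set, low 7 bits vary with position.
                0x80 | (i % 0x80) as u8
            } else {
                // ASCII byte cycling through 'a'..='z'.
                b'a' + (i % 26) as u8
            }
        })
        .collect()
}
```

A fixed pattern keeps runs reproducible, at the cost of being friendlier to the branch predictor than real mixed-content data would be.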

Use dual-table approach for incomplete codepages:

- `DECODE_TABLE`: `Option<UTF8Entry>` with `UTF8Len` enum (niche-optimized)
- `DECODE_TABLE_LOSSY`: `CompleteEntry` with branchless 4-byte writes

`decode_lossy` uses the pre-filled lossy table with the fast complete decoder, achieving 1.5-2x speedup with no regression to `decode_checked`.
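
The niche optimization mentioned above can be sketched as follows. The field layout is an assumption (three UTF-8 bytes plus a length enum), and the type names here are hypothetical. Because the enum only has three valid bit patterns, current rustc can store the `None` case of `Option<NicheEntry>` in one of the unused patterns, so the option typically costs no extra space; this is observed compiler behavior rather than a documented language guarantee for arbitrary types.

```rust
use std::mem::size_of;

/// Hypothetical length enum: only 1, 2 and 3 are valid bit patterns,
/// which leaves spare values the compiler can use to encode `None`.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(u8)]
pub enum Len {
    One = 1,
    Two = 2,
    Three = 3,
}

/// Entry whose length field has a niche.
#[allow(dead_code)]
#[derive(Clone, Copy)]
pub struct NicheEntry {
    bytes: [u8; 3],
    len: Len,
}

/// Same layout, but with a plain `u8` length (no niche available).
#[allow(dead_code)]
#[derive(Clone, Copy)]
pub struct PlainEntry {
    bytes: [u8; 3],
    len: u8,
}

/// Returns (size of NicheEntry, size of Option<NicheEntry>, size of Option<PlainEntry>).
pub fn sizes() -> (usize, usize, usize) {
    (
        size_of::<NicheEntry>(),
        size_of::<Option<NicheEntry>>(),
        size_of::<Option<PlainEntry>>(),
    )
}
```

With the enum length, a 256-entry `Option` table stays the same size as a table of bare entries, whereas the plain `u8` variant needs an extra discriminant byte per entry.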

- Move `CompleteEntry` to `decoder/complete.rs` as `Entry`
- Move `UTF8Entry`/`UTF8Len` to `decoder/incomplete.rs` as `Entry`/`Len`
- Re-export with descriptive names: `CompleteEntry`, `IncompleteEntry`, `IncompleteLen`
- Update codegen to use new naming convention
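
The renaming scheme described in this list might look roughly like the following. The module paths and exported names come from the commit message; the inline `mod` blocks, field definitions, and the helper `example_entries` are placeholders assumed for this sketch.

```rust
mod decoder {
    pub mod complete {
        /// Short local name inside the module (fields are placeholders).
        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct Entry {
            pub bytes: [u8; 4],
            pub len: u8,
        }
    }

    pub mod incomplete {
        /// Placeholder length enum for the incomplete decoder.
        #[derive(Clone, Copy, Debug, PartialEq)]
        pub enum Len {
            One = 1,
            Two = 2,
            Three = 3,
        }

        /// Same short name, disambiguated by its module path.
        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct Entry {
            pub bytes: [u8; 3],
            pub len: Len,
        }
    }

    // Re-export under descriptive names so callers never see two `Entry` types.
    pub use complete::Entry as CompleteEntry;
    pub use incomplete::{Entry as IncompleteEntry, Len as IncompleteLen};
}

/// Hypothetical helper showing both re-exported names in use.
pub fn example_entries() -> (decoder::CompleteEntry, decoder::IncompleteEntry) {
    (
        decoder::CompleteEntry { bytes: [b'a', 0, 0, 0], len: 1 },
        decoder::IncompleteEntry { bytes: [b'a', 0, 0], len: decoder::IncompleteLen::One },
    )
}
```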

bonega added 3 commits January 9, 2026 08:50

Add decode_lossy/mostly_ascii benchmark

c2b70ae

bonega enabled auto-merge (squash) January 10, 2026 12:11

bonega merged commit 532a643 into master Jan 10, 2026
4 checks passed
