❯ coreutils printf 'ᚱ \xE1' | ./target/release/tr -d 'ᚱ \341' | bat --plain --show-all
\x9A\xB1
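The expected output is an empty string. The first byte of ᚱ (U+16B1, encoded in UTF-8 as 0xE1 0x9A 0xB1) is 0xE1 (225 in decimal, 341 in octal). tr is being asked to delete the character "ᚱ" and, separately, the byte 225 ("\341"). Instead, the two continuation bytes of ᚱ (0x9A 0xB1) are left in the output. There may be more bugs of this kind, where the leading byte of a UTF-8 character operand is also present separately as an octal operand.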
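To make the byte-level picture concrete, here is a minimal Rust sketch. It is not the actual tr implementation: `delete_bytes` is a hypothetical helper that deletes purely byte by byte. It only shows that the reported output is what one would see if the set were treated as the single bytes 0xE1 and 0x20, with ᚱ never matched as a whole character.

```rust
/// Hypothetical helper (not taken from the tr source): drop every byte of
/// `input` that appears in `delete_set`, ignoring UTF-8 character boundaries.
fn delete_bytes(input: &[u8], delete_set: &[u8]) -> Vec<u8> {
    input
        .iter()
        .copied()
        .filter(|b| !delete_set.contains(b))
        .collect()
}

fn main() {
    // ᚱ is U+16B1, which UTF-8 encodes as the three bytes E1 9A B1.
    let rune = "ᚱ".as_bytes();
    assert_eq!(rune, &[0xE1u8, 0x9A, 0xB1][..]);

    // Octal 341 is the same value as hex E1 (decimal 225).
    assert_eq!(0o341, 0xE1);

    // The input from the report: ᚱ, a space, then a lone 0xE1 byte.
    let input: &[u8] = b"\xE1\x9A\xB1 \xE1";

    // Deleting only the bytes 0xE1 and ' ' leaves the continuation bytes of ᚱ.
    let leftover = delete_bytes(input, &[0xE1, b' ']);
    assert_eq!(leftover, vec![0x9Au8, 0xB1]);
    println!("{:02X?}", leftover);
}
```

The leftover bytes match the `\x9A\xB1` shown by bat above, which is consistent with (but does not prove) the byte operand taking effect while the character operand does not.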
These bugs are unlikely to be practically significant, and can pretty easily be worked around. For instance:
❯ coreutils printf 'ᚱ \xE1' | ./target/release/tr -d 'ᚱ' | ./target/release/tr -d ' \341' | bat --plain --show-all
# No output
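The two-pass workaround can be modeled in Rust as follows. This is a sketch under assumptions, not the tr code: `delete_sequence` and `delete_bytes` are hypothetical helpers standing in for the two separate tr invocations. The first pass removes ᚱ as a whole byte sequence, and the second removes the space and the lone 0xE1 byte.

```rust
/// Hypothetical helper: remove every occurrence of the byte sequence
/// `needle` from `input` (models `tr -d 'ᚱ'` for a single character).
fn delete_sequence(input: &[u8], needle: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if !needle.is_empty() && input[i..].starts_with(needle) {
            // Skip the whole multi-byte character at once.
            i += needle.len();
        } else {
            out.push(input[i]);
            i += 1;
        }
    }
    out
}

/// Hypothetical helper: remove every byte found in `set`
/// (models `tr -d ' \341'`).
fn delete_bytes(input: &[u8], set: &[u8]) -> Vec<u8> {
    input.iter().copied().filter(|b| !set.contains(b)).collect()
}

fn main() {
    // ᚱ (E1 9A B1), a space, then a lone 0xE1 byte.
    let input: &[u8] = b"\xE1\x9A\xB1 \xE1";

    // Pass 1: delete the character ᚱ as a unit.
    let pass1 = delete_sequence(input, "ᚱ".as_bytes());
    assert_eq!(pass1, vec![b' ', 0xE1u8]);

    // Pass 2: delete the space and the raw byte 0o341 (0xE1).
    let pass2 = delete_bytes(&pass1, &[b' ', 0o341]);
    assert!(pass2.is_empty());
    println!("remaining bytes: {}", pass2.len());
}
```

Running the character pass first matters in this model: once the lone-byte pass has stripped 0xE1, the remaining 9A B1 no longer forms ᚱ and the character pass has nothing to match.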
Most implementations of tr can handle either binary data or UTF-8 data, but not both, whereas this is only a minor limitation in simultaneous binary and UTF-8 processing.
As such, this could probably be marked as an enhancement rather than a bug.