Some unicode characters cause glitches #115

osa1 · 2018-10-27T06:55:40Z

I'll post a screenshot next time this happens, but basically some unicode characters cause glithces that even a tb_clear() doesn't fix.

What happens that the unicode character stays visible in its original location when you scroll (either with shift+arrow or as new messages appear), and even changing tabs don't make the character go away.

Only way to fix the glitch that I could find is to force a resize.

Probably termbox related.

The text was updated successfully, but these errors were encountered:

osa1 · 2019-03-07T09:44:23Z

I wonder if this could be a st bug. See the recent email to [email protected] with title "ST leaves parts of previous command behind". The problem sounds quite similar to this issue.

cleac · 2019-10-01T06:19:32Z

I have pretty same thing, both in konsole (the KDE's terminal application) and st.

How do I reproduce it: in chat.freenode.net tab enter LIST and press Enter. Then the handful of chats are printed, and some of them have those broken characters...

Here is screenshot:

osa1 · 2019-10-01T06:51:58Z

Thanks for the reproducer @cleac . I should prioritize this.

osa1 · 2019-10-01T11:05:26Z

This patch fixes the issue:

diff --git a/termbox/cbits/termbox.c b/termbox/cbits/termbox.c
index 37f00d0..705978c 100644
--- a/termbox/cbits/termbox.c
+++ b/termbox/cbits/termbox.c
@@ -160,10 +160,6 @@ void tb_present(void)
             front = &CELL(&front_buffer, x, y);
             w = wcwidth(back->ch);
             if (w < 1) w = 1;
-            if (memcmp(back, front, sizeof(struct tb_cell)) == 0) {
-                x += w;
-                continue;
-            }
             memcpy(front, back, sizeof(struct tb_cell));
             send_attr(back->fg, back->bg);
             if (w > 1 && x >= front_buffer.width - (w - 1)) {

(https://github.com/nsf/termbox/blob/master/src/termbox.c#L190-L193)

It's still not 100% clear to me that the bug is in that code though, perhaps the bug is something else and removing this code hides it.

osa1 · 2019-10-01T11:46:59Z

I believe there's a bug in passing Rust characters to termbox: termbox is expecting (although this is not mentioned anywhere in the code) system-specific wide chars (wchar_t) but I'm passing utf-8 encoded chars instead.

osa1 · 2019-10-01T12:26:36Z

Here's how we'll fix this: when I paste ｈ in tiny wcwidth(back->ch) in termbox should return 2. Currently it returns -1 probably because the encoding is wrong. When we fix the encoding and wcwidth() returns 2 for this character this bug should be fixed (or there are other bugs involved).

osa1 · 2019-10-01T12:32:56Z

Here's a example showing none of Rust's default encoding of chars as u32 or our utf-8 encoding is what wcwidth expects:

extern "C" {
    fn wcwidth(c: libc::wchar_t) -> libc::c_int;
}

const TAG_CONT: u8 = 0b1000_0000;
const TAG_TWO_B: u8 = 0b1100_0000;
const TAG_THREE_B: u8 = 0b1110_0000;
const TAG_FOUR_B: u8 = 0b1111_0000;
const MAX_ONE_B: u32 = 0x80;
const MAX_TWO_B: u32 = 0x800;
const MAX_THREE_B: u32 = 0x10000;

fn char_to_utf8(c: char) -> u32 {
    let code = c as u32;
    if code < MAX_ONE_B {
        code as u32
    } else if code < MAX_TWO_B {
        ((u32::from((code >> 6 & 0x1F) as u8 | TAG_TWO_B)) << 8)
            + u32::from((code & 0x3F) as u8 | TAG_CONT)
    } else if code < MAX_THREE_B {
        (u32::from((code >> 12 & 0x0F) as u8 | TAG_THREE_B) << 16)
            + (u32::from((code >> 6 & 0x3F) as u8 | TAG_CONT) << 8)
            + (u32::from((code & 0x3F) as u8 | TAG_CONT))
    } else {
        ((u32::from((code >> 18 & 0x07) as u8 | TAG_FOUR_B)) << 24)
            + ((u32::from((code >> 12 & 0x3F) as u8 | TAG_CONT)) << 16)
            + ((u32::from((code >> 6 & 0x3F) as u8 | TAG_CONT)) << 8)
            + (u32::from((code & 0x3F) as u8 | TAG_CONT))
    }
}

fn main() {
    let c = 'ｈ';
    println!("{}", unsafe { wcwidth(c as libc::wchar_t) });
    println!("{}", unsafe { wcwidth(char_to_utf8(c) as libc::wchar_t) });
}

Both lines print -1.

osa1 · 2019-10-01T14:22:47Z

I found this crate which implements the functionality we need for this: https://unicode-rs.github.io/unicode-width/unicode_width/

With this crate println!("{:?}", UnicodeWidthChar::width('ｈ')); prints 2.

I also added a comment about this in termbox upstream: nsf/termbox#126 (comment) but given how active the development is I don't expect to get any answers any time soon.
I also asked a question on SO: https://stackoverflow.com/questions/58185069/expected-encoding-of-wcwidth-argument

I think using unicode-width would be the best way forward, but it's a bit tricky because termbox is implemented in C and unicode-width is a Rust crate.

osa1 · 2019-10-02T09:06:40Z

Another idea is to get rid of termbox and use termion instead. termion is lacking termbox's "cell" abstraction but that's easy to implement. Only downside is we'll have to implement multi-column char support (as that's part of the "cell" abstraction termbox provides).

Input handling should still be done by another library (term_input for the time being) as termion's input support is nowhere near enough for tiny.

osa1 · 2019-10-04T15:24:50Z

Shortly after merging this I realized that this is not quite fixed. Here's example:

The byte that causes problems is 0x96. Things go bad after showing that invalid character.

osa1 · 2019-10-04T15:35:04Z

This should be fixed for real now. The previous patched fixed another bug where we treated wide characters as single-column, so no need to revert that.

osa1 added the bug label Oct 27, 2018

osa1 mentioned this issue Oct 3, 2019

Implement Termbox cell API using termion #143

Merged

osa1 closed this as completed in 1bc8a34 Oct 4, 2019

osa1 reopened this Oct 4, 2019

osa1 closed this as completed in e5adb22 Oct 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Some unicode characters cause glitches #115

Some unicode characters cause glitches #115

osa1 commented Oct 27, 2018

osa1 commented Mar 7, 2019

Uh oh!

cleac commented Oct 1, 2019

Uh oh!

osa1 commented Oct 1, 2019

Uh oh!

osa1 commented Oct 1, 2019 •

edited

Loading

Uh oh!

osa1 commented Oct 1, 2019

Uh oh!

osa1 commented Oct 1, 2019

Uh oh!

osa1 commented Oct 1, 2019

Uh oh!

osa1 commented Oct 1, 2019

Uh oh!

osa1 commented Oct 2, 2019

Uh oh!

osa1 commented Oct 4, 2019

Uh oh!

osa1 commented Oct 4, 2019

Uh oh!

Some unicode characters cause glitches #115

Some unicode characters cause glitches #115

Comments

osa1 commented Oct 27, 2018

osa1 commented Mar 7, 2019

Uh oh!

cleac commented Oct 1, 2019

Uh oh!

osa1 commented Oct 1, 2019

Uh oh!

osa1 commented Oct 1, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

osa1 commented Oct 1, 2019

Uh oh!

osa1 commented Oct 1, 2019

Uh oh!

osa1 commented Oct 1, 2019

Uh oh!

osa1 commented Oct 1, 2019

Uh oh!

osa1 commented Oct 2, 2019

Uh oh!

osa1 commented Oct 4, 2019

Uh oh!

osa1 commented Oct 4, 2019

Uh oh!

osa1 commented Oct 1, 2019 •

edited

Loading