Skip to content

Some unicode characters cause glitches #115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
osa1 opened this issue Oct 27, 2018 · 11 comments
Closed

Some unicode characters cause glitches #115

osa1 opened this issue Oct 27, 2018 · 11 comments
Labels

Comments

@osa1
Copy link
Owner

osa1 commented Oct 27, 2018

I'll post a screenshot next time this happens, but basically some unicode characters cause glithces that even a tb_clear() doesn't fix.

What happens that the unicode character stays visible in its original location when you scroll (either with shift+arrow or as new messages appear), and even changing tabs don't make the character go away.

Only way to fix the glitch that I could find is to force a resize.

Probably termbox related.

@osa1 osa1 added the bug label Oct 27, 2018
@osa1
Copy link
Owner Author

osa1 commented Mar 7, 2019

I wonder if this could be a st bug. See the recent email to [email protected] with title "ST leaves parts of previous command behind". The problem sounds quite similar to this issue.

@cleac
Copy link

cleac commented Oct 1, 2019

I have pretty same thing, both in konsole (the KDE's terminal application) and st.

How do I reproduce it: in chat.freenode.net tab enter LIST and press Enter. Then the handful of chats are printed, and some of them have those broken characters...

Here is screenshot:
image

@osa1
Copy link
Owner Author

osa1 commented Oct 1, 2019

Thanks for the reproducer @cleac . I should prioritize this.

@osa1
Copy link
Owner Author

osa1 commented Oct 1, 2019

This patch fixes the issue:

diff --git a/termbox/cbits/termbox.c b/termbox/cbits/termbox.c
index 37f00d0..705978c 100644
--- a/termbox/cbits/termbox.c
+++ b/termbox/cbits/termbox.c
@@ -160,10 +160,6 @@ void tb_present(void)
             front = &CELL(&front_buffer, x, y);
             w = wcwidth(back->ch);
             if (w < 1) w = 1;
-            if (memcmp(back, front, sizeof(struct tb_cell)) == 0) {
-                x += w;
-                continue;
-            }
             memcpy(front, back, sizeof(struct tb_cell));
             send_attr(back->fg, back->bg);
             if (w > 1 && x >= front_buffer.width - (w - 1)) {

(https://github.com/nsf/termbox/blob/master/src/termbox.c#L190-L193)

It's still not 100% clear to me that the bug is in that code though, perhaps the bug is something else and removing this code hides it.

@osa1
Copy link
Owner Author

osa1 commented Oct 1, 2019

I believe there's a bug in passing Rust characters to termbox: termbox is expecting (although this is not mentioned anywhere in the code) system-specific wide chars (wchar_t) but I'm passing utf-8 encoded chars instead.

@osa1
Copy link
Owner Author

osa1 commented Oct 1, 2019

Here's how we'll fix this: when I paste in tiny wcwidth(back->ch) in termbox should return 2. Currently it returns -1 probably because the encoding is wrong. When we fix the encoding and wcwidth() returns 2 for this character this bug should be fixed (or there are other bugs involved).

@osa1
Copy link
Owner Author

osa1 commented Oct 1, 2019

Here's a example showing none of Rust's default encoding of chars as u32 or our utf-8 encoding is what wcwidth expects:

extern "C" {
    fn wcwidth(c: libc::wchar_t) -> libc::c_int;
}

const TAG_CONT: u8 = 0b1000_0000;
const TAG_TWO_B: u8 = 0b1100_0000;
const TAG_THREE_B: u8 = 0b1110_0000;
const TAG_FOUR_B: u8 = 0b1111_0000;
const MAX_ONE_B: u32 = 0x80;
const MAX_TWO_B: u32 = 0x800;
const MAX_THREE_B: u32 = 0x10000;

fn char_to_utf8(c: char) -> u32 {
    let code = c as u32;
    if code < MAX_ONE_B {
        code as u32
    } else if code < MAX_TWO_B {
        ((u32::from((code >> 6 & 0x1F) as u8 | TAG_TWO_B)) << 8)
            + u32::from((code & 0x3F) as u8 | TAG_CONT)
    } else if code < MAX_THREE_B {
        (u32::from((code >> 12 & 0x0F) as u8 | TAG_THREE_B) << 16)
            + (u32::from((code >> 6 & 0x3F) as u8 | TAG_CONT) << 8)
            + (u32::from((code & 0x3F) as u8 | TAG_CONT))
    } else {
        ((u32::from((code >> 18 & 0x07) as u8 | TAG_FOUR_B)) << 24)
            + ((u32::from((code >> 12 & 0x3F) as u8 | TAG_CONT)) << 16)
            + ((u32::from((code >> 6 & 0x3F) as u8 | TAG_CONT)) << 8)
            + (u32::from((code & 0x3F) as u8 | TAG_CONT))
    }
}

fn main() {
    let c = 'h';
    println!("{}", unsafe { wcwidth(c as libc::wchar_t) });
    println!("{}", unsafe { wcwidth(char_to_utf8(c) as libc::wchar_t) });
}

Both lines print -1.

@osa1
Copy link
Owner Author

osa1 commented Oct 1, 2019

I found this crate which implements the functionality we need for this: https://unicode-rs.github.io/unicode-width/unicode_width/

With this crate println!("{:?}", UnicodeWidthChar::width('h')); prints 2.

I also added a comment about this in termbox upstream: nsf/termbox#126 (comment) but given how active the development is I don't expect to get any answers any time soon.
I also asked a question on SO: https://stackoverflow.com/questions/58185069/expected-encoding-of-wcwidth-argument

I think using unicode-width would be the best way forward, but it's a bit tricky because termbox is implemented in C and unicode-width is a Rust crate.

@osa1
Copy link
Owner Author

osa1 commented Oct 2, 2019

Another idea is to get rid of termbox and use termion instead. termion is lacking termbox's "cell" abstraction but that's easy to implement. Only downside is we'll have to implement multi-column char support (as that's part of the "cell" abstraction termbox provides).

Input handling should still be done by another library (term_input for the time being) as termion's input support is nowhere near enough for tiny.

@osa1
Copy link
Owner Author

osa1 commented Oct 4, 2019

Shortly after merging this I realized that this is not quite fixed. Here's example:

Screenshot_2019-10-04_18-23-17

The byte that causes problems is 0x96. Things go bad after showing that invalid character.

@osa1 osa1 reopened this Oct 4, 2019
@osa1 osa1 closed this as completed in e5adb22 Oct 4, 2019
@osa1
Copy link
Owner Author

osa1 commented Oct 4, 2019

This should be fixed for real now. The previous patched fixed another bug where we treated wide characters as single-column, so no need to revert that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants