[BUG] Japanese text is misidentified as URL. #390

Closed
mattn opened this issue May 1, 2023 · 9 comments

Comments

@mattn

mattn commented May 1, 2023

Describe the bug
Some Japanese text is unexpectedly misidentified as URL.

To Reproduce
Steps to reproduce the behavior:

  1. Open input dialog
  2. Type some Japanese text that contains an ideographic full stop (\u3002)
  3. Post the note
  4. See the text is misidentified as URL.

Expected behavior
The text should be normal text.

Device (please complete the following information):

  • Android Version: 0.37.4

This is related to linkedin/URL-Detector; see linkedin/URL-Detector#39.

URL-Detector treats \u3002 as a dot. Strictly speaking this is not a bug, because IDN allows \u3002 to be used as a dot. However, it means Japanese text is often misidentified as a URL.

image

@afternooncurry

The ideographic full stop and the fullwidth period are not treated as dots in IDNA2008, although they are in IDNA2003. The current recommendation is IDNA2008. linkedin/URL-Detector appears to base its IDN handling on IDNA2003, which may be causing the issue.
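
To make the difference concrete, here is a minimal sketch of how the choice of label separators changes whether a Japanese sentence looks like a host name. The separator set is the one listed in RFC 3490 (IDNA2003), section 3.1; the helper names `split_labels` and `looks_like_host` are hypothetical and are not taken from URL-Detector or Amethyst.

```python
import re

# Label separators recognized by IDNA2003 (RFC 3490, section 3.1):
# full stop, ideographic full stop, fullwidth full stop,
# halfwidth ideographic full stop.
IDNA2003_DOTS = "\u002e\u3002\uff0e\uff61"

# A stricter reading that accepts only the ASCII full stop.
ASCII_DOT = "\u002e"


def split_labels(text, dots):
    """Split text into candidate domain labels on any character in dots."""
    return re.split("[" + re.escape(dots) + "]", text)


def looks_like_host(text, dots):
    """Hypothetical check: at least two labels, none of them empty."""
    labels = split_labels(text, dots)
    return len(labels) >= 2 and all(labels)


sentence = "こんにちは\u3002元気です"

# With the IDNA2003 separator set, the sentence splits into two "labels".
print(looks_like_host(sentence, IDNA2003_DOTS))

# With only the ASCII dot, it stays a single label and is not a host.
print(looks_like_host(sentence, ASCII_DOT))
```

A real detector does much more than this (scheme, TLD, and character checks), so this only illustrates why treating \u3002 as a separator makes ordinary Japanese sentences candidates for linkification.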

@mattn
Author

mattn commented Jun 15, 2023

@vitorpamplona Many users in the East Asian region have been waiting for this fix for a long time.

@vitorpamplona
Owner

We are waiting for the library to be able to support these additional characters. Until that library is fixed, there is not much we can do :(

@mattn
Author

mattn commented Jun 15, 2023

Thanks. FYI, this issue can also be reproduced by English speakers.

image

@vitorpamplona
Owner

vitorpamplona commented Jun 15, 2023

Interesting. That is a different "bug".

We have a separate procedure to linkify all yyy.xxx texts. It requires the domain separator . and should not affect the \u3002 character. Is that an issue for you? I see it here and there during the week but always just ignore it because it works most of the time.

@vitorpamplona
Owner

vitorpamplona commented Jun 15, 2023

And now that URLs can contain any Unicode character, I am not really sure how to solve it, because Thanks.<emoji> is a valid URL and could actually exist these days.
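
As a rough illustration of the yyy.xxx case, the sketch below matches bare domains that use only the ASCII dot as separator while allowing any non-whitespace character inside a label. The pattern and the function name `find_bare_domains` are assumptions made for this example and are not Amethyst's actual linkify code.

```python
import re

# Hypothetical bare-domain matcher: two or more labels joined by the
# ASCII dot. Labels may contain any character except whitespace and ".".
BARE_DOMAIN = re.compile(r"[^\s.]+(?:\.[^\s.]+)+")


def find_bare_domains(text):
    """Return every substring that looks like label.label[.label...]."""
    return BARE_DOMAIN.findall(text)


# An ordinary domain is picked up.
print(find_bare_domains("see example.com now"))

# \u3002 is not the ASCII dot, so a Japanese sentence is left alone.
print(find_bare_domains("こんにちは\u3002元気です"))

# A word, a dot, and an emoji with no space still looks like a domain.
print(find_bare_domains("Thanks.\U0001F600"))
```

Under these assumptions the matcher ignores \u3002, but it cannot tell a sentence that ends in an emoji from a Unicode domain, which is the ambiguity described above.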

@mattn
Author

mattn commented Jun 16, 2023

I'm not sure what is wrong in the code, but this is rendered as a URL only in Amethyst.

image

@mattn
Author

mattn commented Jul 5, 2023

Is this related to this issue?

image

@mattn
Author

mattn commented Jul 7, 2023

I confirmed that this issue is fixed in #491.

Thanks.

mattn closed this as completed Jul 7, 2023