adding unicode friendly clue parsing #92

Canonelis · 2021-07-13T11:48:55Z

Changed the clue parsing algorithm to handle Unicode characters. The regular expression it now mimics is ^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$
where "\a" represents all legal clue letters. I needed to avoid any string library functions that allow matching with %d-style syntax when parsing a clue.

Changed the clue parsing algorithm to handle Unicode characters. The regular expression it now mimics is "^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s+)inf)$" where "\a" represents all legal clue letters. Needed to avoid any string library functions that allow matching with %d-style syntax when parsing a clue.
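
For readers who want to experiment with the clue shape, here is a minimal sketch of the expression from the PR description. It is written in Python rather than the project's Lua, and it is not the PR's implementation, which deliberately avoids pattern matching. The names `LETTER`, `CLUE_RE`, and `is_clue` are hypothetical, and Python's Unicode letter class is only an assumed stand-in for "\a".

```python
import re

# Assumption: Python's Unicode-aware "letter" class stands in for "\a".
# It is NOT the PR's exact set of legal clue letters.
LETTER = r"[^\W\d_]"

CLUE_RE = re.compile(
    rf"^\s*{LETTER}+(-{LETTER}+)?"          # word, optionally hyphenated once
    rf"(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$"   # a number, or "inf" after a dash or space
)

def is_clue(text: str) -> bool:
    """Return True if text has the shape described by the expression."""
    return CLUE_RE.match(text) is not None
```

With this sketch, inputs such as "ocean 3", "ice-cream-2", and "ocean-inf" are accepted, while "-ocean 3" and "ocean" are rejected.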

Canonelis · 2021-07-13T11:52:56Z

This accepts every letter matched by %a in Lua, plus any Unicode character above 0x0370 except certain whitespace characters and dashes. It is very versatile and still accepts all the same clue formats as before.
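
A literal reading of that rule can be sketched as follows. This is Python rather than the project's Lua, and it rests on several assumptions: %a is taken to mean ASCII letters, "above" is read as strictly greater than 0x0370, and Unicode categories stand in for the PR's hand-written lists of whitespace and dash characters. The function name `is_legal_clue_char` is hypothetical.

```python
import unicodedata

def is_legal_clue_char(ch: str) -> bool:
    """Approximate the rule: ASCII letters, or any character above U+0370
    that is neither whitespace nor a dash."""
    if ch.isascii():
        return ch.isalpha()   # assumed stand-in for Lua's %a
    if ord(ch) <= 0x0370:
        return False          # assumption: "above" means strictly greater
    # Assumption: Unicode categories approximate the PR's explicit lists.
    return not ch.isspace() and unicodedata.category(ch) != "Pd"
```

Under these assumptions a Cyrillic or CJK letter passes, while an em space (U+2003) or an en dash (U+2013) does not.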

Canonelis · 2021-07-19T07:13:40Z

If you're busy, I could provide a fairly exhaustive list of test cases. Is there anything I can do to help you add this to the project?

Fixed range of illegal characters

Make important character modifiers legal

Added compatibility with typing in foreign digit systems. It converts them to normal numbers before further processing. Fixed a bug where you could put a hyphen at the beginning of a clue if there was whitespace before it. Added more whitespace characters.

Canonelis · 2021-07-27T01:39:35Z

Did some rigorous testing on it, found one flaw. Generated 2000 clues that should work and they did. Generated 5000 clues that shouldn't work and they didn't. This is ready.

Lowercase works well with unicode characters

Canonelis · 2021-08-06T06:04:11Z

This would be good to add pretty soon since you have so many foreign decks. Right now the set of characters it allows in clues is fairly arbitrary: if the character's code mod 256 falls in the range A-Z, a-z, or À-ÿ, it is accepted; otherwise it is rejected.
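
To see why that rule behaves arbitrarily, here is a small Python model of the behavior as described in this comment. It models only the description, not the project's actual Lua code, and `old_rule_accepts` is a hypothetical name.

```python
def old_rule_accepts(ch: str) -> bool:
    """Model the described behavior: only the low byte of the code point is checked."""
    low = ord(ch) % 256
    return (
        0x41 <= low <= 0x5A      # A-Z
        or 0x61 <= low <= 0x7A   # a-z
        or 0xC0 <= low <= 0xFF   # À-ÿ
    )
```

Under this model the hiragana letter U+3042 is accepted because its low byte is 0x42 ("B"), while the katakana letter U+30A2 is rejected because its low byte is 0xA2.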

I've played a few games with it now and I think it's done.

Canonelis · 2021-08-25T21:35:32Z

Here are 2 files you can copy and paste from: "near legit clues(but not legit).txt" and "legit clues.txt".
Each file was randomly generated and filtered by the regular expression
^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$
With the allowed character sets the clues get pretty weird, but for testing purposes it worked great.
Many other scripts have their own digits 0-9, so I included those as well, which is why you might not see a normal number in every clue. When displaying and logging the clue, however, the number is written with normal digits.

Ryan6578

I still need to review getClueDetails, but this is what I found for now.

src/Global.-1.ttslua

.gitignore

src/Global.-1.ttslua

Ryan6578 · 2021-08-25T21:52:17Z

src/Global.-1.ttslua


+----------[ Character sets ]----------
+
+digits_table = {


Why are these split up into 10 tables?

The digits_table[0] table lists all the characters that are the number 0 in other languages.
The digits_table[1] table lists all the characters that are the number 1 in other languages.
etc.

What do you mean by that?
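
One way to picture the structure described in that answer is sketched below, in Python rather than the project's Lua. The table here is a hypothetical, heavily abbreviated stand-in (digits 0-3 in three scripts only), and `to_ascii_digit` and `normalize_digits` are illustrative names, not functions from the PR.

```python
# Hypothetical, abbreviated stand-in for the PR's digits_table:
# index d holds characters that mean the digit d in various scripts.
digits_table = [
    ["0", "\u0660", "\u0966"],  # ASCII, Arabic-Indic, Devanagari
    ["1", "\u0661", "\u0967"],
    ["2", "\u0662", "\u0968"],
    ["3", "\u0663", "\u0969"],
]

# Invert the table once so each character maps straight to its ASCII digit.
to_ascii_digit = {
    ch: str(value)
    for value, chars in enumerate(digits_table)
    for ch in chars
}

def normalize_digits(text: str) -> str:
    """Replace every known foreign digit with its ASCII counterpart."""
    return "".join(to_ascii_digit.get(ch, ch) for ch in text)
```

Grouping the characters by value means the index of the sub-table is itself the digit, so no separate mapping from character to value has to be maintained by hand.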

src/Global.-1.ttslua

This reverts commit a514dce.

Fixed range of illegal characters. Added a small fix to prevent regex backtracking overflow on chat messages with large gaps of whitespace in them, such as "!A_______________B" where _ is a space.

Condensed the code for adding ranges of illegal characters. This was checked and produces the exact same table as before.

Canonelis · 2021-08-26T10:23:04Z

Here are the submitted changes to the code.

A correct greedy way to remove leading and trailing whitespace
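
The commit's own code is not shown in this conversation. As a hedged illustration only, one way to trim whitespace without any backtracking pattern is to scan inward from both ends. The sketch is in Python rather than the project's Lua, and `trim` is a hypothetical name.

```python
def trim(text: str) -> str:
    """Strip leading and trailing whitespace by scanning inward from each end."""
    start, end = 0, len(text)
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return text[start:end]
```

Each character is inspected at most once, so a message with a long run of interior spaces, like the "!A ... B" example, costs no more than any other message of the same length.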

Canonelis added 3 commits July 25, 2021 23:31

Fixed range of illegal characters

a514dce

Make important character modifiers legal

0ba0e27

Reduce all clues to lowercase

1acaed9

Lowercase works well with unicode characters

Canonelis force-pushed the adding-unicode-friendly-clue-parsing branch from a455e14 to 1acaed9 Compare July 28, 2021 05:58

Ryan6578 requested changes Aug 26, 2021

This was referenced Aug 26, 2021

changing regex from lazy to greedy #88

Closed

Fixed to support not only alphabets but also any unicode characters #83

Closed

Canonelis added 3 commits August 26, 2021 05:10

Revert "Fixed range of illegal characters"

5d806f4

This reverts commit a514dce.

ReCommit of Fixed range of illegal characters

51a708f

Fixed range of illegal characters. Added a small fix to prevent regex backtracking overflow on chat messages with large gaps of whitespace in them, such as "!A_______________B" where _ is a space.

Condensed the code for adding ranges of illegal characters.

c5d8ec2

Condensed the code for adding ranges of illegal characters. This was checked and produces the exact same table as before.

A correct greedy way to remove leading and trailing whitespace

2f6c5cd

Review comments: Ryan6578 (Aug 25, 2021), Canonelis (Aug 26, 2021), Ryan6578 (Sep 24, 2021)
