@Canonelis (Contributor) commented Jul 13, 2021

Changed the clue parsing algorithm to handle Unicode characters. The regular expression it now mimics is ^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$
where "\a" represents all legal clue letters. I needed to avoid any string library functions that allow matching with %d-style syntax when parsing a clue.
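For illustration, here is a minimal sketch of how that regular expression can be matched by hand over code points rather than with %d-style patterns. It assumes Lua 5.3+ (for the `utf8` library), ASCII-only digits, a lowercase `inf`, and simplified whitespace and letter sets; `parseClue` and its helpers are hypothetical names, not the PR's code. The optional `-` before the number is treated as a separator rather than a minus sign, which is also an assumption.

```lua
-- Hand-written matcher for the clue grammar
--   ^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$
-- Assumptions (not from the PR): Lua 5.3+, ASCII digits only, lowercase
-- "inf", and the simplified space/dash/letter sets below.
local HYPHEN = 0x2D -- "-"

-- Hypothetical whitespace set (a few ASCII and Unicode spaces)
local SPACES = { [0x09] = true, [0x0A] = true, [0x0D] = true, [0x20] = true,
                 [0xA0] = true, [0x2003] = true, [0x3000] = true }
-- Hypothetical dash set, excluded from letters
local DASHES = { [0x2010] = true, [0x2013] = true, [0x2014] = true, [0x2212] = true }

local function isSpace(cp) return SPACES[cp] == true end
local function isDigit(cp) return cp >= 0x30 and cp <= 0x39 end

-- Simplified stand-in for "\a": ASCII letters, or anything from U+00C0 up
-- except U+00D7/U+00F7, whitespace and dashes.
local function isLetter(cp)
  if (cp >= 0x41 and cp <= 0x5A) or (cp >= 0x61 and cp <= 0x7A) then
    return true
  end
  return cp >= 0xC0 and cp ~= 0xD7 and cp ~= 0xF7
     and not isSpace(cp) and not DASHES[cp]
end

-- Move j left past whitespace, never going below lo.
local function skipSpacesBack(cps, j, lo)
  while j >= lo and isSpace(cps[j]) do j = j - 1 end
  return j
end

-- \a+(-\a+)? over cps[i..j]: letters with at most one inner hyphen.
local function isWord(cps, i, j)
  if i > j then return false end
  local hyphenAt
  for k = i, j do
    if cps[k] == HYPHEN then
      if hyphenAt then return false end
      hyphenAt = k
    elseif not isLetter(cps[k]) then
      return false
    end
  end
  return hyphenAt == nil or (hyphenAt > i and hyphenAt < j)
end

-- Returns word, count (math.huge for "inf"), or nil if the clue is invalid.
local function parseClue(s)
  local cps = {}
  for _, cp in utf8.codes(s) do cps[#cps + 1] = cp end

  local i, j = 1, #cps
  while i <= j and isSpace(cps[i]) do i = i + 1 end -- leading \s*
  j = skipSpacesBack(cps, j, i)                     -- trailing \s*
  if i > j then return nil end

  local count
  if isDigit(cps[j]) then
    -- \s*-?\s*\d+ : take the trailing digit run, then the separator
    local k = j
    while k >= i and isDigit(cps[k]) do k = k - 1 end
    count = tonumber(utf8.char(table.unpack(cps, k + 1, j)))
    j = skipSpacesBack(cps, k, i)
    if j >= i and cps[j] == HYPHEN then j = skipSpacesBack(cps, j - 1, i) end
  elseif j - i >= 2 and cps[j - 2] == 0x69 and cps[j - 1] == 0x6E
      and cps[j] == 0x66 then
    -- \s*(-|\s)\s*inf : at least one space or hyphen before "inf"
    local k = skipSpacesBack(cps, j - 3, i)
    if k >= i and cps[k] == HYPHEN then k = skipSpacesBack(cps, k - 1, i) end
    if k == j - 3 then return nil end -- no separator consumed
    count = math.huge
    j = k
  else
    return nil
  end

  if not isWord(cps, i, j) then return nil end
  return utf8.char(table.unpack(cps, i, j)), count
end
```

Working from the end of the string sidesteps backtracking: a trailing digit run or a trailing `inf` decides which alternative applies, and whatever remains must be a single (optionally hyphenated) word.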

Changed the clue parsing algorithm to handle Unicode characters. The regular expression it now mimics is "^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s+)inf)$"
where "\a" represents all legal clue letters. Needed to avoid any string library functions that allow matching with %d-style syntax when parsing a clue.
@Canonelis (Contributor, Author)

This accepts every letter matched by %a in Lua scripting, plus any Unicode character above 0x0370 except certain whitespace characters and dashes. It is very versatile and still accepts all the same clue formats as before.
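A minimal sketch of that acceptance rule, assuming `%a` means the ASCII letters (its meaning in the default C locale). The PR's exact list of excluded whitespace and dash characters is not shown here, so the `illegal` ranges below are illustrative assumptions, and `addRange`, `isClueLetter`, and `isClueWord` are hypothetical names. Requires Lua 5.3+ for `utf8.codes`.

```lua
-- Illustrative reconstruction of the rule; the real illegal list in the PR
-- is not shown, so these ranges are assumptions. Requires Lua 5.3+.
local illegal = {}

-- Mark every code point in [lo, hi] as illegal.
local function addRange(lo, hi)
  for cp = lo, hi do illegal[cp] = true end
end

addRange(0x1680, 0x1680) -- Ogham space mark
addRange(0x2000, 0x200A) -- en quad .. hair space
addRange(0x2010, 0x2015) -- hyphen .. horizontal bar
addRange(0x2028, 0x2029) -- line and paragraph separators
addRange(0x202F, 0x202F) -- narrow no-break space
addRange(0x205F, 0x205F) -- medium mathematical space
addRange(0x2212, 0x2212) -- minus sign
addRange(0x3000, 0x3000) -- ideographic space
addRange(0xFE58, 0xFE58) -- small em dash
addRange(0xFE63, 0xFE63) -- small hyphen-minus
addRange(0xFF0D, 0xFF0D) -- fullwidth hyphen-minus

local function isClueLetter(cp)
  -- ASCII letters, i.e. what %a matches in the default C locale
  if (cp >= 0x41 and cp <= 0x5A) or (cp >= 0x61 and cp <= 0x7A) then
    return true
  end
  -- Everything from U+0370 upward except the illegal whitespace and dashes
  return cp >= 0x0370 and not illegal[cp]
end

-- True if a non-empty UTF-8 string consists only of clue letters.
-- utf8.codes raises an error on invalid UTF-8.
local function isClueWord(s)
  if s == "" then return false end
  for _, cp in utf8.codes(s) do
    if not isClueLetter(cp) then return false end
  end
  return true
end
```

Under this literal reading, Latin-1 letters such as é (U+00E9) fall below U+0370, so they are accepted only if `%a` matches them in the host's Lua implementation.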

@Canonelis (Contributor, Author)

If you're busy, I could provide a fairly exhaustive list of test cases. Is there anything I can do to help you add this to the project?

Fixed range of illegal characters
Make important character modifiers legal
Added compatibility with typing in foreign digit systems. It converts them to normal numbers before further processing.
Fixed bug where you could put a hyphen at the beginning of a clue if there was whitespace before it.
Added more whitespace characters.
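One way to do that conversion, sketched under assumptions: many Unicode scripts encode the digits 0-9 as ten consecutive code points, so a digit's value is its offset from that script's zero. Only a handful of scripts are listed, and `DIGIT_ZEROS`, `digitValue`, and `normalizeDigits` are hypothetical names rather than the PR's functions. Requires Lua 5.3+.

```lua
-- Code point of "0" for a few scripts whose digits 0-9 are contiguous:
-- ASCII, Arabic-Indic, Extended Arabic-Indic, Devanagari, Bengali, Thai,
-- fullwidth. (Illustrative subset, not the PR's full list.)
local DIGIT_ZEROS = { 0x0030, 0x0660, 0x06F0, 0x0966, 0x09E6, 0x0E50, 0xFF10 }

-- Return the value 0-9 of a digit code point, or nil if it is not one.
local function digitValue(cp)
  for _, zero in ipairs(DIGIT_ZEROS) do
    if cp >= zero and cp <= zero + 9 then return cp - zero end
  end
  return nil
end

-- Rewrite every recognized digit in a UTF-8 string as its ASCII digit,
-- leaving all other characters unchanged.
local function normalizeDigits(s)
  local out = {}
  for _, cp in utf8.codes(s) do
    local d = digitValue(cp)
    -- 0 is truthy in Lua, so d == 0 still takes the first branch
    out[#out + 1] = d and string.char(0x30 + d) or utf8.char(cp)
  end
  return table.concat(out)
end
```

For example, `normalizeDigits("cat \u{663}")` (an Arabic-Indic three) returns `"cat 3"`, which can then go through the usual ASCII number handling.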
@Canonelis (Contributor, Author)

I did some rigorous testing and found one flaw. I generated 2000 clues that should be accepted, and they were; I generated 5000 clues that should be rejected, and they were. This is ready.

Lowercase works well with unicode characters
@Canonelis Canonelis force-pushed the adding-unicode-friendly-clue-parsing branch from a455e14 to 1acaed9 on July 28, 2021 05:58
@Canonelis (Contributor, Author) commented Aug 6, 2021

This would be good to add soon, since you have so many foreign-language decks. Right now, the set of characters accepted in clues is fairly arbitrary: if a character's code mod 256 falls in the range A-Z, a-z, or À-ÿ, it is accepted; otherwise it is rejected.
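To make that behavior concrete, here is a reconstruction of the rule as described above (not the project's actual code), applied to Unicode code points. Because only the low byte is inspected, unrelated characters collide with the accepted ranges; `oldRuleAccepts` is a hypothetical name.

```lua
-- Reconstruction of the described rule (not the project's code):
-- only the low byte of the character code is inspected.
local function oldRuleAccepts(cp)
  local b = cp % 256
  return (b >= 0x41 and b <= 0x5A)  -- A-Z
      or (b >= 0x61 and b <= 0x7A)  -- a-z
      or (b >= 0xC0 and b <= 0xFF)  -- U+00C0..U+00FF
end

-- A few code points that show the collisions
for _, cp in ipairs({ 0x0041, 0x0141, 0x2041, 0x00D7, 0x0416 }) do
  print(string.format("U+%04X -> %s", cp, tostring(oldRuleAccepts(cp))))
end
```

Ł (U+0141) is accepted only because 0x141 mod 256 is 0x41 ("A"), and the punctuation mark ⁁ (U+2041) is accepted for the same reason. The multiplication sign × (U+00D7) also passes because it sits inside À-ÿ, while the Cyrillic letter Ж (U+0416) is rejected because 0x16 falls outside every range.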

I've played a few games with it now and I think it's done.

@Canonelis (Contributor, Author) commented Aug 25, 2021

Here are two files you can copy and paste from:
near legit clues(but not legit).txt
legit clues.txt
Each was randomly generated and then filtered by the regular expression
^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$
With the allowed character sets the clues look pretty weird, but for testing purposes this worked great.
Many other writing systems have their own digits 0-9, so I included those as well, which is why some clues may not contain an ordinary digit. When displaying and logging the clue, however, the number is written with ordinary digits.

@Ryan6578 (Owner) left a comment

I still need to review getClueDetails, but here is what I found so far.


----------[ Character sets ]----------

digits_table = {
@Ryan6578 (Owner)

Why are these split up into 10 tables?

@Canonelis (Contributor, Author)

The digits_table[0] table lists all the characters that are the number 0 in other languages.
The digits_table[1] table lists all the characters that are the number 1 in other languages.
etc.
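A sketch of that layout, using a hypothetical excerpt that covers only four scripts (ASCII, Arabic-Indic, Devanagari, fullwidth): keying the outer table by digit value makes each index equal to the value its characters represent, and a flat character-to-value map can be derived from it once for lookups. The `digit_value` name is an assumption, not taken from the PR.

```lua
-- Hypothetical excerpt in the shape of digits_table: entry d lists the
-- characters that mean digit d in several scripts (ASCII, Arabic-Indic,
-- Devanagari, fullwidth). The PR's real table covers more scripts.
local digits_table = {
  [0] = { "0", "\u{660}", "\u{966}", "\u{FF10}" },
  [1] = { "1", "\u{661}", "\u{967}", "\u{FF11}" },
  [2] = { "2", "\u{662}", "\u{968}", "\u{FF12}" },
  [3] = { "3", "\u{663}", "\u{969}", "\u{FF13}" },
  [4] = { "4", "\u{664}", "\u{96A}", "\u{FF14}" },
  [5] = { "5", "\u{665}", "\u{96B}", "\u{FF15}" },
  [6] = { "6", "\u{666}", "\u{96C}", "\u{FF16}" },
  [7] = { "7", "\u{667}", "\u{96D}", "\u{FF17}" },
  [8] = { "8", "\u{668}", "\u{96E}", "\u{FF18}" },
  [9] = { "9", "\u{669}", "\u{96F}", "\u{FF19}" },
}

-- Derive a flat character -> value map once, for constant-time lookups.
local digit_value = {}
for d = 0, 9 do
  for _, ch in ipairs(digits_table[d]) do
    digit_value[ch] = d
  end
end
```

With this layout, supporting another script means appending one character to each of the ten lists.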

@Ryan6578 (Owner)

What do you mean by that?

Fixed range of illegal characters.
Small fix to prevent regex backtracking overflow on chat messages with large gaps of whitespace in them, such as "!A_______________B" where _ is a space.
Condensed the code for adding ranges of illegal characters.
This was checked and produces exactly the same table as before.
@Canonelis (Contributor, Author)

Here are the submitted changes to the code.

A correct greedy way to remove leading and trailing whitespace
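For reference, here is one common linear-time Lua trim idiom, offered as a sketch and not necessarily the PR's code. The familiar `^%s*(.-)%s*$` pattern re-tries the trailing `%s*$` at every step of the lazy `.-`, so a long internal run of spaces (like the "!A_______________B" case above) makes it quadratic. Checking for an all-whitespace string first and then matching `^%s*(.*%S)` avoids that. Lua's `%s` only covers ASCII whitespace in the default C locale, so Unicode spaces would still need separate handling; `trim` is a hypothetical name.

```lua
-- Trim leading and trailing whitespace in linear time.
-- Assumption: ASCII whitespace only, since Lua's %s follows the C locale.
local function trim(s)
  -- Empty or all-whitespace input: "^%s*(.*%S)" would backtrack
  -- repeatedly before failing, so handle it up front.
  if s:find("^%s*$") then return "" end
  -- Greedy "%s*" eats the leading run; greedy ".*" then backs up only
  -- over the trailing run to reach the last non-space character.
  return (s:match("^%s*(.*%S)"))
end
```

Both patterns are greedy, which is what keeps the scan linear: each one backs up over at most one whitespace run instead of re-scanning it from every starting position.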