code sample for lab day #189
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| import csv | ||
|
|
||
| # Converts a word to lowercase, with special rules for Irish and Turkish, since they are not covered by the book | ||
| def toLower(input_string : str, language: str) -> str: | ||
| irish_vowels = "AEIOUÁÉÍÓÚ" | ||
| tilda_ord = 771 | ||
| # The Irish branch uses a conditional lambda function that applies the following rule: | ||
| # - insert a hyphen if the first letter is t or n and it is followed by an Irish vowel. It also checks for | ||
| # ord 771, since that means there is a tilde on the second letter; an accented letter can be encoded either as a single code point or as two code points | ||
| # This is why we only check at position [2]: if there is a tilde at [2], there can't be a vowel at [1] | ||
|
|
||
| if(language.find("ga") + 1): | ||
|
Owner
`!= -1` would be more readable. Also, this isn't exactly correct: there are 3-letter language codes in ISO 639-3, so this check would also match "gaa", for example (the Ga language spoken in Ghana).
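One way to avoid the substring problem is to compare the primary language subtag exactly instead of searching for it. The sketch below is only an illustration: the helper names `primary_subtag` and `is_irish` are hypothetical, and it assumes the language argument is a tag such as "ga", "ga-IE", or "gle" (the 3-letter ISO 639 code for Irish).

```python
def primary_subtag(language: str) -> str:
    """Return the lowercased primary subtag of a tag such as 'ga-IE'."""
    # Accept both '-' and '_' as separators, so 'ga-IE' and 'ga_IE' both yield 'ga'.
    return language.replace("_", "-").split("-", 1)[0].lower()


def is_irish(language: str) -> bool:
    # Exact comparison: 'ga' and 'gle' match, but 'gaa' (Ga) does not.
    return primary_subtag(language) in {"ga", "gle"}
```

With this shape, `is_irish("gaa")` is false, while `is_irish("ga-IE")` is true. The same exact-match test could be reused for the Turkish and Azerbaijani branch.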
||
| irish_lower = lambda input_string : (input_string[0] + "-" + input_string[1:]).lower() \ | ||
|
Owner
A bit odd to do this as a lambda. Why not just apply these lines of code directly to the object with the same name?
||
| if (input_string[0] == "t" or input_string[0] == "n") and irish_vowels.find(input_string[1]) + 1 \ | ||
| and ord(input_string[2]) != tilda_ord \ | ||
| else input_string.lower() | ||
|
|
||
| return irish_lower(input_string) | ||
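For comparison, the Irish rule can be written as a plain function with an ordinary `if`, along the lines the reviewer suggests. This is a minimal sketch, not the submitted code: the names `IRISH_UPPER_VOWELS` and `irish_lower` are chosen for illustration, it adds a length guard so short inputs cannot raise `IndexError`, and it leaves out the `ord 771` check on the assumption that a following combining mark does not change whether the second letter is a vowel.

```python
IRISH_UPPER_VOWELS = "AEIOUÁÉÍÓÚ"


def irish_lower(word: str) -> str:
    # A lowercase 't' or 'n' prefix before an uppercase vowel keeps its
    # boundary as a hyphen when the word is lowercased: tAthair -> t-athair.
    if len(word) >= 2 and word[0] in "tn" and word[1] in IRISH_UPPER_VOWELS:
        return word[0] + "-" + word[1:].lower()
    # Every other word falls back to the default lowercasing.
    return word.lower()
```

For example, `irish_lower("tAthair")` gives "t-athair", while `irish_lower("Teach")` gives "teach" because the initial letter is uppercase.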
|
|
||
| # Turkish and Azerbaijani follow the standard conversion rules, just with a different i | ||
| elif((language.find("tr") + 1) or (language.find("az") + 1)): | ||
|
Owner
Same issue with 3-letter codes.
||
| return input_string.lower().replace("i", "ı") | ||
|
Owner
I don't believe this behaves the way we want. Did you test with both dotted and non-dotted uppercase?
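To make the concern concrete: `"istanbul".lower().replace("i", "ı")` turns an already correct lowercase dotted i into a dotless one, and in Python `"İ".lower()` yields two code points (i followed by U+0307, combining dot above), so the replace leaves a stray combining dot behind. A possible fix, sketched under the assumption that only the two uppercase I forms need special handling, maps them explicitly before calling `lower()`. The name `turkish_lower` is hypothetical.

```python
def turkish_lower(text: str) -> str:
    # Decomposed dotted capital: 'I' + U+0307 (combining dot above) -> 'i'.
    text = text.replace("I\u0307", "i")
    # Precomposed dotted capital U+0130 -> 'i'; plain capital 'I' -> dotless U+0131.
    text = text.replace("\u0130", "i").replace("I", "\u0131")
    # Everything else uses the default lowercasing; existing 'i' and 'ı' are untouched.
    return text.lower()
```

The order matters here: the decomposed sequence has to be handled before the plain `I` replacement, otherwise the `I` would become dotless and the combining dot would be left over.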
Owner
You really need a lot more tests, since they might turn up additional bugs.
||
|
|
||
| else: | ||
| return input_string.lower() | ||
|
Owner
The problem description asked for an optimization for languages without case.
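One possible reading of that optimization is an early return for languages whose scripts have no upper and lower case. The sketch below assumes that reading; the set `CASELESS_LANGUAGES` is a hypothetical, incomplete list picked for illustration, and `to_lower_fast` is not a name from the assignment.

```python
# Hypothetical, incomplete set of primary subtags for languages written
# in scripts without case distinctions.
CASELESS_LANGUAGES = frozenset({"zh", "ja", "ko", "ar", "he", "hi", "th"})


def to_lower_fast(text: str, language: str) -> str:
    # Compare the primary subtag exactly, so 'zh-Hant' matches 'zh'.
    primary = language.replace("_", "-").split("-", 1)[0].lower()
    if primary in CASELESS_LANGUAGES:
        # Assumption: the text is in the language's own script, so there is nothing to fold.
        return text
    return text.lower()
```

The trade-off is that embedded Latin text in such a language is returned unchanged, so whether this shortcut is acceptable depends on what the problem description expects for mixed-script input.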
||
|
|
||
| # Basic Testing | ||
| with open("tests.tsv") as file: | ||
| tsv_file = csv.reader(file, delimiter="\t") | ||
| for line in tsv_file: | ||
| assert toLower(line[0], line[1]) == line[2], "Test Failed for the word " + line[0] | ||
| print("All the test cases were successful!!") | ||
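Since a bare `assert` stops at the first failing row, a harness that collects every failure makes a larger test file more useful. The following is a rough sketch of that idea: `run_cases` is a hypothetical helper, and the inline `SAMPLE_TSV` string stands in for `tests.tsv`, whose real contents are not shown here.

```python
import csv
import io


def run_cases(rows, func):
    """Return (word, language, expected, actual) for every failing row."""
    failures = []
    for word, language, expected in rows:
        actual = func(word, language)
        if actual != expected:
            failures.append((word, language, expected, actual))
    return failures


# Inline stand-in for tests.tsv: word, language code, expected lowercase form.
SAMPLE_TSV = "HELLO\ten\thello\nISPARTA\ttr\t\u0131sparta\n"
rows = list(csv.reader(io.StringIO(SAMPLE_TSV), delimiter="\t"))

# A naive lowercasing function, used only to show a failure being reported.
failures = run_cases(rows, lambda word, language: word.lower())
for word, language, expected, actual in failures:
    print(f"FAIL {word!r} [{language}]: expected {expected!r}, got {actual!r}")
```

Rows covering dotted and dotless uppercase I, 3-letter language codes, and both precomposed and decomposed accents would exercise the cases raised in the review.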
There is something more general going on here that you'll want to handle in general, not just for this one combining diacritic. We'll discuss in class when we get into Unicode more deeply, or you can stop by office hours if you're curious.
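The comment does not say what the general mechanism is. One likely candidate, offered here as an assumption rather than as the reviewer's answer, is Unicode normalization: the same accented letter can be one precomposed code point or a base letter plus a combining mark, and Python's standard `unicodedata` module can convert between the two and detect any combining mark. The helper names below are hypothetical.

```python
import unicodedata


def is_combining(ch: str) -> bool:
    # A non-zero canonical combining class marks a combining character,
    # which covers U+0301, U+0303, and the rest, not just ord 771.
    return unicodedata.combining(ch) != 0


def normalized_lower(text: str) -> str:
    # Compose base letter + combining mark into one code point where
    # possible (NFC), then lowercase the result.
    return unicodedata.normalize("NFC", text).lower()


precomposed = "\u00c1"   # Á as a single code point
decomposed = "A\u0301"   # A followed by a combining acute accent

# The two spellings differ as strings but agree after normalization.
same_after_nfc = normalized_lower(precomposed) == normalized_lower(decomposed)
```

Normalizing the input once at the top of the function would let the rest of the logic index letters without special-casing individual combining code points.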