Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 32 additions & 0 deletions S23/akumar2709/main.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
import csv

#function converts word to lower case and provides special rules for irish and turkish since they are not covered by the book
def toLower(input_string : str, language: str) -> str:
irish_vowels = "AEIOUÁÉÍÓÚ"
tilda_ord = 771
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is something more general going on here that you'll want to handle in general, not just for this one combining diacritics. We'll discuss in class when we get into Unicode more deeply, or you can stop by office hours if you're curious.

#the following condition for irish uses a conditional lamda function which looks for the following rules
# - include a hyphen if the first alphabet is T or N and its followed by an irish vowel. It also checks for
# ord 771 since that means that there is a tild on the second alphabet since that could be achieved by either a singular unicode or by 2 unicodes
# this is the reason we only check at postion[2] since if there is a tilda at [2] it means there can't be a vowel at [1]

if(language.find("ga") + 1):
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!= -1 would be more readable. Also, this isn't exactly correct, since there are 3-letter language codes in ISO 639-3, and so this would work for "gaa", for example (Ga language spoken in Ghana).

irish_lower = lambda input_string : (input_string[0] + "-" + input_string[1:]).lower() \
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit odd to do this as a lambda. Why not just apply these lines of code directly to the object with the same name?

if (input_string[0] == "t" or input_string[0] == "n") and irish_vowels.find(input_string[1]) + 1 \
and ord(input_string[2]) != tilda_ord \
else input_string.lower()

return irish_lower(input_string)

# Turkish and Azerbaijani follow similar rules to standard covertion with just a different i
elif((language.find("tr") + 1) or (language.find("az") + 1)):
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue with 3-letter codes.

return input_string.lower().replace("i", "ı")
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this behaves the way we want. Did you test with both dotted and non-dotted uppercase?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You really need a lot more tests, since they might turn up additional bugs.


else:
return input_string.lower()
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem description asked for an optimization for languages without case.


# Basic Testing
with open("tests.tsv") as file:
tsv_file = csv.reader(file, delimiter="\t")
for line in tsv_file:
assert toLower(line[0], line[1]) == line[2], "Test Failed for the word " + line[0]
print("All the test cases where successful!!")