Skip normalization options + add variations for BN-NL part of N-Best corpus #30
greenw0lf commented on Nov 20, 2023
- Added variations based on the top 20 confusion pairs noticed for the BN-NL subset
- Apostrophes (') and dashes (-) are no longer removed when normalizing the reference and hypothesis files
- Added the option to skip normalization for the reference files, the hypothesis files, or both. The option is available in both the interface and the terminal
- Added the -D flag to the sclite command in the pipeline in order to treat optional words differently (the words that are in between brackets in the reference file(s))
- Fixed a small typo in the interface (Hypothese -> Hypothesis)
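
The punctuation and skip-normalization changes above could look roughly like the sketch below. It is not the code from this PR: the function name `normalize_line` and the `skip` parameter are hypothetical, and the only behaviour taken from the description is that ' and - survive normalization and that normalization can be skipped.

```python
import string

# Assumption: every ASCII punctuation mark is stripped except the two
# exceptions named in the PR description (apostrophe and dash).
KEEP = {"'", "-"}
REMOVE = "".join(ch for ch in string.punctuation if ch not in KEEP)
_TABLE = str.maketrans("", "", REMOVE)


def normalize_line(line: str, skip: bool = False) -> str:
    """Strip punctuation (except ' and -) and collapse whitespace.

    `skip` is a hypothetical switch mirroring the "skip normalization"
    option; when set, the line is returned unchanged.
    """
    if skip:
        return line
    return " ".join(line.translate(_TABLE).split())
```

Calling `normalize_line` with `skip=True` for only the reference or only the hypothesis side would give the per-file behaviour described in the list.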
Fixed a bunch of bugs and connected functionality to interface
…h-apostrophe Remove ' and - from the punctuations to be removed
…ions Add some variations + remove dash (-) again
…ions Add variation + add back dash (-) to punctuation exceptions
Fix variation bug
Add support for skipping normalization in pipeline
…-flag Add sclite -D flag for optional words
…interf Testing skip normalization in interface
…interf Test the interface when submitting the form
…interf More UI testing
…interf Hopefully last UI changes
…interf More UI testing
…interf Skip norm interf
…interf Final interface touches (hopefully)
…interf Fix issue with getting values from form submit
…interf Hopefully works this time (changed value to name)
…variations Add variations from top 20 confusion pairs
…variations One final variation
Test removing -m hyp from sclite command in pipeline
add a flag that gives a more detailed breakdown
Add another variation for Moszkowicz
Changes include:
- Changing nargs for hypfile and reffile args
- Small rewording of comments and help messages
- Removing skip_normalization as it was redundant
- Changing the way interactive behaves (have it be a True value in the code when used)
- Change normalization of numbers slightly (add a space after duizend, this is how it's done for Dutch)
- Reorder sclite and variation related LOCs
- Update README with new arguments added
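
As an illustration of the argument changes listed in this commit, here is a minimal `argparse` sketch. All flag names and the choice of `nargs="+"` are assumptions made for the example; the commit message only says that `nargs` changed for the hypfile and reffile args and that interactive becomes a True value when used.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI sketch; flag names are illustrative, not the tool's."""
    parser = argparse.ArgumentParser(description="ASR scoring pipeline (sketch)")
    # Assumption: nargs="+" so each argument accepts one or more files.
    parser.add_argument("--reffile", nargs="+", required=True,
                        help="one or more reference files")
    parser.add_argument("--hypfile", nargs="+", required=True,
                        help="one or more hypothesis files")
    # Separate switches so normalization can be skipped per side.
    parser.add_argument("--skip-ref-norm", action="store_true",
                        help="do not normalize the reference files")
    parser.add_argument("--skip-hyp-norm", action="store_true",
                        help="do not normalize the hypothesis files")
    # store_true makes the value True when the flag is given, False otherwise.
    parser.add_argument("--interactive", action="store_true",
                        help="launch the interface instead of running once")
    return parser
```

With separate per-side switches, a single combined skip flag is not needed, which is one plausible reading of why `skip_normalization` was removed as redundant.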
I'm done with all the changes now. I decided not to add an argument for passing extra sclite arguments, since only a handful of them would actually work alongside the flags already in use. We can add it if other users request it.
Please do test the interface and the CLI. I will do some testing myself, but I might miss some aspects.
Ready to merge. Please consider creating issues based on my comments and including the improvements in a next PR.