Skip normalization options + add variations for BN-NL part of N-Best corpus #30

Merged
58 commits merged into main on Apr 5, 2024

greenw0lf (Collaborator)

  • Added variations based on the top 20 confusion pairs observed for the BN-NL subset
  • Apostrophes (') and dashes (-) are no longer removed when normalizing the reference and hypothesis files
  • Added an option to skip normalization for the reference files, the hypothesis files, or both, available in both the interface and the terminal
  • Added the -D flag to the sclite command in the pipeline so that optional words (the words between brackets in the reference file(s)) are scored as optional
  • Fixed a small typo in the interface (Hypothese -> Hypothesis)
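To illustrate the normalization changes listed above, here is a minimal Python sketch; it is not the project's actual code. It assumes normalization means lowercasing and stripping punctuation, keeps apostrophes and dashes, and applies a hypothetical `VARIATIONS` map (the real top-20 BN-NL confusion pairs are not listed in this PR, so the entries below are illustrative only). The `skip_ref`/`skip_hyp` switches are likewise hypothetical names for the new skip-normalization option.

```python
import re
import string

# Characters that are no longer stripped: apostrophe and dash.
KEEP = {"'", "-"}
# Everything else in string.punctuation is stripped (assumption: the
# project uses a similar character set).
STRIP = "".join(ch for ch in string.punctuation if ch not in KEEP)
STRIP_RE = re.compile("[" + re.escape(STRIP) + "]")

# Hypothetical variation map: each spelling variant is rewritten to one
# canonical form so reference and hypothesis agree. Illustrative only,
# NOT the actual BN-NL confusion pairs.
VARIATIONS = {
    "oke": "oké",
    "zo'n": "zo een",
}


def normalize(text: str) -> str:
    """Lowercase, strip punctuation (except ' and -), then apply variations."""
    text = STRIP_RE.sub(" ", text.lower())
    words = [VARIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


def prepare(ref: str, hyp: str, skip_ref: bool = False, skip_hyp: bool = False):
    """Normalize reference and hypothesis text unless skipping is requested."""
    ref_out = ref if skip_ref else normalize(ref)
    hyp_out = hyp if skip_hyp else normalize(hyp)
    return ref_out, hyp_out
```

Because the apostrophe and dash are excluded from the strip set, tokens such as `zo'n` or `mooi-weer` survive normalization intact, so they can still be matched by a variation entry or scored as single words.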

greenw0lf and others added 30 commits October 5, 2023 16:30
Fixed a bunch of bugs and connected functionality to interface
Remove ' and - from the punctuations to be removed
Add some variations + remove dash (-) again
Add variation + add back dash (-) to punctuation exceptions
Add support for skipping normalization in pipeline
Add sclite -D flag for optional words
Testing skip normalization in interface
Test the interface when submitting the form
greenw0lf requested a review from KleinRana on November 20, 2023 13:45
Changes include:
- Changing nargs for the hypfile and reffile args
- Slightly rewording comments and help messages
- Removing skip_normalization, as it was redundant
- Changing how interactive behaves (it is now a True value in the code when used)
- Slightly changing the normalization of numbers (adding a space after "duizend", as is done in Dutch)
- Reordering the sclite- and variation-related lines of code
- Updating the README with the newly added arguments
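The argument changes above could look roughly like the following `argparse` sketch. Only `hypfile`, `reffile`, `interactive`, and sclite's `-D` flag come from this PR; the `nargs="+"` choice, the `--skip-ref-norm`/`--skip-hyp-norm` flag names, and the `sclite_command` helper are hypothetical and may differ from the repository.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="ASR evaluation (sketch)")
    # Assumption: nargs="+" so one or more files can be given per side.
    parser.add_argument("--hypfile", nargs="+", help="hypothesis file(s)")
    parser.add_argument("--reffile", nargs="+", help="reference file(s)")
    # Hypothetical flag names for skipping normalization per side.
    parser.add_argument("--skip-ref-norm", action="store_true",
                        help="do not normalize the reference file(s)")
    parser.add_argument("--skip-hyp-norm", action="store_true",
                        help="do not normalize the hypothesis file(s)")
    # store_true: args.interactive is True only when the flag is passed.
    parser.add_argument("--interactive", action="store_true",
                        help="run in interactive mode")
    return parser


def sclite_command(ref_trn: str, hyp_trn: str) -> List[str]:
    """Hypothetical helper: assemble an sclite call with -D for optional words.

    Other options the real pipeline may pass (utterance ID format,
    output reports) are omitted here.
    """
    return ["sclite", "-r", ref_trn, "trn", "-h", hyp_trn, "trn", "-D"]
```

For example, `build_parser().parse_args(["--hypfile", "a.txt", "b.txt", "--reffile", "r.txt"])` yields `hypfile == ["a.txt", "b.txt"]` and `interactive == False`. Using `action="store_true"` makes `interactive` a plain boolean rather than a value the caller has to supply.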
greenw0lf (Collaborator, Author)

I'm done with all the changes now. I decided not to add an argument for passing extra sclite arguments, since only a handful of them would actually work alongside the existing flags. We can add it if other users request it.

greenw0lf (Collaborator, Author)

Please also test the interface/CLI. I will do some testing myself, but I might miss some aspects.

KleinRana (Collaborator) left a comment


Ready to merge. Please consider creating issues based on my comments and including the improvements in the next PR.

greenw0lf merged commit 47dc7d1 into main on Apr 5, 2024
1 check passed