Conflict of logic in accuracy_eval.py
The script performs two steps, in order, on the text output of Whisper:
- Filter: only certain characters are accepted from the text output, e.g. only ‘a-z’, ‘ ’, ‘.’ are accepted.
- Normalize: certain phrases are converted into another form, e.g. “one hundred dollar” -> “100$”.
Issue: normalization runs after filtering, so it can reintroduce characters that the filter is meant to exclude. In the example above, “100$” should be filtered out, but it ends up in the output because normalization comes second.
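The ordering problem can be reproduced with a minimal sketch. This is not the code from accuracy_eval.py: `filter_text`, `normalize_text`, and `current_pipeline` are hypothetical names, and the allowed character set and the single replacement rule are taken only from the examples above.

```python
import re

# Assumption: the filter keeps only 'a-z', ' ' and '.', as in the example above.
DISALLOWED = re.compile(r"[^a-z .]")


def filter_text(text: str) -> str:
    """Hypothetical filter step: lowercase, then drop disallowed characters."""
    return DISALLOWED.sub("", text.lower())


def normalize_text(text: str) -> str:
    """Hypothetical normalize step with a single rule from the example."""
    return text.replace("one hundred dollar", "100$")


def current_pipeline(text: str) -> str:
    """Order described in this issue: filter first, normalize second."""
    return normalize_text(filter_text(text))


out = current_pipeline("It costs one hundred dollar.")
# Characters in the final output that the filter would have rejected.
leaked = sorted(set(DISALLOWED.findall(out)))
print(out)     # it costs 100$.
print(leaked)  # ['$', '0', '1']
```

The filter passes the spelled-out phrase untouched, and the normalizer then introduces digits and `$` that nothing downstream removes.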
Fix
- Option 1: accept digits and the dollar sign. PR filed: [Draft] [Whisper] Add labels' in the whisper output #2252
- Option 2: do normalization before filtering.
  Risk: this might change the reference accuracy.
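A rough sketch of how the two options differ, under the same assumptions as the examples in this issue. The function names, the `allowed` parameter, and the replacement rule are all hypothetical and are not taken from accuracy_eval.py or from the PR.

```python
import re


def normalize_text(text: str) -> str:
    """Hypothetical normalize step with a single rule from the example."""
    return text.replace("one hundred dollar", "100$")


def filter_text(text: str, allowed: str = "a-z .") -> str:
    """Hypothetical filter step: drop every character outside `allowed`."""
    return re.sub(f"[^{allowed}]", "", text)


def option_1(text: str) -> str:
    """Keep the current order, but widen the allowed set to digits and '$'."""
    return normalize_text(filter_text(text.lower(), allowed="a-z .0-9$"))


def option_2(text: str) -> str:
    """Normalize first, then filter with the original allowed set."""
    return filter_text(normalize_text(text.lower()))


sample = "It costs one hundred dollar."
print(option_1(sample))  # it costs 100$.
print(option_2(sample))  # it costs .
```

In this toy case, Option 1 keeps the amount and makes the declared character set match what the output can contain, while Option 2 removes the normalized amount entirely. Changing the final text in that way is one reason the computed accuracy could differ from the reference value.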