Inconsistent scores between loop and separate check #237

Open
106AbdulBasit opened this issue Jun 12, 2023 · 0 comments
Description:
SacreBLEU gives me different sentence-level scores for the same translation and reference pair, depending on whether I score the pair inside a loop or in a separate, standalone check. Both use this call:

sacrebleu.sentence_bleu(sys, [refs])

Scenario: I am calculating BLEU scores for translations using both a loop and individual checks.
Expected Behavior: I anticipate consistent scores between the loop and the separate checks for the same translations.
Actual Behavior: The scores obtained from the loop implementation differ from the scores obtained from the separate check, even when using the same translation and reference pairs.
Example: Here is an example that demonstrates the discrepancy:
Translation: sys4 = "..." # Example translation
Reference: ref4 = ["..."] # Example reference
Expected Score (separate check): 100.0004
Actual Score (loop): 31.94
Steps to Reproduce:

1. Load the necessary data and libraries.
2. Implement the loop calculation using SacreBLEU, storing scores for each translation.
3. Perform a separate check for a specific translation and reference pair, using the same SacreBLEU calculation.
4. Compare the scores obtained from the loop and separate check.
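
The steps above can be sketched roughly as follows. This is a minimal sketch, not the code from my actual run: the helper names (`check_pair`, `score_in_loop`, `find_mismatches`, `sacrebleu_score`) are hypothetical, and it assumes each translation is a plain string paired with its own list of reference strings. The scoring function is passed in as a parameter so the loop and the separate check are guaranteed to use the same call.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes one hypothesis string and its list of reference strings.
ScoreFn = Callable[[str, Sequence[str]], float]


def check_pair(hyp: str, refs: Sequence[str]) -> None:
    """Reject inputs that are not (str, sequence of str).

    Hypothetical helper: catches shapes such as a reference list that has
    been wrapped in a second list by mistake.
    """
    if not isinstance(hyp, str):
        raise TypeError(f"hypothesis must be str, got {type(hyp).__name__}")
    if isinstance(refs, str):
        raise TypeError("references must be a list of str, not a bare str")
    if not all(isinstance(r, str) for r in refs):
        raise TypeError("every reference must be a str")


def score_in_loop(hyps: Sequence[str],
                  refs_per_hyp: Sequence[Sequence[str]],
                  score_fn: ScoreFn) -> List[float]:
    """Score every (hypothesis, references) pair in order."""
    if len(hyps) != len(refs_per_hyp):
        raise ValueError("hypotheses and references are not aligned")
    scores = []
    for hyp, refs in zip(hyps, refs_per_hyp):
        check_pair(hyp, refs)
        scores.append(score_fn(hyp, refs))
    return scores


def find_mismatches(hyps: Sequence[str],
                    refs_per_hyp: Sequence[Sequence[str]],
                    score_fn: ScoreFn,
                    tol: float = 1e-6) -> List[Tuple[int, float, float]]:
    """Return (index, loop_score, separate_score) where the two disagree."""
    loop_scores = score_in_loop(hyps, refs_per_hyp, score_fn)
    mismatches = []
    for i, (hyp, refs) in enumerate(zip(hyps, refs_per_hyp)):
        separate = score_fn(hyp, refs)  # the "separate check" for pair i
        if abs(separate - loop_scores[i]) > tol:
            mismatches.append((i, loop_scores[i], separate))
    return mismatches


def sacrebleu_score(hyp: str, refs: Sequence[str]) -> float:
    """Sentence-level BLEU via SacreBLEU (imported lazily; third-party)."""
    import sacrebleu
    return sacrebleu.sentence_bleu(hyp, list(refs)).score
```

Calling `find_mismatches(hyps, refs_per_hyp, sacrebleu_score)` on the real data should return an empty list if the loop and the separate check truly receive the same inputs. Any entry it does return points at the index where they diverge, and a `TypeError` or `ValueError` raised before that points at a shape or alignment problem in the inputs.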
Additional Information:

I have tried modifying the code, removing any potential sources of error, but the discrepancy persists.
I have verified that the data inputs are aligned correctly, and the sentence preprocessing is consistent.
I suspect there might be an issue related to how SacreBLEU is utilized in the loop implementation.
Any guidance or insight into this issue would be greatly appreciated. Thank you!
