Description:
I am encountering an issue with SacreBLEU where I am getting inconsistent scores between a loop implementation and a separate check for individual translations. Here are the details of the problem:
sacrebleu.sentence_bleu(sys, [refs])
Scenario: I am calculating BLEU scores for translations using both a loop and individual checks.
Expected Behavior: I anticipate consistent scores between the loop and the separate checks for the same translations.
Actual Behavior: The scores obtained from the loop implementation differ from the scores obtained from the separate check, even when using the same translation and reference pairs.
Example: Here is an example that demonstrates the discrepancy:
Translation: sys4 = "..." # Example translation
Reference: ref4 = ["..."] # Example reference
Expected Score (separate check): 100.0004
Actual Score (loop): 31.94
Steps to Reproduce:
1. Load the necessary data and libraries.
2. Implement the loop calculation using SacreBLEU, storing scores for each translation.
3. Perform a separate check for a specific translation and reference pair, using the same SacreBLEU calculation.
4. Compare the scores obtained from the loop and the separate check.
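The comparison in these steps can be sketched as a small diagnostic helper. This is only an illustration, not the code from this report: `find_mismatches` and `toy_score` are hypothetical names, and `toy_score` is a stand-in scorer so the snippet runs without SacreBLEU installed. The helper re-scores every pair individually and reports the indices where the stored loop score disagrees, which should help show whether the loop is pairing each translation with the reference that was intended.

```python
from typing import Callable, List, Sequence, Tuple


def toy_score(hyp: str, refs: Sequence[str]) -> float:
    """Hypothetical stand-in scorer: 100.0 on exact match, else 0.0."""
    return 100.0 if hyp in refs else 0.0


def find_mismatches(
    score_fn: Callable[[str, Sequence[str]], float],
    hyps: Sequence[str],
    refs: Sequence[str],
    loop_scores: Sequence[float],
    tol: float = 1e-6,
) -> List[Tuple[int, float, float]]:
    """Re-score each (hyp, ref) pair on its own and compare with the loop score.

    Returns a list of (index, loop_score, individual_score) for every pair
    whose two scores differ by more than `tol`.
    """
    if not (len(hyps) == len(refs) == len(loop_scores)):
        raise ValueError("hyps, refs and loop_scores must have the same length")

    mismatches = []
    for i, (hyp, ref) in enumerate(zip(hyps, refs)):
        # Each side must be a plain string here; a list passed where a
        # string is expected would change what the scorer sees.
        if not isinstance(hyp, str) or not isinstance(ref, str):
            raise TypeError(f"pair {i}: expected str, got {type(hyp)} / {type(ref)}")
        single = score_fn(hyp, [ref])  # one hypothesis, a list with one reference
        if abs(single - loop_scores[i]) > tol:
            mismatches.append((i, loop_scores[i], single))
    return mismatches
```

With SacreBLEU installed, the scorer could be swapped for `lambda h, r: sacrebleu.sentence_bleu(h, r).score`. Any index returned points at a pair whose loop score was computed from different inputs or settings than the separate check.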
Additional Information:
I have tried modifying the code, removing any potential sources of error, but the discrepancy persists.
I have verified that the data inputs are aligned correctly, and the sentence preprocessing is consistent.
I suspect there might be an issue related to how SacreBLEU is utilized in the loop implementation.
Any guidance or insight into this issue would be greatly appreciated. Thank you!