The docstring for sacrebleu.metrics.BLEU.corpus_score says this:
:param references: A sequence of reference documents with document being
defined as a sequence of reference strings. If `None`, cached references
will be used.
This suggests that for a corpus with N documents and K annotators, `references` should be a list of N lists of K strings. But in reality the function expects the transpose of that (K lists of N strings).
If you do feed N lists of K strings, the function computes BLEU for the first K documents (albeit with some mismatched reference strings) and silently throws away the rest.
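To make the two layouts concrete, here is a minimal sketch with made-up toy data (N = 3, K = 2). The variable names are mine, not sacrebleu's, and the sacrebleu call is only shown as a comment so the snippet runs without the library installed.

```python
# Hypothetical toy data: N = 3 hypotheses, K = 2 annotators.
hypotheses = ["the cat sat", "a dog barked", "birds fly"]

# Layout the docstring seems to suggest: N lists of K strings.
refs_by_document = [
    ["the cat sat down", "a cat was sitting"],
    ["a dog barked loudly", "the dog was barking"],
    ["birds can fly", "the birds are flying"],
]

# Layout the function expects according to this report: K lists of N strings.
# zip(*...) transposes the nested list.
refs_by_annotator = [list(stream) for stream in zip(*refs_by_document)]

# Each inner list now lines up one-to-one with the hypotheses.
assert all(len(stream) == len(hypotheses) for stream in refs_by_annotator)

# With sacrebleu installed, the call would then look like:
#   from sacrebleu.metrics import BLEU
#   BLEU().corpus_score(hypotheses, refs_by_annotator)
```

So if your data is in the per-document layout, transposing it with `zip(*refs)` before the call should work around the issue.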
To prevent such misuse, I think it would be good to raise an exception or warning if the lengths of the inner reference lists don't match the length of the hypothesis list.
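Roughly what I have in mind, as a standalone sketch. `check_reference_shape` is a hypothetical helper name, not something that exists in sacrebleu, and where such a check would live in the library is up to the maintainers.

```python
from typing import Sequence


def check_reference_shape(
    hypotheses: Sequence[str],
    references: Sequence[Sequence[str]],
) -> None:
    """Raise ValueError if any reference list differs in length from the hypotheses.

    Hypothetical helper for illustration; not part of sacrebleu.
    """
    n_hyps = len(hypotheses)
    for i, stream in enumerate(references):
        if len(stream) != n_hyps:
            raise ValueError(
                f"Reference list {i} has {len(stream)} entries but there are "
                f"{n_hyps} hypotheses; expected K lists of N strings."
            )
```

With 3 hypotheses, passing 3 lists of 2 strings would raise, while 2 lists of 3 strings would pass. Note that a length check like this cannot catch the mistake when N happens to equal K.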