About ROUGE scores #89
Yes, that's expected. The "official" ROUGE script does a bunch of stemming, tokenization, and other preprocessing before calculating the score. The ROUGE metric in here doesn't do any of this, but it's a good enough proxy to use during training to get a sense of what the score will be. As the amount of data increases and sentences become more similar, it should be relatively close (at least in my experiments). So the recommended thing to do is still to run the official ROUGE script on the final model if you want to compare to published results. I don't want to use pyrouge, or some other wrapper around the ROUGE script, because it's
I'd love to make the internal score behave more like the official one, but I'm not sure that's really worth the effort.
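The following minimal Python sketch shows why preprocessing alone can move the score. It compares ROUGE-1 recall computed on raw whitespace tokens with the same score computed after lowercasing and stripping punctuation. The normalization is only a loose approximation of what ROUGE-1.5.5 does: it is an assumption, and it leaves out stemming entirely. The function names are hypothetical and are not part of `seq2seq.metrics.rouge`.

```python
import re
from collections import Counter

def tokenize_raw(text):
    # Whitespace split only (assumed to resemble a training-time proxy).
    return text.split()

def tokenize_normalized(text):
    # Lowercase, then replace runs of non-alphanumeric characters with a space.
    # This loosely imitates ROUGE-1.5.5 cleanup (assumption; no stemming).
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()

def rouge1_recall(hyp_tokens, ref_tokens):
    # Clipped unigram matches divided by the number of reference unigrams.
    hyp, ref = Counter(hyp_tokens), Counter(ref_tokens)
    overlap = sum((hyp & ref).values())
    return overlap / max(sum(ref.values()), 1)

hyp = "The cat sat on the mat."
ref = "the cat sat on the mat"

print(rouge1_recall(tokenize_raw(hyp), tokenize_raw(ref)))                # 0.6666666666666666
print(rouge1_recall(tokenize_normalized(hyp), tokenize_normalized(ref)))  # 1.0
```

The two sentences contain the same words. However, the raw split fails to match the capitalized `The` and the token `mat.`, so the proxy reports about 0.67 where a normalizing scorer reports 1.0.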
OK, that makes sense. I found some results where the difference was around 9 ROUGE points (on 11.5k sentences), which is not close at all. Maybe I made a mistake somewhere, or, as you said, it just can't be used anywhere other than during training. I wanted to use it to score my predictions, which is impossible with the current variance. Anyway, thanks for the clarification.
Hm, that's interesting. It would be great to look into it further. When I trained on Gigaword the scores were relatively close.
Another possible reason is that your code has bugs in how it calculates ROUGE scores, though they seem to have little influence on your example here.
`seq2seq/seq2seq/metrics/rouge.py`, line 46 (commit 7f48589)
Why use `set`? If the same n-gram appears several times, won't the result be wrong? The paper seems to use counts rather than the number of unique words. Could someone explain this? It's driving me crazy.
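A small Python sketch illustrates the difference raised in the question about `set`. It assumes that the proxy intersects sets of n-grams. The second function instead uses clipped counts, which matches the Count_match idea in Lin's ROUGE definition: each n-gram matches at most min(count in hypothesis, count in reference) times. The function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams as tuples, duplicates kept.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def overlap_with_sets(hyp, ref, n):
    # Unique n-grams only: a repeated n-gram can match at most once.
    return len(set(ngrams(hyp, n)) & set(ngrams(ref, n)))

def overlap_with_counts(hyp, ref, n):
    # Clipped counts: Counter '&' keeps min(count in hyp, count in ref).
    return sum((Counter(ngrams(hyp, n)) & Counter(ngrams(ref, n))).values())

def rouge_n_recall(hyp, ref, n):
    # Count-based ROUGE-N recall: clipped matches over total reference n-grams.
    total = len(ngrams(ref, n))
    return overlap_with_counts(hyp, ref, n) / total if total else 0.0

hyp = "the the cat".split()
ref = "the cat the mat".split()

print(overlap_with_sets(hyp, ref, 1))    # 2
print(overlap_with_counts(hyp, ref, 1))  # 3
print(rouge_n_recall(hyp, ref, 1))       # 0.75
```

Because "the" occurs twice in both texts, clipped counts credit both occurrences and yield a ROUGE-1 recall of 3/4. The set version matches only 2 n-grams, so repeated words are under-credited.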
Hi,
I used `seq2seq/metrics/rouge.py` in my repo to add some features. I wanted to check my results, so I compared my script with yours, with pyrouge (a wrapper around the official ROUGE script), and with pythonrouge (not the latest commit; also a Perl wrapper).
It turns out that (`pltrdy.rouge` == `seq2seq.metrics.rouge`) != (`pythonrouge` == `pyrouge`). Below I show how to compare `seq2seq.metrics.rouge` with `pyrouge`.
Setup
`seq2seq.metrics.rouge.py`: I just added:
`pyrouge` (see pyrouge on PyPI), which wraps the official ROUGE-1.5.5 (Perl script). `eval_pyrouge.py`:
Run
Values:
`python rouge.py "$HYP" "$REF"`:
`python eval_pyrouge.py`:
Any idea?
Thx
pltrdy
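If you want to reproduce this comparison, note that pyrouge scores directories of per-summary files rather than two aligned text files. As a result, `$HYP` and `$REF` (assumed to hold one summary per line, aligned line by line) have to be split first. A minimal Python sketch of that step follows.

`split_for_pyrouge`, the `sum.*` file names, and the directory layout are hypothetical choices. The commented-out scoring lines follow pyrouge's documented `Rouge155` usage. They are not executed here because they need a configured ROUGE-1.5.5 install.

```python
import os

def split_for_pyrouge(hyp_path, ref_path, out_dir):
    """Write one file per summary so pyrouge can pair system and model summaries.

    Hypothetical helper: assumes one summary per line, aligned across both files.
    """
    system_dir = os.path.join(out_dir, "system")
    model_dir = os.path.join(out_dir, "model")
    os.makedirs(system_dir, exist_ok=True)
    os.makedirs(model_dir, exist_ok=True)
    with open(hyp_path, encoding="utf-8") as fh, open(ref_path, encoding="utf-8") as fr:
        hyps = fh.read().splitlines()
        refs = fr.read().splitlines()
    if len(hyps) != len(refs):
        raise ValueError(f"{len(hyps)} hypotheses vs {len(refs)} references")
    for i, (h, r) in enumerate(zip(hyps, refs)):
        # System summary i goes to sum.i.txt; its single reference goes to sum.A.i.txt.
        with open(os.path.join(system_dir, f"sum.{i}.txt"), "w", encoding="utf-8") as f:
            f.write(h + "\n")
        with open(os.path.join(model_dir, f"sum.A.{i}.txt"), "w", encoding="utf-8") as f:
            f.write(r + "\n")
    return system_dir, model_dir

# Scoring step (requires pyrouge and a configured ROUGE-1.5.5; not run here):
# from pyrouge import Rouge155
# r = Rouge155()
# r.system_dir, r.model_dir = split_for_pyrouge("hyp.txt", "ref.txt", "rouge_tmp")
# r.system_filename_pattern = r"sum.(\d+).txt"
# r.model_filename_pattern = "sum.[A-Z].#ID#.txt"
# print(r.convert_and_evaluate())
```

Writing one reference per summary under the letter `A` keeps the model filename pattern simple. Multiple references would use `B`, `C`, and so on.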