add wmt18-21 biomedical dataset and "doc aligned" dataset support #205

Draft: wants to merge 5 commits into master

BrightXiaoHan
Contributor

related to #149

sacrebleu/utils.py (outdated)
@BrightXiaoHan BrightXiaoHan marked this pull request as draft August 17, 2022 09:45
@BrightXiaoHan BrightXiaoHan marked this pull request as ready for review September 15, 2022 15:53
Owner

mjpost left a comment

Can you also rebase on master? I hope to do a 2.3 release within the next week.

sacrebleu/dataset/tsv.py (outdated)
class WMTBiomedicalTSVDataset(TSVDataset):
    """
    The format used by the WMT Biomedical datasets. Data is not aligned sent by sent.
    """
Owner

Can you expand on this? It's not aligned by sentences, so what is this class doing to address that? What are the various fields?

Contributor Author

BrightXiaoHan commented Oct 8, 2022

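Hi,
I'm not sure whether this is a good approach.
For example, consider this "doc-aligned" dataset. Each line holds a docid, a sentence index, and the sentence text, and the English source and German reference sides of the same document have different numbers of sentences (11 vs. 13):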

doc81   1       The popularity of E-Cigarettes is increasing.
doc81   2       Besides addiction and pulmonary health damage, reports of burn injuries from e-cigarette explosions are also increasing.
doc81   3       Mostly, explosions of e-cigarettes are attributed to its lithium-ion battery.
doc81   4       Due to increasing cases and missing guidelines we want to present three cases of our hospital and publish recommendations for the management of burn injuries caused by e-cigarette explosions.
doc81   5       Three cases of e-cigarette explosions which occurred between 2016 and 2019, are presented.
doc81   6       All three e-cigarette explosions occurred in the trouser pockets.
doc81   7       Two patients were male one patient was female.
doc81   8       The age ranged from 24 to 64 years, the burned total body surface area (TBSA) from 3 % to 12.5 %. All three patients required skin grafting and the length of stay in hospital ranged from five to eleven days.
doc81   9       In the synopsis of recent literature, we recommend the following management of burns due to e-cigarette explosions.
doc81   10      The guidelines of the Advanced Trauma Life Support should be followed, signs of an inhalation trauma should be checked and litmus test should be performed prior to irrigation with aqueous solutions to prevent exothermic reactions with remaining metals.
doc81   11      If litmus test shows alkali pH wounds should be irrigated by mineral oil.
doc81   1       E-Zigaretten erfreuen sich immer größerer Beliebtheit.
doc81   2       Neben abhängigkeitsrelevanten und pulmonalen Gesundheitsschäden häufen sich Berichte über Verbrennungsfolgen durch explodierende oder brennende E-Zigaretten.
doc81   3       Zumeist entstehen diese Brände durch Fehler in der Lithium-Ionen-Batterie.
doc81   4       Aufgrund der steigenden Zahlen der E-Zigaretten-Nutzer und der zunehmenden Verbrennungen durch diese Geräte möchten wir 3 Fälle unserer Klinik vorstellen und die Behandlungsstrategien erläutern.
doc81   5       Die Fälle und retrospektiven Daten von 3 Patienten, die sich zwischen 2016 und 2019 mit Verbrennungen durch E-Zigaretten vorgestellt haben, werden dargestellt.
doc81   6       Alle 3 Patienten stellten sich in der Notaufnahme mit Verbrennungen vor, die aufgrund von in der Hosentasche explodierter E-Zigaretten aufgetreten sind.
doc81   7       Zwei Patienten waren männlich und eine Patientin weiblich.
doc81   8       Das Alter der Patienten betrug 24, 30 und 64 Jahre.
doc81   9       Die verbrannten Körperoberflächen lagen zwischen 3 % und 12,5 % und benötigten Spalthauttransplantationen zwischen 1,5 % und 3,5 % der Körperoberflächen.
doc81   10      Die Patienten konnten nach 5 bis 11 Tagen aus der stationären Krankenhausbehandlung entlassen werden.
doc81   11      In Zusammenschau der vorhandenen Literatur wird bei Verbrennungsverletzungen durch explodierte E-Zigaretten folgende Behandlungsstrategie empfohlen.
doc81   12      Schwere Explosionstraumata bedürfen eines Schockraummanagements und besonders bei Explosionen während des Gebrauches sollte an ein Inhalationstrauma gedacht werden.
doc81   13      Bei alkalischem pH-Wert der Verbrennung kann es aufgrund verbliebener metallischer Reste der E-Zigarette zu einer exothermen Reaktion in Verbindung mit Wasser kommen, sodass eine Wundspülung mit Mineralöl empfohlen wird.
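A minimal sketch of reading this layout, assuming each line is `docid<TAB>sentence index<TAB>text` and that the source and reference sides come from separate files. `read_doc_tsv` is a hypothetical helper, not the code of `WMTBiomedicalTSVDataset`. Grouping by docid makes the mismatch explicit: a document can have a different number of sentences on each side, so no line-by-line pairing exists.

```python
from typing import Dict, Iterable, List


def read_doc_tsv(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group tab-separated (docid, sentence index, text) lines by docid.

    Hypothetical helper: assumes exactly three tab-separated fields per line.
    Documents keep the order in which they first appear (dicts are ordered).
    """
    docs: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        docid, _sent_idx, text = line.split("\t", 2)
        docs.setdefault(docid, []).append(text)
    return docs


# Toy excerpt: two English lines vs. three German lines for the same document.
src = read_doc_tsv([
    "doc81\t1\tThe popularity of E-Cigarettes is increasing.",
    "doc81\t2\tAll three e-cigarette explosions occurred in the trouser pockets.",
])
ref = read_doc_tsv([
    "doc81\t1\tE-Zigaretten erfreuen sich immer größerer Beliebtheit.",
    "doc81\t2\tZwei Patienten waren männlich und eine Patientin weiblich.",
    "doc81\t3\tDas Alter der Patienten betrug 24, 30 und 64 Jahre.",
])
print(len(src["doc81"]), len(ref["doc81"]))
```

Because sentence counts differ per document, any scoring has to happen at the document level rather than by pairing line i with line i.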

First, we can get all the source sentences with:

sacrebleu -t wmt21/biomedical -l en-de --echo src
The popularity of E-Cigarettes is increasing.
Besides addiction and pulmonary health damage, reports of burn injuries from e-cigarette explosions are also increasing.
Mostly, explosions of e-cigarettes are attributed to its lithium-ion battery.
Due to increasing cases and missing guidelines we want to present three cases of our hospital and publish recommendations for the management of burn injuries caused by e-cigarette explosions.
Three cases of e-cigarette explosions which occurred between 2016 and 2019, are presented.
All three e-cigarette explosions occurred in the trouser pockets.
Two patients were male one patient was female.
The age ranged from 24 to 64 years, the burned total body surface area (TBSA) from 3 % to 12.5 %. All three patients required skin grafting and the length of stay in hospital ranged from five to eleven days.
In the synopsis of recent literature, we recommend the following management of burns due to e-cigarette explosions.
The guidelines of the Advanced Trauma Life Support should be followed, signs of an inhalation trauma should be checked and litmus test should be performed prior to irrigation with aqueous solutions to prevent exothermic reactions with remaining metals.
If litmus test shows alkali pH wounds should be irrigated by mineral oil.

After translation by an NMT system:

Die Popularität von E-Zigaretten nimmt zu.
Neben Sucht und Lungenschäden häufen sich auch Berichte über Brandverletzungen durch Explosionen von E-Zigaretten.
Meistens werden Explosionen von E-Zigaretten ihrem Lithium-Ionen-Akku zugeschrieben.
Aufgrund steigender Fallzahlen und fehlender Leitlinien möchten wir drei Fälle unseres Krankenhauses vorstellen und Empfehlungen zum Management von Brandverletzungen durch E-Zigaretten-Explosionen veröffentlichen.
Drei Fälle von E-Zigaretten-Explosionen, die sich zwischen 2016 und 2019 ereigneten, werden vorgestellt.
Alle drei E-Zigaretten-Explosionen ereigneten sich in den Hosentaschen.
Zwei Patienten waren männlich, ein Patient war weiblich.
Das Alter reichte von 24 bis 64 Jahren, die verbrannte Gesamtkörperoberfläche (TBSA) von 3 % bis 12,5 %. Alle drei Patienten benötigten eine Hauttransplantation und die Krankenhausaufenthaltsdauer lag zwischen fünf und elf Tagen.
In der Zusammenfassung der neueren Literatur empfehlen wir das folgende Management von Verbrennungen aufgrund von E-Zigaretten-Explosionen.
Die Richtlinien des Advanced Trauma Life Support sollten befolgt werden, Anzeichen eines Inhalationstraumas sollten überprüft werden und ein Lackmustest sollte vor der Spülung mit wässrigen Lösungen durchgeführt werden, um exotherme Reaktionen mit verbleibenden Metallen zu verhindern.
Wenn der Lackmustest einen alkalischen pH-Wert zeigt, sollten Wunden mit Mineralöl gespült werden.

To calculate the BLEU score of this output against the reference text, we must first align the two by docid, because the source and reference do not correspond sentence by sentence.

For the MT output, we merge the sentences from the same document using the docid_src field:

doc_align("en-de", hyp_sents, "src")

For the reference sentences, we merge the sentences from the same document using the docid_ref field:

doc_align("en-de", ref_sents, "ref")

In this example, all sentences come from the same document, so they are all merged into a single line. We can then calculate the BLEU score between the hypothesis and the reference.
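The merging step can be sketched as follows. This is not the PR's `doc_align`, whose code is not shown in this thread; `merge_by_doc` and `doc_pairs` are hypothetical helpers that take the docids explicitly (docid_src for the system output, docid_ref for the references) and join each document's sentences with a single space.

```python
from typing import Dict, List, Sequence, Tuple


def merge_by_doc(sents: Sequence[str], docids: Sequence[str]) -> Dict[str, str]:
    """Join all sentences sharing a docid into one space-separated line.

    Hypothetical helper: docids[i] is the document id of sents[i].
    """
    if len(sents) != len(docids):
        raise ValueError("need exactly one docid per sentence")
    grouped: Dict[str, List[str]] = {}
    for docid, sent in zip(docids, sents):
        grouped.setdefault(docid, []).append(sent.strip())
    return {docid: " ".join(parts) for docid, parts in grouped.items()}


def doc_pairs(hyp: Dict[str, str], ref: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return parallel (hyp_docs, ref_docs) lists, ordered by reference docid."""
    missing = set(ref) - set(hyp)
    if missing:
        raise KeyError(f"no system output for documents: {sorted(missing)}")
    order = list(ref)
    return [hyp[d] for d in order], [ref[d] for d in order]


hyp = merge_by_doc(
    ["Die Popularität von E-Zigaretten nimmt zu.",
     "Alle drei E-Zigaretten-Explosionen ereigneten sich in den Hosentaschen."],
    ["doc81", "doc81"],  # docid_src of each hypothesis sentence
)
ref = merge_by_doc(
    ["E-Zigaretten erfreuen sich immer größerer Beliebtheit.",
     "Zwei Patienten waren männlich und eine Patientin weiblich.",
     "Das Alter der Patienten betrug 24, 30 und 64 Jahre."],
    ["doc81", "doc81", "doc81"],  # docid_ref of each reference sentence
)
hyp_docs, ref_docs = doc_pairs(hyp, ref)
```

The two lists hold one document per line, so with sacrebleu installed they could be scored with `sacrebleu.corpus_bleu(hyp_docs, [ref_docs])`. Joining with a space is an assumption; one consequence is that n-grams crossing sentence boundaries within a document are counted.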

Owner

I see; this makes sense. An alternative would be to project the source translations onto the references, word by word. However, I think this is beyond the scope of sacrebleu.

Do you know: is this how the task organizers computed BLEU? At the document level?

I also think the signature should be changed to reflect this. How about a join:doc element? e.g.,

 "signature": "nrefs:1|case:mixed|join:doc|eff:no|tok:13a|smooth:exp|version:2.2.0",
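To make the proposal concrete, here is a sketch that derives the proposed document-level signature from a sentence-level one, treating the signature as `|`-separated `key:value` pairs. `add_signature_field` is hypothetical and is not sacrebleu's own signature code.

```python
def add_signature_field(signature: str, key: str, value: str, after: str) -> str:
    """Insert `key:value` right after the `after` field of a '|'-separated signature.

    Hypothetical helper for illustration only.
    """
    fields = [f.split(":", 1) for f in signature.split("|")]
    keys = [k for k, _ in fields]
    if key in keys:
        raise ValueError(f"signature already has a {key!r} field")
    pos = keys.index(after) + 1  # raises ValueError if `after` is absent
    fields.insert(pos, [key, value])
    return "|".join(f"{k}:{v}" for k, v in fields)


sig = add_signature_field(
    "nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.2.0",
    key="join", value="doc", after="case",
)
print(sig)  # nrefs:1|case:mixed|join:doc|eff:no|tok:13a|smooth:exp|version:2.2.0
```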

Contributor Author

I don't know how the task organizers computed BLEU, but I will test whether the output matches the official results.

Contributor Author

I manually calculated the BLEU scores of some submitted translations from wmt21/biomedical and compared them with the official results. The scores computed with the --lowercase option are the closest to the official ones, but they still differ by 0 to 2 BLEU points. So far I have not been able to find the reason for the difference.

Owner

Can you post a table with (reported numbers, your calculations)?

I suggest emailing the organizers. Let's get this right, which will help with reproducibility. Did they report a sacrebleu signature, or have details in their paper?

Owner

mjpost commented Oct 11, 2022

Hi, I didn't realize you had just pushed this and may have canceled the run. If it fails, please push again so we can re-trigger it.

@mjpost mjpost linked an issue Oct 14, 2022 that may be closed by this pull request
Owner

mjpost commented Oct 18, 2022

Hi @BrightXiaoHan—have you had any luck tracking down the discrepancies? I would like to do the 2.3.0 release soon. If this is not ready, we can release it afterward as 2.3.1.

@BrightXiaoHan BrightXiaoHan marked this pull request as draft October 18, 2022 07:32
@BrightXiaoHan
Contributor Author

It's not ready for release, so I've converted it to a draft. We can release it afterward as 2.3.1.

@BrightXiaoHan BrightXiaoHan marked this pull request as ready for review December 27, 2022 16:04
@BrightXiaoHan BrightXiaoHan marked this pull request as draft December 28, 2022 03:35

Successfully merging this pull request may close these issues.

wmt20-biomed data
2 participants