This repository was archived by the owner on Oct 31, 2023. It is now read-only.

XLM Evaluation results #9

Open
nooralahzadeh opened this issue Mar 7, 2020 · 6 comments

@nooralahzadeh

Hi,
I performed some experiments using the Hugging Face implementation of XLM, training on SQuAD v1.1 and evaluating on the MLQA test set. The results are as follows (F1 / EM):
en 68.51/56.13
es 57.59/41.21
ar 47.88/31.41
de 51.99/38.16
zh 38.34/21.39
hi 46.13/31.72
vi 44.09/27.07
I am wondering why there is such a large difference from your results. Did you do anything special apart from early stopping on MLQA-en?

@patrick-s-h-lewis
Contributor

Hi Farad,

Which implementation of XLM are you using?

The HPs for XLM were:
Adam: lr=3e-5,
weight decay 0.005,
clip_norm=5,
epochs=3,
batch size=32,
triangular scheduler: warmup_steps=500, total steps=10000
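For reference, the triangular schedule above (linear warmup to the peak LR over 500 steps, then linear decay over the remaining steps) can be sketched as a plain function. This is my reading of the hyperparameters, not code from the paper; the function name and the decay-to-zero endpoint are assumptions, since the thread does not spell out the decay target:

```python
def triangular_lr(step, peak_lr=3e-5, warmup_steps=500, total_steps=10000):
    """Triangular (linear warmup + linear decay) learning-rate schedule.

    Assumes the LR rises linearly from 0 to peak_lr over warmup_steps,
    then decays linearly back to 0 at total_steps (a common convention;
    the decay target is an assumption, not stated in the thread).
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

In PyTorch, this shape can be reproduced by wrapping the optimizer in `torch.optim.lr_scheduler.LambdaLR` with the multiplier `triangular_lr(step) / peak_lr`.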

We used the pytext implementation of XLM. Correct tokenization and preprocessing are very important for good performance. I'm not sure whether the HF version has this correct, as a number of people have struggled to get good results with XLM on HF.
We hope to open-source the code when the colleague who wrote it is back from leave.

@RachelKer

Hello, any updates on the pytext code release? I know the COVID situation may have changed the plans. I am struggling to replicate the one-shot learning results from your paper (that is, training on your MLQA-train Chinese data) with HF XLM-R; inference with a zero-shot model on Chinese works fine. Thank you!

@patrick-s-h-lewis
Contributor

Hi Rachel!

XLM-R wasn't included in our paper, so we can't directly help there.
I'll check internally on the reproducibility code for the MLQA paper.

Patrick

@nooralahzadeh
Author

Hi @RachelKer To achieve similar performance on the zh test set, you just need to add
"final_text = tok_text" after line 497 in squad_metrics.py (only for zh). Because there are no spaces or subwords in Chinese, we don't need to execute the get_final_text() function.

@RachelKer

@nooralahzadeh Thank you, I saw your issue on the HF repo a few days ago, and with this change I managed to get the correct results for BERT and XLM trained on Chinese, but not for XLM-R. Did you manage to train XLM-RoBERTa on Chinese?

@patrick-s-h-lewis Oh indeed, I confused XLM-R and XLM in your paper, I am sorry. I think the training problem I have occurs with XLM-R only. Thanks for checking on the code release anyway, and for your quick answer!

@patrick-s-h-lewis
Contributor

Hey @RachelKer and @nooralahzadeh,

I asked internally about XLM-R (since there is some overlap between the teams). The pytext model is released, but there aren't instructions for how to run it on MLQA, so someone is going to write these instructions up :)

Patrick
