Dataset | Task | Length (median) | Length (max) |
---|---|---|---|
TyDi QA | Question Answering | 6,165 | 67,135 |
Korquad 2.1 | Question Answering | 5,777 | 486,730 |
Fake News | Sequence Classification | 564 | 17,488 |
Modu Sentiment | Sequence Classification | 185 | 5,245 |
- Length is calculated in subword tokens.
- TyDi QA is originally multilingual and contains BoolQA cases. We only use Korean samples and skip BoolQA samples.
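For reference, a minimal sketch of that filtering step, assuming the TyDi QA primary-task JSONL format (the `language` and `yes_no_answer` fields); the repo's actual preprocessing may differ:

```python
import gzip
import json

def korean_non_bool_examples(path):
    """Yield Korean TyDi QA examples, skipping BoolQA (yes/no) cases."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            if example["language"] != "korean":
                continue  # keep Korean samples only
            # BoolQA cases carry a yes/no answer in the annotations.
            if any(a["yes_no_answer"] != "NONE" for a in example["annotations"]):
                continue
            yield example
```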
```bash
pip3 install -r requirements.txt
bash download_qa_dataset.sh
```
- After downloading the data through the links below, place it in the `--data_dir` path.
  - Fake news: Korean Fake news (`mission2_train.csv`)
  - Modu sentiment corpus: Sentiment Analysis Corpus 2020 (`EXSA2002108040.json`)
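For instance, a `--data_dir` laid out as below should work (the directory name is your choice; the file names are the downloads above):

```
data/
├── mission2_train.csv    # Fake news
└── EXSA2002108040.json   # Modu sentiment corpus
```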
- We highly recommend running the scripts on a TPU instance in order to train and evaluate on large, long-sequence datasets.
- We trained and evaluated the models in the torch-xla-1.8.1 environment with TPU v3-8.
- Disable the `--use_tpu` argument for GPU training, as sketched below.
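As a rough illustration (not the repo's actual wiring), a `--use_tpu` flag typically gates device selection like this:

```python
import argparse

import torch

parser = argparse.ArgumentParser()
parser.add_argument("--use_tpu", action="store_true")
args = parser.parse_args()

if args.use_tpu:
    # torch-xla path: one TPU core per process (a v3-8 spawns 8 processes)
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()
else:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```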
```bash
bash scripts/run_{$TASK_NAME}.sh        # kobigbird
bash scripts/run_{$TASK_NAME}_short.sh  # klue roberta

bash scripts/run_tydiqa.sh          # tydiqa
bash scripts/run_korquad_2.sh       # korquad 2.1
bash scripts/run_fake_news.sh       # fake news
bash scripts/run_modu_sentiment.sh  # modu sentiment
```
- For sequence classification, we evaluate on a `train:test = 8:2` split (see the sketch after this list).
- For KorQuAD 2.1, we only use a subset of the training set because of limited computational resources.
  - Enable the `--all_korquad_2_sample` argument to use the full training set.
- For KoBigBird, question answering was trained with a sequence length of 4096 and sequence classification with a length of 1024. KLUE RoBERTa was trained with a length of 512.
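A minimal sketch of that 8:2 split, using the fake-news CSV as an example (the fixed seed is an assumption, not necessarily the repo's value):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("mission2_train.csv")  # fake-news data from above
# train:test = 8:2 split for sequence classification
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
```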
| Model | TyDi QA (EM/F1) | KorQuAD 2.1 (EM/F1) | Fake News (F1) | Modu Sentiment (F1-macro) |
|---|---|---|---|---|
| KLUE-RoBERTa-Base | 76.80 / 78.58 | 55.44 / 73.02 | 95.20 | 42.61 |
| KoBigBird-BERT-Base | 79.13 / 81.30 | 67.77 / 82.03 | 98.85 | 45.42 |