# Finetune on Long Sequence Dataset

한국어 | English

## About Dataset

| Dataset | Task | Length (median) | Length (max) |
| --- | --- | --- | --- |
| TyDi QA | Question Answering | 6,165 | 67,135 |
| KorQuAD 2.1 | Question Answering | 5,777 | 486,730 |
| Fake News | Sequence Classification | 564 | 17,488 |
| Modu Sentiment | Sequence Classification | 185 | 5,245 |
- Lengths are measured in subword tokens (see the sketch below for one way to compute them).
- TyDi QA is originally multilingual and contains BoolQA examples; we use only the Korean samples and skip the BoolQA ones.
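
The token-length statistics above can be reproduced with any subword tokenizer. Below is a minimal sketch assuming the published KoBigBird tokenizer on Hugging Face (`monologg/kobigbird-bert-base`) and a plain list of document strings; the model id and placeholder documents are illustrative, not this repo's actual preprocessing code.

```python
from statistics import median

from transformers import AutoTokenizer

# Assumption: the published KoBigBird checkpoint; any tokenizer works here.
tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")

def length_stats(documents):
    """Return (median, max) length in subword tokens over a list of strings."""
    lengths = [len(tokenizer.tokenize(doc)) for doc in documents]
    return median(lengths), max(lengths)

# Placeholder input; in practice, pass the dataset's context/document field.
docs = ["긴 문서 예시입니다.", "또 다른 예시 문서입니다."]
print(length_stats(docs))
```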

## Setup

### 1. Requirements

```bash
pip3 install -r requirements.txt
```

### 2. Prepare Dataset

#### 1) Question Answering

```bash
bash download_qa_dataset.sh
```

#### 2) Sequence Classification

## How to Run

- We highly recommend running the scripts on a TPU instance when training and evaluating on the large, long-sequence datasets.
- We trained and evaluated the models in a torch-xla-1.8.1 environment with a TPU v3-8 (a device-placement sketch follows the script list below).
- Drop the `--use_tpu` argument to train on GPU instead.
```bash
bash scripts/run_${TASK_NAME}.sh        # kobigbird
bash scripts/run_${TASK_NAME}_short.sh  # klue roberta
bash scripts/run_tydiqa.sh              # tydiqa
bash scripts/run_korquad_2.sh           # korquad 2.1
bash scripts/run_fake_news.sh           # fake news
bash scripts/run_modu_sentiment.sh      # modu sentiment
```
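
For orientation, the sketch below shows what TPU device placement typically looks like under torch-xla 1.8, with a GPU/CPU fallback roughly matching what dropping `--use_tpu` implies. It is a generic illustration of the torch-xla API, not this repo's actual training loop.

```python
import torch
import torch.nn as nn

try:
    # Present only when torch-xla is installed (e.g. on a TPU VM).
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()  # one TPU core
except ImportError:
    # Fallback path, i.e. training without --use_tpu.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 2).to(device)  # stand-in for the actual model
print(f"training on: {device}")
```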

## Results

- For sequence classification, we evaluate on a train:test = 8:2 split (a split sketch follows the results table).
- For KorQuAD 2.1, we use only a subset of the train set because of limited computational resources.
  - Enable the `--all_korquad_2_sample` argument to use the full train set.
- For KoBigBird, question answering was trained with a sequence length of 4096 and sequence classification with a length of 1024.
- KLUE RoBERTa was trained with a sequence length of 512.
| Model | TyDi QA<br>(em/f1) | KorQuAD 2.1<br>(em/f1) | Fake News<br>(f1) | Modu Sentiment<br>(f1-macro) |
| --- | --- | --- | --- | --- |
| KLUE-RoBERTa-Base | 76.80 / 78.58 | 55.44 / 73.02 | 95.20 | 42.61 |
| KoBigBird-BERT-Base | 79.13 / 81.30 | 67.77 / 82.03 | 98.85 | 45.42 |
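
The 8:2 classification split can be reproduced along these lines; a minimal sketch assuming scikit-learn, since the actual split code and random seed are not shown in this README.

```python
from sklearn.model_selection import train_test_split

# Placeholder data; in practice, use the Fake News / Modu Sentiment samples.
texts = [f"document {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]

# train:test = 8:2; stratify keeps label ratios equal across the split.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)
```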

## Reference