boostcampaitech4nlp2/level1_semantictextsimilarity_nlp-level1-nlp-05

1. Project Overview

💡 Competition: [NLP] Measuring Sentence Similarity. This project was carried out for the Semantic Text Similarity (STS) competition hosted by boostcamp: using the STS dataset, we designed an AI model that measures the similarity between two sentences.

Timeline

  • Competition period: 2022.10.26 – 2022.11.03

Collaboration

Notion

Each sub-team's status and experiment results were recorded and shared on the team Notion.

Git


After modifying the baseline on the master branch, each member branched off under their own name to work. Tasks such as the Wandb integration and config-file wiring were divided among members and merged back from each branch.

2. Team Composition and Roles

🔬 EDA: 단익

Exploratory Data Analysis, Reference searching

🗂️ Data: 재덕, 석희

Data augmentation, searching for pre-trained models

🧬 Model: 건우, 용찬

Reconstructing the baseline, searching for pre-trained models

3. Project Procedure and Methods

1) Exploratory Data Analysis and Preprocessing (EDA) – the Training Data

train.csv

  • The project's final goal is to predict the similarity between two sentences. The dataset was provided as CSV files, split into train (9,324 rows), dev (550 rows), and test (1,100 rows).
  • The sentences come from National Petition board titles, the Naver movie sentiment-analysis corpus, and Upstage Slack data. Each pair's similarity score (label) is the average of the scores that several annotators assigned against a common rubric.

train.csv : 9,324 rows

dev.csv : 550 rows

  • Visualizing the per-label distribution of the train set revealed an imbalance skewed toward label 0, whereas the dev.csv labels were roughly uniformly distributed.
  • To resolve this imbalance, we either reduced the label-0 data to match the other labels' distribution or increased the label-5 data (Data Augmentation).
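The distribution check above can be sketched with pandas. This is a minimal illustration on toy rows; the column names (sentence_1, sentence_2, label) are assumed from the task format, not taken from the team's code.

```python
# Minimal sketch of the label-distribution check (toy data).
import pandas as pd

# Stand-in for train.csv; column names are assumed, not the actual schema.
train = pd.DataFrame({
    "sentence_1": ["a", "b", "c", "d", "e", "f"],
    "sentence_2": ["a2", "b2", "c2", "d2", "e2", "f2"],
    "label": [0.0, 0.0, 0.0, 0.0, 2.4, 5.0],
})

# Bin the continuous 0-5 similarity labels to whole points and count each bin
counts = train["label"].round().value_counts().sort_index()
print(counts)  # in the real train set, the 0 bin dominates
```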

2) Modeling

Modifying the Baseline Code

  • Implemented Wandb and Wandb Sweep
  • Combined YAML + OmegaConf + shell scripts for more convenient training and experiment management

Selecting the Best Pre-trained Model

  • We compared Korean RoBERTa and ELECTRA pre-trained models; snunlp/KR-ELECTRA-discriminator performed best.
  • We then optimized data augmentation on top of the snunlp/KR-ELECTRA-discriminator model.

3) Data pre-processing

Data Augmentation

Using the baseline model (klue/roberta-base, loss: L1, optimizer: AdamW), we pre-processed the data with the four augmentation techniques and one smoothing technique below. We adjusted the ratio of original to augmented data while checking results, and tried stacking several techniques to find the best combination. Separately, we raised the learning rate for faster training and raised the batch size for a more general model.

  • Back Translation¹⁾
    • Translate Korean to English, then back-translate English to Korean
    • Back translation often produced poor output, so we judged it ill-suited to the STS task, where a consistent scoring standard matters, and excluded it.
  • Copied Translation²⁾
    • Copy sentence_1 into sentence_2 to create label-5 data
    • Distribution analysis showed label-5 data was only 1% of the train set, so we sampled pairs from the original dataset, made the two sentences identical, and added them as label-5 data.
  • Swap Sentence
    • Swap the order of sentence_1 and sentence_2
    • Because sentence 1 and sentence 2 receive different segment embeddings, we expected the swap to produce meaningful new data. This was the most effective technique we tried.
  • Reverse Text³⁾
    • Reverse the character order of each sentence
    • Used alone it helped, and we reasoned it injects meaningful noise, but performance dropped when combined with other techniques, so we excluded it.
  • Label Smoothing
    • Remove label-0 data
    • Over 50% of the train set was label 0, so we undersampled that label by 50%.
    • This worked well together with Copied Translation, which we attribute to the distribution shifting from the original positively skewed shape to a comparatively uniform one.
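The four recipes above can be sketched in a few lines of pandas. This is a toy illustration (column names assumed), not the team's actual augmentation code:

```python
# Toy sketch of the augmentations described above (column names assumed).
import pandas as pd

train = pd.DataFrame({
    "sentence_1": ["날씨가 좋다", "영화가 재밌다", "밥을 먹었다", "버스를 탔다"],
    "sentence_2": ["오늘 맑다", "영화 추천해요", "식사를 했다", "지하철을 탔다"],
    "label": [0.0, 0.0, 3.0, 5.0],
})

# Swap Sentence: exchange the two sentence columns
swapped = train.rename(columns={"sentence_1": "sentence_2",
                                "sentence_2": "sentence_1"})

# Copied Translation: duplicate sentence_1 into sentence_2 as label-5 data
copied = train.sample(n=2, random_state=0).assign(
    sentence_2=lambda df: df["sentence_1"], label=5.0)

# Label Smoothing (as used here): undersample 50% of the label-0 rows
zeros = train[train["label"] == 0.0].sample(frac=0.5, random_state=0)
smoothed = pd.concat([train[train["label"] != 0.0], zeros])

# Reverse Text: reverse the characters of each sentence
reversed_aug = train.assign(sentence_1=train["sentence_1"].str[::-1],
                            sentence_2=train["sentence_2"].str[::-1])

augmented = pd.concat([train, swapped, copied], ignore_index=True)
print(len(train), "->", len(augmented))
```

In the actual experiments, the ratio of original to augmented rows (e.g. 2:1, 4:1) was itself a tuned hyperparameter.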

In our experiments, Copied Translation and Reverse Text gave meaningful gains at learning rate 1e-5 / batch size 16, while Swap Sentence, Label Smoothing, and Copied Translation did so at learning rate 2e-5 / batch size 32.

4) Optimization

Hyperparameter Experiments and Comparison

  • Experiments varying loss, batch size, learning rate, and data

    Search space:

    | Hyperparameter | Values |
    | --- | --- |
    | Loss | MSE, L1 |
    | Batch Size | 16, 32 |
    | Learning rate | 1e-5, 3e-5, 5e-5 |
    | Data | Label Smoothing 0, Copied Translation Label 5, Swap Sentence |

    Best results:

    | Model | Loss | Learning rate | Batch Size | Val Pearson |
    | --- | --- | --- | --- | --- |
    | RoBERTa Large – Label Smoothing 0, Copied Translation Label 5 | MSE | 7e-6 | 8 | 0.9256 |
    | ELECTRA – Swap Sentence | MSE | 3e-5 | 32 | 0.9287 |
    | ELECTRA – Label Smoothing 0, Copied Translation Label 5, Swap Sentence | MSE | 3e-5 | 16 | 0.9309 |

Optimizing snunlp/KR-ELECTRA-discriminator

  • Starting from the best pre-trained model, snunlp/KR-ELECTRA-discriminator, and the data-augmentation results above, we searched for the optimal combination.
  • Swap Sentence gave a meaningful improvement. MSE loss also outperformed the baseline's L1 loss, so subsequent experiments used MSE.
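The loss switch is easy to illustrate: MSE penalizes large errors quadratically while L1 penalizes them linearly, so MSE pushes the model harder on badly mispredicted pairs. The numbers below are toy values, not from the experiments:

```python
# Toy comparison of the two losses considered (illustrative values).
import numpy as np

y_true = np.array([0.0, 2.5, 5.0])
y_pred = np.array([0.5, 2.0, 4.0])

l1 = np.abs(y_pred - y_true).mean()    # baseline's L1 loss
mse = ((y_pred - y_true) ** 2).mean()  # loss adopted after experiments
print(l1, mse)
```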

Optimizing RoBERTa Large

  • klue/roberta-large is large enough that training initially failed, so we tuned batch size and learning rate. Building on the augmentation results, we separately tested the most effective settings: Swap Sentence; Label Smoothing 0 plus Copied Translation Label 5; and Reverse Text 20%.
  • The best result, Val Pearson 0.9256, came from the Label Smoothing 0 + Copied Translation Label 5 data with MSE loss, learning rate 7e-6, and batch size 8.

Optimizing ELECTRA

  • We tuned learning rate and batch size over three Korean ELECTRA models (monologg/koelectra-base-v3-discriminator, beomi/KcELECTRA-base, snunlp/KR-ELECTRA-discriminator), using the augmentations that had shown meaningful gains: Swap Sentence, Label Smoothing 0, and Copied Translation Label 5.
  • The best single model was snunlp/KR-ELECTRA-discriminator trained on the Label Smoothing 0, Copied Translation Label 5, and Swap Sentence data at learning rate 3e-5 and batch size 16, reaching Val Pearson 0.9309.

5) Ensemble

  • Pearson correlation, the evaluation metric, is linear and therefore sensitive to outliers. To address this, we introduced a weighted average to reduce the influence of outliers.⁴⁾

  • For the ensemble we adopted soft voting: rather than simply summing and averaging each model's outputs, we computed a weighted average with each model's performance as its weight. Each model's validation score was passed through a softmax to convert it into a probability, which was then multiplied by that model's output logits before summing.

  • We ensembled a model trained on the positively skewed data produced by Swap Sentence with a model trained on the roughly uniform data produced by Copied Translation and Label Smoothing, yielding a general model that is less dependent on the test-set distribution.

  • We first ensembled the single best klue/roberta-large and snunlp/KR-ELECTRA-discriminator, improving the score from 0.9124 to 0.9225. Ensembling three of each then raised it to 0.9269, and, on the grounds that ensembles work best when the member models are weakly correlated, ensembling a more diverse set of models reached our best score of 0.9290.

    | Ensemble | Score improvement |
    | --- | --- |
    | klue/roberta-large (best) ×1 + snunlp/KR-ELECTRA-discriminator (best) ×1 | 0.9124 → 0.9225 |
    | klue/roberta-large ×3 + snunlp/KR-ELECTRA-discriminator ×3 | 0.9225 → 0.9269 |
    | klue/roberta-large ×3 + snunlp/KR-ELECTRA-discriminator ×3 + beomi/KcELECTRA-base ×1 + monologg/koelectra-base-v3-discriminator ×1 | 0.9269 → 0.9290 |
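The performance-weighted soft voting described above can be sketched with NumPy. The scores and predictions below are illustrative, not the competition values:

```python
# Minimal sketch of performance-weighted soft voting (illustrative numbers).
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Validation Pearson of each ensembled model (illustrative)
val_pearson = np.array([0.9256, 0.9309, 0.9113])

# Each row: one model's predicted similarity scores on the same examples
preds = np.array([
    [4.8, 0.2, 3.1],
    [4.9, 0.0, 2.8],
    [4.5, 0.4, 3.0],
])

weights = softmax(val_pearson)  # convert scores to probabilities summing to 1
ensemble = weights @ preds      # weighted-average prediction per example
print(ensemble.round(2))
```

Because the validation scores are close, the weights stay near-uniform; a model that is clearly better only gets a modestly larger say, which keeps any single outlier model from dominating.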

4. Results

  • Final Pearson: 0.9368
  • Public 4th, private 3rd out of 14 teams

5. Conclusion

๋ฐ์ดํ„ฐ์˜ ํŠน์„ฑ์„ ์ž˜ ๋ฐ˜์˜ํ•˜๋Š” ๊ธฐ์ดˆ ๋ชจ๋ธ ์„ ์ • ์ดํ›„ ๋‹ค์–‘ํ•œ ์‹คํ—˜์„ ํ†ตํ•ด ์ตœ์ ํ™” ๋ฐ ์•™์ƒ๋ธ” ์ˆ˜ํ–‰

  • Improved data quality through data analysis (oversampling, data augmentation)
  • Selected and optimized a pre-trained model suited to the dataset
  • Ensembled the various resulting models (soft voting)

6. Appendix

  • Pre-trained Model Selection

    | Model | Epoch (early stop / max epoch, best checkpoint) | Val Loss | Val Pearson |
    | --- | --- | --- | --- |
    | klue/roberta-small | 4/100, 4 | 0.6108 | 0.8523 |
    | klue/roberta-base | 4/100, 4 | 0.533 | 0.8916 |
    | jhgan/ko-sroberta-multitask | 3/100, 3 | 0.5149 | 0.8828 |
    | beomi/KcELECTRA-base | 8/20, 4 | 0.4385 | 0.9113 |
    | snunlp/KR-ELECTRA-discriminator | 9/15, 7 | 0.4705 | 0.9242 |
  • Data Augmentation

    | Model | Epoch | Learning rate | Batch Size | Data Augmentation | Val Loss | Val Pearson |
    | --- | --- | --- | --- | --- | --- | --- |
    | klue/roberta-base | 4 | 1e-5 | 16 | Baseline | 0.533 | 0.8916 |
    | | 9 | 1e-5 | 16 | Original : Back Translation 50% (2:1) | 0.5864 | 0.8655 |
    | | 8 | 1e-5 | 16 | Original : Back Translation 33% (3:1) | 0.4987 | 0.8958 |
    | | 16 | 1e-5 | 16 | Original : Back Translation 25% (4:1) | 0.4308 | 0.91 |
    | | 16 | 1e-5 | 16 | Copied Translation Label 5 50% | 0.4707 | 0.9126 |
    | | 5 | 1e-5 | 16 | Copied Translation Label 5 20% | 0.5326 | 0.9024 |
    | | 10 | 1e-5 | 16 | Copied Translation Label 5 10% | 0.5083 | 0.9082 |
    | | 5 | 1e-5 | 16 | Reverse Text 50% | 0.4957 | 0.8961 |
    | | 14 | 1e-5 | 16 | Reverse Text 20% | 0.4464 | 0.9169 |
    | | 19 | 1e-5 | 16 | Reverse Text 10% | 0.4869 | 0.9074 |
    | | 3 | 1e-5 | 16 | Original : Exchange Sentence : Reverse Text 10% (1:1:0.2) | 0.4695 | 0.908 |
    | | 4 | 1e-5 | 16 | Original : Exchange Sentence : Reverse Text 20% (1:1:0.4) | 0.4384 | 0.9118 |
    | klue/roberta-base | 4 | 2e-5 | 32 | Baseline | 0.5919 | 0.8616 |
    | | | 2e-5 | 32 | Swap Sentence | 0.5013 | 0.8967 |
    | | | 2e-5 | 32 | Swap Sentence : Back Translation (2:1) | 0.5008 | 0.8922 |
    | | | 2e-5 | 32 | Swap Sentence : Back Translation (1:1) | 0.4978 | 0.8845 |
    | | | 2e-5 | 32 | Swap Sentence, Label Smoothing 0 50% | 0.4892 | 0.8986 |
    | | | 2e-5 | 32 | Swap Sentence, Label Smoothing 0 25% | 0.4722 | 0.8963 |
    | | | 2e-5 | 32 | Swap Sentence, Label Smoothing 0 50%, Copied Translation Label 5 | 0.4801 | 0.8931 |
    | | | 2e-5 | 32 | Swap Sentence, Label Smoothing 0 25%, Copied Translation Label 5 | 0.4536 | 0.9123 |
  • Optimizing snunlp/KR-ELECTRA-discriminator

    | Model | Epoch | Loss | Data Augmentation | Val Loss | Val Pearson |
    | --- | --- | --- | --- | --- | --- |
    | snunlp/KR-ELECTRA-discriminator | 10 | MSE | Swap Sentence | 0.3914 | 0.9238 |
    | | 6 | MSE | Swap Sentence, Label Smoothing 0 50% | 0.4252 | 0.9096 |
    | | 12 | L1 | Swap Sentence | 0.5068 | 0.9001 |
    | | 17 | L1 | Swap Sentence, Label Smoothing 0 50% | 0.4605 | 0.9172 |
    | | 12 | L1 | Swap Sentence, Label Smoothing 0 50% + Copied Translation Label 5 50% | 0.4484 | 0.9235 |
    | | 10 | L1 | Swap Sentence, Label Smoothing 0 50%, Reverse Text 20% | 0.4486 | 0.919 |
  • Optimizing RoBERTa Large

    | Model | Epoch | Learning rate | Batch Size | Data Augmentation | Val Loss | Val Pearson |
    | --- | --- | --- | --- | --- | --- | --- |
    | klue/roberta-large | 9 | 1e-6 | 8 | Swap Sentence | 0.4043 | 0.913 |
    | | 5 | 3e-6 | | Swap Sentence | 0.4513 | 0.9116 |
    | | 11 | 5e-6 | | Swap Sentence | 0.354 | 0.9208 |
    | | 7 | 7e-6 | | Swap Sentence | 0.379 | 0.9201 |
    | | X | 1e-6 | | Label Smoothing 0, Copied Translation Label 5 | X | X |
    | | 10 | 3e-6 | | Label Smoothing 0, Copied Translation Label 5 | 0.3726 | 0.9171 |
    | | 6 | 5e-6 | | Label Smoothing 0, Copied Translation Label 5 | 0.3845 | 0.9121 |
    | | 8 | 7e-6 | | Label Smoothing 0, Copied Translation Label 5 | 0.3363 | 0.9256 |
    | | 5 | 5e-6 | | Copied Translation Label 5 | 0.4841 | 0.9019 |
    | | 8 | 5e-6 | | Reverse Text 20% | 0.4631 | 0.91 |
  1. Data Augmentation using Back-translation for Context-aware Neural Machine Translation

  2. 신경망 기계번역에서 최적화된 데이터 증강기법 고찰 (A Study of Optimized Data Augmentation Techniques for Neural Machine Translation) – "Experiments showed the highest BLEU score when training with Back Translation and Copied Translation applied together at a relative ratio of 4:3."

  3. Sequence to Sequence Learning with Neural Networks - โ€œโ€ฆ reversing the order of the words in all source sentences (but not target sentences) improved the LSTMโ€™s performance markedlyโ€

  4. Pearson Coefficient of Correlation Explained - โ€œโ€ฆ Pearsonโ€™s correlation coefficient, r, is very sensitive to outliers, which can have a very large effect on the line of best fit and the Pearson correlation coefficient.โ€
