211016 Sharpness-Aware Minimization Improves Language Model Generalization.md

https://arxiv.org/abs/2110.08529

Sharpness-Aware Minimization Improves Language Model Generalization (Dara Bahri, Hossein Mobahi, Yi Tay)

They tried SAM on LMs. Since it gave a meaningful improvement in [[210603 When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations]], it's natural that it could be meaningful here as well. Something like SWAD might also be worth considering.
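
For reference, a minimal sketch of a single SAM update, assuming the standard two-step formulation (perturb the weights along the normalized gradient by a radius `rho`, then take the descent step using the gradient at the perturbed point). The toy quadratic loss and the names `loss`, `grad`, `sam_step`, `lr`, and `rho` are hypothetical and only for illustration; this is not the paper's training setup.

```python
import numpy as np

def loss(w):
    # Toy quadratic loss, a hypothetical stand-in for a language-model loss.
    return 0.5 * float(np.sum(w ** 2))

def grad(w):
    # Analytic gradient of the toy loss above.
    return w

def sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    # 1) Gradient at the current weights.
    g = grad_fn(w)
    # 2) Ascent perturbation of norm (approximately) rho along the gradient.
    e = rho * g / (np.linalg.norm(g) + eps)
    # 3) Gradient evaluated at the perturbed weights.
    g_sam = grad_fn(w + e)
    # 4) Descent step from the ORIGINAL weights using the perturbed gradient.
    return w - lr * g_sam

w = np.array([1.0, -2.0])
w_next = sam_step(w, grad)
print(loss(w), loss(w_next))
```

Each step needs two gradient evaluations, so a SAM update costs roughly twice as much as a plain SGD step.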

#lm #regularization