https://arxiv.org/abs/2305.08675
Improved baselines for vision-language pre-training (Enrico Fini, Pietro Astolfi, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal)
A tuning study of CLIP pretraining. The main changes appear to be adding augmentation and adding a non-contrastive loss.
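To make the "contrastive plus non-contrastive" idea concrete, here is a minimal pure-Python sketch, not the paper's implementation. It assumes the standard symmetric CLIP contrastive loss over paired image/text embeddings, plus an illustrative auxiliary term that pulls two augmented views of the same image together. The auxiliary term (`view_consistency_loss`), the weight `aux_weight`, and the temperature value are assumptions chosen for illustration. The note does not say which non-contrastive loss the paper uses.

```python
import math

def normalize(v):
    # Scale a vector to unit L2 norm (assumes a non-zero vector).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_softmax_row(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_contrastive_loss(img, txt, temperature=0.07):
    # Symmetric InfoNCE: the k-th image should match the k-th text and vice versa.
    img = [normalize(v) for v in img]
    txt = [normalize(v) for v in txt]
    logits = [[dot(i, t) / temperature for t in txt] for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    n = len(img)
    i2t = -sum(log_softmax_row(logits[k])[k] for k in range(n)) / n
    t2i = -sum(log_softmax_row(logits_t[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)

def view_consistency_loss(view_a, view_b):
    # Hypothetical non-contrastive term (an assumption, not the paper's loss):
    # mean of (1 - cosine similarity) between two augmented views of each image.
    n = len(view_a)
    return sum(1.0 - dot(normalize(a), normalize(b))
               for a, b in zip(view_a, view_b)) / n

def total_loss(img_view_a, img_view_b, txt, aux_weight=1.0):
    # Contrastive image-text loss plus a weighted auxiliary non-contrastive loss.
    return (clip_contrastive_loss(img_view_a, txt)
            + aux_weight * view_consistency_loss(img_view_a, img_view_b))
```

In a real setup the two image views would come from applying random augmentations to the same image before encoding. Here the embeddings are passed in directly to keep the sketch self-contained.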
#clip