https://arxiv.org/abs/2102.12122
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions (Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao)
Detection people whittling the transformer into shape. Transformers are steadily moving into a range that is usable, or at least worth adopting.
#vision_transformer