https://arxiv.org/abs/2108.00154

CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention (Wenxiao Wang, Lu Yao, Long Chen, Deng Cai, Xiaofei He, Wei Liu)

The recipe is a combination of multiscale pooling + dilated local self-attention + dynamic positional encoding. They show results beating Swin on detection... though with a 1x schedule. It would be worth testing this directly.
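As a reference point for the multiscale pooling / cross-scale embedding idea, here is a minimal PyTorch sketch: each patch is embedded with several kernel sizes at the same stride, and the channel dimension is split across the kernels. The class name, kernel sizes, and channel split below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CrossScaleEmbedding(nn.Module):
    """Embed each patch with several kernel sizes so every token
    carries features from multiple scales at once."""
    def __init__(self, in_ch=3, dim=96, kernel_sizes=(4, 8, 16, 32), stride=4):
        super().__init__()
        # Split output channels across the kernels; smaller kernels get
        # the larger shares (an illustrative split, not the paper's exact one).
        splits = [dim // 2, dim // 4, dim // 8,
                  dim - dim // 2 - dim // 4 - dim // 8]
        self.projs = nn.ModuleList(
            nn.Conv2d(in_ch, c, kernel_size=k, stride=stride,
                      padding=(k - stride) // 2)
            for k, c in zip(kernel_sizes, splits)
        )

    def forward(self, x):  # x: (B, C, H, W)
        # All convs share the stride, so their spatial grids line up and
        # the multi-scale features can be concatenated channel-wise.
        return torch.cat([proj(x) for proj in self.projs], dim=1)

if __name__ == "__main__":
    emb = CrossScaleEmbedding()
    out = emb(torch.randn(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 96, 56, 56])
```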

#vit