https://arxiv.org/abs/2102.12895
Evolving Attention with Residual Convolutions (Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong)
The idea is to combine multiple attention logits and refine them. Feels a lot like RealFormer, doesn't it?
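A rough sketch of what "combine and refine attention logits" could look like, going only by the title (residual + convolution). Everything here is an assumption for illustration: the names `evolve_logits` and `conv3x3_same`, the mixing weights `alpha` and `beta`, the 3x3 kernel, and treating heads as convolution channels. It is not the authors' implementation.

```python
import numpy as np

def conv3x3_same(x, w):
    """3x3 cross-correlation with zero padding.
    x: (C_in, N, M) attention logits, heads act as channels.
    w: (C_out, C_in, 3, 3) kernel. Returns (C_out, N, M)."""
    _, n, m = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], n, m))
    for di in range(3):
        for dj in range(3):
            # Mix channels for this kernel offset, then accumulate.
            out += np.einsum("oc,cij->oij", w[:, :, di, dj],
                             xp[:, di:di + n, dj:dj + m])
    return out

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def evolve_logits(prev_logits, curr_logits, w, alpha=0.5, beta=0.5):
    """Hypothetical refinement step (alpha, beta are assumed hyperparameters).
    1) Residual mix of the previous layer's logits with the current ones.
    2) Convolve the mixed map and blend it back with the current logits."""
    mixed = alpha * prev_logits + (1.0 - alpha) * curr_logits
    return beta * conv3x3_same(mixed, w) + (1.0 - beta) * curr_logits

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heads, seq = 2, 4
    prev = rng.normal(size=(heads, seq, seq))   # logits from layer i-1
    curr = rng.normal(size=(heads, seq, seq))   # logits from layer i
    w = rng.normal(size=(heads, heads, 3, 3)) * 0.1
    attn = softmax(evolve_logits(prev, curr, w))  # rows sum to 1
    print(attn.shape)
```

With `w` set to zero this falls back to the current layer's logits scaled by `1 - beta`; a plain additive skip over the previous layer's logits, with no convolution, is the part that reads as RealFormer-like.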
#attention #transformer