Why not use nn.MultiheadAttention in vit? #283
              
                
                  
                  
Answered by rwightman
ZhiyuanChen asked this question in Q&A
              
            -
| It seems PyTorch has provided  | 
      
      
Answered by rwightman, Nov 21, 2020
      
    
    Replies: 1 comment 1 reply
-
| @ZhiyuanChen When I started, I wasn't quite sure how the official version would look wrt the attention module, or how close it'd be to the PyTorch impl. Plus, it was pretty straightforward to just implement it as it is. I don't think the current PyTorch impl is much faster. The Apex one would likely be, but it's harder to work with. | 
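To illustrate the "pretty straightforward to just implement" point, here is a minimal sketch of a batch-first multi-head self-attention module, compared against `nn.MultiheadAttention`. It is not the code from this repository: the class name `SimpleAttention` is hypothetical, dropout is omitted, and the weight-copy comparison assumes the fused q/k/v projection is laid out in the same order as PyTorch's `in_proj_weight`.

```python
import torch
import torch.nn as nn


class SimpleAttention(nn.Module):
    """Hypothetical minimal multi-head self-attention over (batch, tokens, dim)."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5  # 1 / sqrt(head_dim)
        self.qkv = nn.Linear(dim, dim * 3)       # fused q, k, v projection
        self.proj = nn.Linear(dim, dim)          # output projection

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        # Move the q/k/v axis first; each of q, k, v is (B, heads, N, head_dim).
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


torch.manual_seed(0)
dim, heads = 64, 8
attn = SimpleAttention(dim, heads)
mha = nn.MultiheadAttention(dim, heads)  # expects (tokens, batch, dim) by default

# Copy the parameters so both modules compute with identical weights.
with torch.no_grad():
    mha.in_proj_weight.copy_(attn.qkv.weight)
    mha.in_proj_bias.copy_(attn.qkv.bias)
    mha.out_proj.weight.copy_(attn.proj.weight)
    mha.out_proj.bias.copy_(attn.proj.bias)

x = torch.randn(2, 16, dim)          # (batch, tokens, dim)
out = attn(x)

x_t = x.transpose(0, 1)              # (tokens, batch, dim) for nn.MultiheadAttention
ref, _ = mha(x_t, x_t, x_t)          # self-attention: query = key = value
ref = ref.transpose(0, 1)

# Print the output shape and the largest elementwise difference between the two.
print(out.shape, (out - ref).abs().max().item())
```

The custom module takes a single batch-first tensor, while `nn.MultiheadAttention` takes separate query, key, and value arguments and defaults to a sequence-first layout, which is part of why a small hand-written module can be simpler to work with inside a ViT block.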
                  
                
            
      Answer selected by
        ZhiyuanChen
  