PDrop Method #11
One additional question: was pDrop not used during training?
@hello-bluedog Sorry for the confusion; we set the drop_type only at inference time. It was not used when training the final version of the model, so as to remain compatible with techniques such as data packing and sequence parallelism (in fact there is no real conflict; it is just an engineering problem). Whether drop was enabled during training also had little impact on our ablation study.
But this drop_type is a single value; concretely, in the code it amounts to choosing one drop strategy at one particular layer. From your config I see that the attention operation runs across 24 layers.
I find that your code is:
however, your paper says:
At the shallow layers of the LLM, we uniformly drop a small number of video tokens (i.e. uniform drop).
Can you explain this difference? Thanks!
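For reference, the "uniform drop" described in the quoted passage can be sketched roughly as follows. This is a minimal pure-Python illustration of evenly spaced token dropping, not the authors' actual implementation; the function name and the keep_ratio parameter are hypothetical:

```python
def uniform_drop(tokens, keep_ratio):
    """Keep an evenly spaced subset of a token sequence (uniform drop).

    Instead of ranking tokens by attention score, survivors are picked
    at evenly spaced positions along the sequence, which is what
    "uniformly drop a small number of video tokens" suggests.
    """
    n = len(tokens)
    n_keep = max(1, round(n * keep_ratio))
    if n_keep >= n:
        return list(tokens)
    # Evenly spaced indices across the original sequence.
    step = (n - 1) / (n_keep - 1) if n_keep > 1 else 0
    idx = sorted({round(i * step) for i in range(n_keep)})
    return [tokens[j] for j in idx]

# e.g. uniform_drop(list(range(10)), 0.5) keeps 5 evenly spaced tokens
```

In an LVLM this would be applied only to the video-token span of the hidden states at a shallow layer, leaving text tokens untouched.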