
PDrop Method #11

Closed
jun0wanan opened this issue Jan 22, 2025 · 5 comments

Comments

@jun0wanan

I see that your config contains:

```json
"llm_compress_layer_list": [
  24
],
"llm_compress_type": "attention",
"llm_image_token_ratio_list": [
  1.0,
  0.5
],
```

However, your paper says:

> At the shallow layers of the LLM, we uniformly drop a small number of video tokens (i.e. uniform drop)

Can you tell me about that difference? Thanks!
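For context, here is one possible reading of the config fields quoted above. The field semantics are my assumption from the names, not taken from the repo's documentation:

```python
# Hypothetical interpretation of the config fields (assumption, not repo docs):
config = {
    "llm_compress_layer_list": [24],           # LLM layer index where compression happens
    "llm_compress_type": "attention",          # how tokens are scored for dropping
    "llm_image_token_ratio_list": [1.0, 0.5],  # keep-ratio before / after each stage
}

num_video_tokens = 1024  # example starting count
for layer, ratio in zip(config["llm_compress_layer_list"],
                        config["llm_image_token_ratio_list"][1:]):
    num_video_tokens = int(num_video_tokens * ratio)
    print(f"after layer {layer}: {num_video_tokens} video tokens kept")
```

Under this reading, half of the video tokens would be dropped once, at layer 24, using attention scores.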

@hello-bluedog

https://huggingface.co/OpenGVLab/VideoChat-Flash-Qwen2-7B_res224/blob/0018d8199ed96cae61adde768c73bea2e2cf4fbd/config.json#L167

An additional question: was PDrop not used during training?

@leexinhao
Collaborator

Sorry for the confusion; we set the drop_type at inference time:

[Image]

@hello-bluedog It was not used when training our final model, to stay compatible with techniques such as data packing and sequence parallelism (there is in fact no conflict; it is purely an engineering issue), and enabling drop during training had little impact on our ablation study.
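As an illustration only, here is a minimal sketch of what switching drop_type between "uniform" and "attention" at one layer might look like. This is written under assumed semantics and is not the repository's actual code; the function name and signature are hypothetical:

```python
import numpy as np

def drop_video_tokens(tokens, scores, keep_ratio, drop_type="attention"):
    """Keep a fraction of video tokens at one LLM layer (illustrative sketch).

    tokens: (num_tokens, dim) array of video-token hidden states
    scores: (num_tokens,) importance scores, e.g. text-to-video attention
    """
    num_keep = max(1, int(len(tokens) * keep_ratio))
    if drop_type == "uniform":
        # uniform drop: keep evenly spaced tokens, regardless of content
        idx = np.linspace(0, len(tokens) - 1, num_keep).astype(int)
    else:
        # attention drop: keep the highest-scoring tokens, preserving order
        idx = np.sort(np.argsort(scores)[-num_keep:])
    return tokens[idx], idx
```

With keep_ratio=0.5, the uniform variant thins the sequence evenly, while the attention variant keeps the tokens the text attends to most, which matches the distinction the paper draws between shallow-layer uniform drop and deeper attention-based drop.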

@jun0wanan
Copy link
Author

> Sorry for the confusion; we set the drop_type at inference time:
>
> [Image]
>
> @hello-bluedog It was not used when training our final model, to stay compatible with techniques such as data packing and sequence parallelism (there is in fact no conflict; it is purely an engineering issue), and enabling drop during training had little impact on our ablation study.

But there is only one drop_type; in the code it amounts to choosing a single method at a given layer. From your config, I see that the attention operation is performed at layer 24.

@hello-bluedog

I think the author means that this config is modified at inference time, and the parameters actually used are these?

[Image]

@leexinhao
Collaborator

> I think the author means that this config is modified at inference time, and the parameters actually used are these?
>
> [Image]

Yes.
