reproduce your results #106

Open
zzksdu opened this issue Feb 17, 2025 · 3 comments

zzksdu commented Feb 17, 2025

As introduced in your paper, the VITA training process consists of three steps: the first step fine-tunes the LLM module, the second is multimodal alignment, and the third is multimodal instruction tuning.

In your code, there are many training scripts. Can you indicate which script corresponds to which step of training?

Another question: in your source code the language model is Qwen2, so is the final language model Qwen2 or Mixtral 8x7B?
@wangxiongts @BradyFU @linhaojia13 @longzw1997

zzksdu commented Feb 18, 2025

Do you have any plans to make your training dataset public?

linhaojia13 (Collaborator) commented

VITA-1.0 uses Mixtral as its base language model, while VITA-1.5 uses Qwen2.5-7B-Instruct. Currently, VITA-1.0 is deprecated, so let me explain the training stages for VITA-1.5:

  • pretrain_mlp_qwen_nodes.sh: Stage 1.1
  • finetune_qwen_nodes.sh: Stage 1.2
  • finetuneTask_qwen_nodes.sh: Stage 1.3
  • finetuneTaskNeg_qwen_nodes.sh: Stage 2.2
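
For reference, a minimal sketch of the launch order this mapping implies. The script/train/ paths and the plain bash invocation are assumptions, not verified against the repo:

```python
# Hypothetical driver reflecting the stage order above; the script names come
# from the reply, but the script/train/ directory layout is an assumption.
import subprocess

STAGES = [
    ("Stage 1.1", "script/train/pretrain_mlp_qwen_nodes.sh"),
    ("Stage 1.2", "script/train/finetune_qwen_nodes.sh"),
    ("Stage 1.3", "script/train/finetuneTask_qwen_nodes.sh"),
    ("Stage 2.2", "script/train/finetuneTaskNeg_qwen_nodes.sh"),
]

for name, script in STAGES:
    print(f"Launching {name}: {script}")
    subprocess.run(["bash", script], check=True)  # stop if a stage fails
```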

As for the datasets used, they are not publicly available, but the majority of them consist of open-source data.


zzksdu commented Mar 12, 2025

@linhaojia13

1. In the paper, Stage 1.3 unfreezes the vision tower + MLP + LLM. Why do finetuneTask_qwen_nodes.sh and finetune_qwen_nodes.sh differ here? finetune_qwen_nodes.sh adds --unfreeze_vision_tower, but the Stage 1.3 script does not include this parameter.

[screenshot]

2. Another question: when loading the Stage 1.2 model, the warnings below appear. Is this normal? (A minimal reproduction sketch follows this list.)

[screenshot]

Loading the official model from your repo runs into the same issue.

[screenshot]
3. Can pretrain_audio_mlp_qwen_nodes.sh be regarded as the Stage 2.1 training script?
4. The Stage 2.2 training script sets --freeze_audio_encoder True --freeze_audio_encoder_adapter False, which differs from the paper, where both the vision tower and the audio encoder are active. So should the actual configuration follow the paper here? (See the generic freeze-flag sketch after this comment.)
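
Regarding question 2, a self-contained sketch of why such warnings are often benign: when a checkpoint from an earlier stage is loaded with strict=False, modules added afterwards simply show up as missing keys and keep their fresh initialization. The module names here are hypothetical, not VITA's actual ones:

```python
# Sketch: loading an older-stage checkpoint into a model that gained a new
# module. The new module's parameters are reported as missing keys; nothing
# else is affected.
import torch.nn as nn

class Stage12Model(nn.Module):          # hypothetical "old" model
    def __init__(self):
        super().__init__()
        self.llm = nn.Linear(8, 8)

class Stage13Model(nn.Module):          # hypothetical "new" model
    def __init__(self):
        super().__init__()
        self.llm = nn.Linear(8, 8)
        self.audio_adapter = nn.Linear(8, 8)  # module added after Stage 1.2

old_state = Stage12Model().state_dict()
result = Stage13Model().load_state_dict(old_state, strict=False)
print(result.missing_keys)     # ['audio_adapter.weight', 'audio_adapter.bias']
print(result.unexpected_keys)  # []
```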
cc. @BradyFU @wangxiongts @lxysl
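
For context on question 4, freeze flags of this kind typically just toggle requires_grad on the corresponding submodule. A generic sketch; the flag-to-module mapping is an assumption, not VITA's actual code:

```python
# Generic sketch of how --freeze_audio_encoder / --freeze_audio_encoder_adapter
# style flags are usually applied; the module names are hypothetical.
import torch.nn as nn

def apply_freeze_flags(audio_encoder: nn.Module,
                       audio_adapter: nn.Module,
                       freeze_audio_encoder: bool,
                       freeze_audio_encoder_adapter: bool) -> None:
    for p in audio_encoder.parameters():
        p.requires_grad = not freeze_audio_encoder
    for p in audio_adapter.parameters():
        p.requires_grad = not freeze_audio_encoder_adapter

# With the Stage 2.2 script's settings, the encoder stays frozen while the
# adapter trains -- the asymmetry the question points out.
encoder, adapter = nn.Linear(4, 4), nn.Linear(4, 4)
apply_freeze_flags(encoder, adapter,
                   freeze_audio_encoder=True,
                   freeze_audio_encoder_adapter=False)
print(any(p.requires_grad for p in encoder.parameters()))  # False
print(all(p.requires_grad for p in adapter.parameters()))  # True
```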
