issues about model initialization #29

@2133978609

Description

Thanks for your great work. I noticed that in all four training stages the model is reloaded with:

model = CoVTForConditionalGeneration.from_pretrained(
    model_args.model_path,
    torch_dtype=compute_dtype,
    attn_implementation="flash_attention_2" if not training_args.disable_flash_attn2 else "sdpa",
    **bnb_model_from_pretrained_args,
)

The constructor of CoVTForConditionalGeneration initializes the modules used for SAM cross-attention as follows:

self.sam_projection = nn.Linear(3584, 256)
self.sam_query_vectors = nn.Parameter(torch.randn(8, 256, dtype=torch.bfloat16, requires_grad=True))
self.sam_cross_attention = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

This means these modules are randomly re-initialized at the start of each of the four stages, even though sam_query_vectors and sam_projection are trainable. Why not save these weights at the end of each stage and load them at the start of the next, so that the sam_query_vectors and sam_projection used across the four stages stay consistent and uninterrupted?
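For what it's worth, one way to carry these weights across stages is to save just the SAM-related entries of the state dict and restore them with `strict=False` after the next `from_pretrained` call. This is only a sketch: the `save_sam_weights`/`load_sam_weights` helpers and the `Toy` stand-in module are hypothetical and not part of the repo; only the attribute names (`sam_projection`, `sam_query_vectors`, `sam_cross_attention`) come from the code quoted above.

```python
import torch
import torch.nn as nn

# Top-level attribute names of the SAM-related modules, as quoted above.
SAM_KEYS = ("sam_projection", "sam_query_vectors", "sam_cross_attention")

def save_sam_weights(model: nn.Module, path: str) -> None:
    """Save only the SAM-related parameters out of the full state dict."""
    state = {k: v for k, v in model.state_dict().items()
             if k.split(".")[0] in SAM_KEYS}
    torch.save(state, path)

def load_sam_weights(model: nn.Module, path: str) -> None:
    """Restore the SAM-related parameters into a freshly loaded model.
    strict=False because this partial checkpoint intentionally contains
    only the SAM keys, not the rest of the model."""
    state = torch.load(path, map_location="cpu")
    missing, unexpected = model.load_state_dict(state, strict=False)
    assert not unexpected, f"unexpected keys in SAM checkpoint: {unexpected}"

# Hypothetical stand-in with the same attributes as the real model,
# just to demonstrate the round trip between two "stages".
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.sam_projection = nn.Linear(3584, 256)
        self.sam_query_vectors = nn.Parameter(torch.randn(8, 256))
        self.sam_cross_attention = nn.MultiheadAttention(256, 8, batch_first=True)

m1 = Toy()                             # end of stage N
save_sam_weights(m1, "sam_weights.pt")
m2 = Toy()                             # stage N+1: fresh random init
load_sam_weights(m2, "sam_weights.pt") # SAM weights now match stage N
```

In the actual pipeline the load call would go right after `from_pretrained`, so later stages resume from the trained SAM weights instead of new random ones.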
