Thanks for your great work! I noticed that in all four training stages the model is reloaded with:
```python
model = CoVTForConditionalGeneration.from_pretrained(
    model_args.model_path,
    torch_dtype=compute_dtype,
    attn_implementation="flash_attention_2" if not training_args.disable_flash_attn2 else "sdpa",
    **bnb_model_from_pretrained_args
)
```
The following code in the initialization of `CoVTForConditionalGeneration` creates the variables used by the SAM cross-attention:
```python
self.sam_projection = nn.Linear(3584, 256)
self.sam_query_vectors = nn.Parameter(torch.randn(8, 256, dtype=torch.bfloat16, requires_grad=True))
self.sam_cross_attention = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
```
This means these variables are randomly re-initialized at the start of each of the four stages. However, `sam_query_vectors` and `sam_projection` are trainable. Why not save these variables at the end of each stage and load them back into the model at the start of the next, so that the `sam_query_vectors` and `sam_projection` used across the four stages stay consistent and uninterrupted?
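For concreteness, here is a minimal sketch of what I mean. The helper names (`save_sam_modules`, `load_sam_modules`) and the checkpoint layout are my own assumptions, not part of the repo; the idea is just to persist the SAM-related parameters after one stage and restore them after `from_pretrained` re-initializes them in the next:

```python
import torch
import torch.nn as nn

# Hypothetical helpers (names are my own): checkpoint only the SAM
# cross-attention parameters so they survive across training stages.

def save_sam_modules(model, path):
    # Collect just the SAM-related parameters into one file.
    state = {
        "sam_projection": model.sam_projection.state_dict(),
        "sam_query_vectors": model.sam_query_vectors.detach().cpu(),
        "sam_cross_attention": model.sam_cross_attention.state_dict(),
    }
    torch.save(state, path)

def load_sam_modules(model, path):
    # Overwrite the freshly re-initialized SAM parameters with the
    # values trained in the previous stage.
    state = torch.load(path, map_location="cpu")
    model.sam_projection.load_state_dict(state["sam_projection"])
    with torch.no_grad():
        model.sam_query_vectors.copy_(state["sam_query_vectors"])
    model.sam_cross_attention.load_state_dict(state["sam_cross_attention"])
```

Calling `save_sam_modules(model, "sam_modules.pt")` at the end of one stage and `load_sam_modules(model, "sam_modules.pt")` right after `from_pretrained` in the next would keep these weights continuous across stages.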