I'm trying to quantize a model with `int4_weight_only` and want to get the plain (unpacked) weight, but I found that the weight has been padded. To reproduce, run the following script:
```python
import torch
from transformers import TorchAoConfig, AutoModelForCausalLM

model_name = "JackFram/llama-68m"
quantization_config = TorchAoConfig("int4_weight_only")
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
    quantization_config=quantization_config,
)

# get_plain() returns (int_data, scale, zero_point); inspect the int data
int_data = quantized_model.model.layers[0].self_attn.q_proj.weight.tensor_impl.get_plain()[0]
print(int_data.shape)
print(int_data)
```
Output:

```text
(768, 1024)
tensor([[11, 12,  8,  ...,  0,  0,  0],
        [ 5,  6,  5,  ...,  0,  0,  0],
        [ 5,  7,  7,  ...,  0,  0,  0],
        ...,
        [ 7,  5,  2,  ...,  0,  0,  0],
        [ 6,  1,  7,  ...,  0,  0,  0],
        [ 8, 11,  9,  ...,  0,  0,  0]], device='cuda:0', dtype=torch.int32)
```
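Note that the padded width is consistent with rounding `in_features` up to the next multiple of 1024. This is only an inference from the observed shapes (768 -> 1024), not a confirmed torchao implementation detail; a minimal sketch of the arithmetic:

```python
# Assumption (inferred from the output above, not from torchao source):
# the padded column count is in_features rounded up to a multiple of 1024.
def round_up(n: int, multiple: int) -> int:
    return ((n + multiple - 1) // multiple) * multiple

print(round_up(768, 1024))  # 1024, the padded width observed above
```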
The original shape should be (768, 768), but the plain weight shape is (768, 1024). Could a padding-removal step be added to the `get_plain()` function?
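In the meantime, a minimal workaround sketch, assuming the padding only extends the tensor beyond the layer's declared dimensions and that the quantized module still exposes `nn.Linear`'s `in_features`/`out_features`:

```python
# Workaround sketch: slice the padded plain int data back to the layer's
# declared shape. Assumes padding only extends the tensor beyond
# (out_features, in_features), as observed above (768 -> 1024 columns).
q_proj = quantized_model.model.layers[0].self_attn.q_proj
int_data = q_proj.weight.tensor_impl.get_plain()[0]
unpadded = int_data[: q_proj.out_features, : q_proj.in_features]
print(unpadded.shape)  # expected: (768, 768)
```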