int4_weight_only get_plain() weight is padded #2249

Open
@jiqing-feng

Description

I tried to quantize a model with int4_weight_only and wanted to get the plain (unpacked) weight, but found that the weight has been padded. To reproduce, run the following script:

import torch
from transformers import TorchAoConfig, AutoModelForCausalLM
model_name = "JackFram/llama-68m"
quantization_config = TorchAoConfig("int4_weight_only")
quantized_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="cuda:0", quantization_config=quantization_config)
plain_weight = quantized_model.model.layers[0].self_attn.q_proj.weight.tensor_impl.get_plain()[0]
print(plain_weight.shape)
print(plain_weight)

Output:

(768, 1024)
tensor([[11, 12,  8,  ...,  0,  0,  0],
        [ 5,  6,  5,  ...,  0,  0,  0],
        [ 5,  7,  7,  ...,  0,  0,  0],
        ...,
        [ 7,  5,  2,  ...,  0,  0,  0],
        [ 6,  1,  7,  ...,  0,  0,  0],
        [ 8, 11,  9,  ...,  0,  0,  0]], device='cuda:0', dtype=torch.int32)

The original shape should be (768, 768), but the plain weight shape is (768, 1024). Could get_plain() include a step that removes the padding?
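As a workaround for now (a sketch only, assuming the outer quantized weight tensor still reports the original logical shape and that the padding consists of zeros appended along the last dimension, as in the output above), the padded result can be sliced back to the original shape:

```python
# Workaround sketch: slice the padded plain weight back to the layer's
# original shape. Assumes q_proj.weight.shape is the unpadded logical shape
# and that padding is only appended at the end of each dimension.
q_proj = quantized_model.model.layers[0].self_attn.q_proj
int_data = q_proj.weight.tensor_impl.get_plain()[0]

out_features, in_features = q_proj.weight.shape  # (768, 768) for this model
unpadded = int_data[:out_features, :in_features]
print(unpadded.shape)  # expected: torch.Size([768, 768])
```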
