I'm trying to quantize a model with `int4_weight_only` and want to get the plain (unpacked) weight, but I found that the weight has been padded. To reproduce, run the following script:
```python
import torch
from transformers import TorchAoConfig, AutoModelForCausalLM

model_name = "JackFram/llama-68m"
quantization_config = TorchAoConfig("int4_weight_only")
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
    quantization_config=quantization_config,
)

# get_plain() returns (int_data, scale, zero_point); inspect the int data
int_data = quantized_model.model.layers[0].self_attn.q_proj.weight.tensor_impl.get_plain()[0]
print(int_data.shape)
print(int_data)
```
Output:

```text
(768, 1024)
tensor([[11, 12,  8,  ...,  0,  0,  0],
        [ 5,  6,  5,  ...,  0,  0,  0],
        [ 5,  7,  7,  ...,  0,  0,  0],
        ...,
        [ 7,  5,  2,  ...,  0,  0,  0],
        [ 6,  1,  7,  ...,  0,  0,  0],
        [ 8, 11,  9,  ...,  0,  0,  0]], device='cuda:0', dtype=torch.int32)
```
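Note that the padded width is consistent with rounding `in_features` up to the next multiple of 1024. This is only an inference from the observed shapes (768 -> 1024), not a confirmed torchao implementation detail; a minimal sketch of the arithmetic:

```python
# Assumption (inferred from the output above, not from torchao source):
# the padded column count is in_features rounded up to a multiple of 1024.
def round_up(n: int, multiple: int) -> int:
    return ((n + multiple - 1) // multiple) * multiple

print(round_up(768, 1024))  # 1024, the padded width observed above
```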
The original shape should be (768, 768), but the plain weight shape is (768, 1024). Could a padding-removal step be added to the `get_plain()` function?
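In the meantime, a minimal workaround sketch, assuming the padding only extends the tensor beyond the layer's declared dimensions and that the quantized module still exposes `nn.Linear`'s `in_features`/`out_features`:

```python
# Workaround sketch: slice the padded plain int data back to the layer's
# declared shape. Assumes padding only extends the tensor beyond
# (out_features, in_features), as observed above (768 -> 1024 columns).
q_proj = quantized_model.model.layers[0].self_attn.q_proj
int_data = q_proj.weight.tensor_impl.get_plain()[0]
unpadded = int_data[: q_proj.out_features, : q_proj.in_features]
print(unpadded.shape)  # expected: (768, 768)
```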