[AWQ] skip smoothing if NaNs found in forward pass (#1771)
SUMMARY:
Users reported an issue when running AWQ on
`CohereLabs/command-a-vision-07-2025`. It turned out that valid input was
producing NaNs in forward passes of the original model, which caused
AWQ to error out. This safeguard checks whether NaNs appear in the
forward-pass output of a parent module; if they are found, it logs a
warning and skips smoothing for that layer. With this change the model
runs through AWQ successfully, with the following output displayed in
the logs:
```
(64/65): Calibrating: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00, 7.08s/it]
Smoothing: 0%| | 0/2 [00:00<?, ?it/s]2025-08-21T17:53:51.934034+0000 | _apply_smoothing | WARNING - Skipping smooth_layer model.language_model.layers.63.input_layernorm, NaN outputs found during forward pass of the parent module model.language_model.layers.63. The model is either generating NaN output with provided calibration data set, or the mappings are incorrectly set and modifying the model in undesired ways. If you encounter this consistently, raise an issue at https://github.com/vllm-project/llm-compressor/issues
2025-08-21T17:53:51.948120+0000 | _apply_smoothing | WARNING - Skipping smooth_layer model.language_model.layers.63.mlp.up_proj, NaN outputs found during forward pass of the parent module model.language_model.layers.63.mlp.down_proj. The model is either generating NaN output with provided calibration data set, or the mappings are incorrectly set and modifying the model in undesired ways. If you encounter this consistently, raise an issue at https://github.com/vllm-project/llm-compressor/issues
Smoothing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 50.77it/s]
```
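The guard itself is conceptually simple. Below is a minimal sketch of the idea; the function and parameter names are hypothetical, not the actual llm-compressor API:

```python
import warnings

import torch


def smooth_if_finite(parent_name: str, parent_output: torch.Tensor,
                     smooth_layer_name: str, smooth_fn) -> bool:
    """Hypothetical sketch of the safeguard: skip smoothing when the
    parent module's forward pass produced NaN outputs."""
    if torch.isnan(parent_output).any():
        # Warn and skip, rather than erroring out mid-smoothing
        warnings.warn(
            f"Skipping smooth_layer {smooth_layer_name}, NaN outputs found "
            f"during forward pass of the parent module {parent_name}."
        )
        return False
    smooth_fn()  # apply smoothing only when outputs are finite
    return True
```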
TEST PLAN:
Python script that fails on `main` but succeeds on this branch:
<details><summary> user script awq_cohere.py </summary>
```python
import torch
from llmcompressor.modifiers.awq import AWQMapping
from llmcompressor import oneshot
from transformers import AutoProcessor, AutoModelForImageTextToText
from llmcompressor.modifiers.awq import AWQModifier
def data_collator(batch):
assert len(batch) == 1
return {key: torch.tensor(value) for key, value in batch[0].items()}
COHERE_AWQ_MAPPINGS = [
AWQMapping(
"re:.*input_layernorm$",
[
"re:.*self_attn.q_proj$",
"re:.*self_attn.k_proj$",
"re:.*self_attn.v_proj$",
"re:.*mlp.gate_proj$",
"re:.*mlp.up_proj$",
],
),
AWQMapping("re:.*v_proj$", ["re:.*o_proj$"]),
AWQMapping(
"re:.*up_proj$",
["re:.*down_proj$"],
),
]
MODEL_NAME = "CohereLabs/command-a-vision-07-2025"
NSAMPLES = 256
dataset = "flickr30k"
DATASET_SPLIT = {"calibration": f"test[:{NSAMPLES}]"}
MAX_SEQ_LEN = 2048
skipped_layers = [
    "re:.*lm_head",
    "re:model.multi_modal_projector.*",
    "re:model.vision_tower.*",
]
output_dir = "./out_path/"
processor = AutoProcessor.from_pretrained(MODEL_NAME, trust_remote_code=False)
model = AutoModelForImageTextToText.from_pretrained(MODEL_NAME, torch_dtype="auto")
recipe = [
    AWQModifier(
        ignore=skipped_layers,
        scheme="W4A16_ASYM",
        targets=["Linear"],
        mappings=COHERE_AWQ_MAPPINGS,
    ),
]
oneshot(
model=model,
dataset=dataset,
processor=processor,
splits=DATASET_SPLIT,
recipe=recipe,
max_seq_length=MAX_SEQ_LEN,
num_calibration_samples=NSAMPLES,
data_collator=data_collator,
pad_to_max_length=False,
sequential_targets=["Cohere2DecoderLayer"]
)
model.save_pretrained(output_dir, save_compressed=True, skip_compression_stats=True)
processor.save_pretrained(output_dir)
```
</details>
Signed-off-by: Brian Dellabetta <[email protected]>