
Commit e330182

[AWQ] skip smoothing if NaNs found in forward pass (#1771)
SUMMARY: Users raised an issue when running AWQ on `CohereLabs/command-a-vision-07-2025`. It turned out that valid input was producing NaNs in forward passes of the original model, which caused AWQ to error out. This safeguard checks whether NaN (or inf) values appear in the forward-pass output of a parent module; if they do, it logs a warning and skips smoothing for that mapping. With this change the model runs through AWQ successfully, with the following output in the logs:

```
(64/65): Calibrating: 100%|████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00,  7.08s/it]
Smoothing:   0%|                                                                            | 0/2 [00:00<?, ?it/s]2025-08-21T17:53:51.934034+0000 | _apply_smoothing | WARNING - Skipping smooth_layer model.language_model.layers.63.input_layernorm, NaN outputs found during forward pass of the parent module model.language_model.layers.63. The model is either generating NaN output with provided calibration data set, or the mappings are incorrectly set and modifying the model in undesired ways. If you encounter this consistently, raise an issue at https://github.com/vllm-project/llm-compressor/issues
2025-08-21T17:53:51.948120+0000 | _apply_smoothing | WARNING - Skipping smooth_layer model.language_model.layers.63.mlp.up_proj, NaN outputs found during forward pass of the parent module model.language_model.layers.63.mlp.down_proj. The model is either generating NaN output with provided calibration data set, or the mappings are incorrectly set and modifying the model in undesired ways. If you encounter this consistently, raise an issue at https://github.com/vllm-project/llm-compressor/issues
Smoothing: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 50.77it/s]
```

TEST PLAN: Python script that fails on `main` but works on this branch:

<details><summary>user script awq_cohere.py</summary>

```python
import torch
from llmcompressor.modifiers.awq import AWQMapping
from llmcompressor import oneshot
from transformers import AutoProcessor, AutoModelForImageTextToText
from llmcompressor.modifiers.awq import AWQModifier


def data_collator(batch):
    assert len(batch) == 1
    return {key: torch.tensor(value) for key, value in batch[0].items()}


COHERE_AWQ_MAPPINGS = [
    AWQMapping(
        "re:.*input_layernorm$",
        [
            "re:.*self_attn.q_proj$",
            "re:.*self_attn.k_proj$",
            "re:.*self_attn.v_proj$",
            "re:.*mlp.gate_proj$",
            "re:.*mlp.up_proj$",
        ],
    ),
    AWQMapping("re:.*v_proj$", ["re:.*o_proj$"]),
    AWQMapping(
        "re:.*up_proj$",
        ["re:.*down_proj$"],
    ),
]

MODEL_NAME = "CohereLabs/command-a-vision-07-2025"
NSAMPLES = 256
dataset = "flickr30k"
DATASET_SPLIT = {"calibration": f"test[:{NSAMPLES}]"}
MAX_SEQ_LEN = 2048
skipped_layers = ['re:.*lm_head', 're:model.multi_modal_projector.*', 're:model.vision_tower.*']
output_dir = "./out_path/"

processor = AutoProcessor.from_pretrained(MODEL_NAME, trust_remote_code=False)
model = AutoModelForImageTextToText.from_pretrained(MODEL_NAME, torch_dtype="auto")

recipe = [
    AWQModifier(ignore=skipped_layers, scheme="W4A16_ASYM", targets=["Linear"], mappings=COHERE_AWQ_MAPPINGS),
]

oneshot(
    model=model,
    dataset=dataset,
    processor=processor,
    splits=DATASET_SPLIT,
    recipe=recipe,
    max_seq_length=MAX_SEQ_LEN,
    num_calibration_samples=NSAMPLES,
    data_collator=data_collator,
    pad_to_max_length=False,
    sequential_targets=["Cohere2DecoderLayer"]
)

model.save_pretrained(output_dir, save_compressed=True, skip_compression_stats=True)
processor.save_pretrained(output_dir)
```

</details>

Signed-off-by: Brian Dellabetta <[email protected]>
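The safeguard boils down to a finiteness check over the parent module's forward-pass outputs. A minimal sketch of that kind of check, assuming `outputs` is a list of tensors collected from a forward pass (the function name and sample tensors below are illustrative, not llm-compressor internals):

```python
import torch


def outputs_are_finite(outputs: list[torch.Tensor]) -> bool:
    """Return True only if every output tensor is free of NaN and inf values."""
    # Tensor.isfinite() flags NaN, +inf and -inf element-wise;
    # .all() reduces each tensor to a single boolean.
    return all(t.isfinite().all() for t in outputs)


# Illustrative usage: one clean batch and one poisoned with a NaN.
clean = [torch.randn(2, 4)]
poisoned = [torch.tensor([[1.0, float("nan")], [0.5, 2.0]])]

print(outputs_are_finite(clean))     # True
print(outputs_are_finite(poisoned))  # False
```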

src/llmcompressor/modifiers/awq/base.py

Lines changed: 14 additions & 0 deletions
```diff
@@ -479,6 +479,20 @@ def _apply_smoothing(self, model: Module) -> None:
                 )
                 del self._smooth_activation_means[mapping.smooth_name]
                 continue
+            if not all(
+                [fp16_output.isfinite().all() for fp16_output in fp16_outputs]
+            ):
+                logger.warning(
+                    f"Skipping smooth_layer {mapping.smooth_name}, NaN or inf "
+                    "outputs found during forward pass of the parent module "
+                    f"{mapping.parent_name}. The model is either generating NaN "
+                    "output with provided calibration data set, or the mappings "
+                    "are incorrectly set and modifying the model in undesired "
+                    "ways. If you encounter this consistently, raise an issue at "
+                    "https://github.com/vllm-project/llm-compressor/issues"
+                )
+                del self._smooth_activation_means[mapping.smooth_name]
+                continue
 
             x_mean = self._smooth_activation_means[mapping.smooth_name][0]
 
```
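For context on why the mapping is skipped rather than smoothed: AWQ's scale search compares the outputs of candidate scalings against these cached fp16 reference outputs, and once the reference contains a NaN, any comparison loss is itself NaN, so candidate scales can no longer be distinguished. A toy illustration of that propagation (not the modifier's actual loss code; the tensors here are made up):

```python
import torch

# Hypothetical stand-ins: reference outputs poisoned by a single NaN,
# and a candidate output produced under some trial smoothing scale.
reference = torch.tensor([1.0, float("nan"), 2.0])
candidate = torch.tensor([1.0, 1.5, 2.0])

# The NaN propagates through the subtraction and the mean, so every
# candidate scale would receive the same non-informative NaN score.
loss = torch.mean((reference - candidate) ** 2)
print(loss)  # tensor(nan)
```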