[AWQ] skip smoothing if NaNs found in forward pass (#1771)
SUMMARY:
Users reported an issue when running AWQ on
`CohereLabs/command-a-vision-07-2025`. It turned out that valid input was
producing NaNs in forward passes of the original model, which caused
AWQ to error out. This safeguard checks whether NaNs appear in the
forward-pass output of a parent module; if they are found, it logs a
warning and skips smoothing for that layer. With this change the model
runs through AWQ successfully, with the following output displayed in
the logs:
```
(64/65): Calibrating: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00, 7.08s/it]
Smoothing: 0%| | 0/2 [00:00<?, ?it/s]2025-08-21T17:53:51.934034+0000 | _apply_smoothing | WARNING - Skipping smooth_layer model.language_model.layers.63.input_layernorm, NaN outputs found during forward pass of the parent module model.language_model.layers.63. The model is either generating NaN output with provided calibration data set, or the mappings are incorrectly set and modifying the model in undesired ways. If you encounter this consistently, raise an issue at https://github.com/vllm-project/llm-compressor/issues
2025-08-21T17:53:51.948120+0000 | _apply_smoothing | WARNING - Skipping smooth_layer model.language_model.layers.63.mlp.up_proj, NaN outputs found during forward pass of the parent module model.language_model.layers.63.mlp.down_proj. The model is either generating NaN output with provided calibration data set, or the mappings are incorrectly set and modifying the model in undesired ways. If you encounter this consistently, raise an issue at https://github.com/vllm-project/llm-compressor/issues
Smoothing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 50.77it/s]
```
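The guard itself is conceptually simple. Below is a minimal sketch of the idea; the function and parameter names are hypothetical, not the actual llm-compressor API:

```python
import warnings

import torch


def smooth_if_finite(parent_name: str, parent_output: torch.Tensor,
                     smooth_layer_name: str, smooth_fn) -> bool:
    """Hypothetical sketch of the safeguard: skip smoothing when the
    parent module's forward pass produced NaN outputs."""
    if torch.isnan(parent_output).any():
        # Warn and skip, rather than erroring out mid-smoothing
        warnings.warn(
            f"Skipping smooth_layer {smooth_layer_name}, NaN outputs found "
            f"during forward pass of the parent module {parent_name}."
        )
        return False
    smooth_fn()  # apply smoothing only when outputs are finite
    return True
```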
TEST PLAN:
Python script that fails on `main` but succeeds on this branch:
<details><summary> user script awq_cohere.py </summary>
```python
import torch
from llmcompressor.modifiers.awq import AWQMapping
from llmcompressor import oneshot
from transformers import AutoProcessor, AutoModelForImageTextToText
from llmcompressor.modifiers.awq import AWQModifier
def data_collator(batch):
assert len(batch) == 1
return {key: torch.tensor(value) for key, value in batch[0].items()}
COHERE_AWQ_MAPPINGS = [
AWQMapping(
"re:.*input_layernorm$",
[
"re:.*self_attn.q_proj$",
"re:.*self_attn.k_proj$",
"re:.*self_attn.v_proj$",
"re:.*mlp.gate_proj$",
"re:.*mlp.up_proj$",
],
),
AWQMapping("re:.*v_proj$", ["re:.*o_proj$"]),
AWQMapping(
"re:.*up_proj$",
["re:.*down_proj$"],
),
]
MODEL_NAME = "CohereLabs/command-a-vision-07-2025"
NSAMPLES = 256
dataset = "flickr30k"
DATASET_SPLIT = {"calibration": f"test[:{NSAMPLES}]"}
MAX_SEQ_LEN = 2048
skipped_layers = [
    "re:.*lm_head",
    "re:model.multi_modal_projector.*",
    "re:model.vision_tower.*",
]
output_dir = "./out_path/"
processor = AutoProcessor.from_pretrained(MODEL_NAME, trust_remote_code=False)
model = AutoModelForImageTextToText.from_pretrained(MODEL_NAME, torch_dtype="auto")
recipe = [
    AWQModifier(
        ignore=skipped_layers,
        scheme="W4A16_ASYM",
        targets=["Linear"],
        mappings=COHERE_AWQ_MAPPINGS,
    ),
]
oneshot(
model=model,
dataset=dataset,
processor=processor,
splits=DATASET_SPLIT,
recipe=recipe,
max_seq_length=MAX_SEQ_LEN,
num_calibration_samples=NSAMPLES,
data_collator=data_collator,
pad_to_max_length=False,
sequential_targets=["Cohere2DecoderLayer"]
)
model.save_pretrained(output_dir, save_compressed=True, skip_compression_stats=True)
processor.save_pretrained(output_dir)
```
</details>
Signed-off-by: Brian Dellabetta <[email protected]>