Conversation

@kylesayrs (Contributor) commented Sep 3, 2025

Purpose

  • Support loading models that have online transforms applied via Compressed Tensors (LLM Compressor)

Prerequisites

Changes

  • Require a minimum compressed-tensors version of 0.11.0 (needed for transform support)
  • Load transform configs (if available), and apply them to the model before weight loading
  • (misc) Refactor compressed tensors tests to check for perplexity, rather than exact output matches
  • (misc) Remove update_dtype to reduce complexity and give users more control and predictability over model data types
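The version gate in the first bullet can be sketched with a minimal stdlib comparison. Note that `parse_version` and `is_transform_capable` are illustrative helper names, not the PR's actual implementation, which would normally use the `packaging` library:

```python
# Minimal sketch of the compressed-tensors version gate described above.
# The helper names are illustrative, not the actual transformers code.
MIN_CT_VERSION = "0.11.0"  # first release with transform support, per the PR

def parse_version(v: str) -> tuple:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def is_transform_capable(installed: str) -> bool:
    """True if the installed compressed-tensors version supports transforms."""
    return parse_version(installed) >= parse_version(MIN_CT_VERSION)

print(is_transform_capable("0.10.2"))  # False
print(is_transform_capable("0.11.0"))  # True
```

In practice the check would raise an informative ImportError rather than return a bool, so users see why an older compressed-tensors install cannot load transformed checkpoints.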

Testing

  • Regression tested using CompressedTensorsTest; added an online QuIP-style transformed model for testing
    • Perplexity results match expectations
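Checking perplexity against a threshold, rather than asserting exact generated tokens, can be sketched as follows. The `perplexity` helper and the 20.0 bound are made up for illustration and are not the actual test-suite code:

```python
import math

def perplexity(nlls: list) -> float:
    """Perplexity is the exponential of the mean token-level
    negative log-likelihood."""
    return math.exp(sum(nlls) / len(nlls))

# Instead of matching exact outputs, the test can assert that the model's
# perplexity stays under a tolerance; 20.0 is a hypothetical bound here.
EXPECTED_MAX_PPL = 20.0
assert perplexity([2.0, 2.5, 3.0]) < EXPECTED_MAX_PPL
```

Thresholded perplexity is more robust than exact-match assertions because quantization and transforms can legitimately perturb individual sampled tokens without degrading model quality.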

Signed-off-by: Kyle Sayers <[email protected]>
@Rocketknight1 (Member) commented:

cc @MekkCyber

@kylesayrs kylesayrs marked this pull request as draft September 4, 2025 16:06
@kylesayrs (Contributor, Author) commented:

Putting in draft for now, need to do some more testing


github-actions bot commented Sep 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: compressed_tensors_integration

"""Models quantized using compressed tensors can be saved to disk"""
return True

def dequantize(self, model: "PreTrainedModel"):
@brian-dellabetta commented Sep 9, 2025

do we have to call this dequantize to match the huggingface API? If not, decompress would be more accurate, since it might involve something beyond quantization?
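One way to resolve the naming question above is to keep `dequantize` as a thin compatibility alias over a more accurately named `decompress`. The class and method bodies below are a hypothetical sketch, not the actual transformers quantizer:

```python
# Hypothetical sketch of the naming suggestion: keep `dequantize` as the
# externally-facing entry point, delegating to a more accurately named
# `decompress` that could undo transforms as well as quantization.
class CompressedTensorsHfQuantizer:
    def decompress(self, model):
        # Placeholder for: undo quantization AND any applied transforms.
        return model

    def dequantize(self, model):
        # Thin alias kept for API compatibility with callers that
        # expect the `dequantize` name.
        return self.decompress(model)
```

This keeps existing call sites working while letting documentation and new code use the more precise name.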
