Conversation

@kylesayrs (Contributor) commented Sep 3, 2025

Purpose

  • Support loading models that have online transforms applied via Compressed Tensors (LLM Compressor)

Prerequisites

Changes

  • Require a minimum compressed-tensors version of 0.11.0 (needed for transform support)
  • Load transform configs (if available), and apply them to the model before weight loading
  • (misc) Refactor compressed tensors tests to check for perplexity, rather than exact output matches
  • (misc) Remove update_dtype to reduce complexity and give users more control and predictability over model data types
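The version gate in the first bullet can be sketched with a minimal stdlib comparison. Note that `parse_version` and `is_transform_capable` are illustrative helper names, not the PR's actual implementation, which would normally use the `packaging` library:

```python
# Minimal sketch of the compressed-tensors version gate described above.
# The helper names are illustrative, not the actual transformers code.
MIN_CT_VERSION = "0.11.0"  # first release with transform support, per the PR

def parse_version(v: str) -> tuple:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def is_transform_capable(installed: str) -> bool:
    """True if the installed compressed-tensors version supports transforms."""
    return parse_version(installed) >= parse_version(MIN_CT_VERSION)

print(is_transform_capable("0.10.2"))  # False
print(is_transform_capable("0.11.0"))  # True
```

In practice the check would raise an informative ImportError rather than return a bool, so users see why an older compressed-tensors install cannot load transformed checkpoints.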

Testing

  • Regression tested using CompressedTensorsTest; added an online QuIP-style transformed model for testing
    • Perplexity results match expectations
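Checking perplexity against a threshold, rather than asserting exact generated tokens, can be sketched as follows. The `perplexity` helper and the 20.0 bound are made up for illustration and are not the actual test-suite code:

```python
import math

def perplexity(nlls: list) -> float:
    """Perplexity is the exponential of the mean token-level
    negative log-likelihood."""
    return math.exp(sum(nlls) / len(nlls))

# Instead of matching exact outputs, the test can assert that the model's
# perplexity stays under a tolerance; 20.0 is a hypothetical bound here.
EXPECTED_MAX_PPL = 20.0
assert perplexity([2.0, 2.5, 3.0]) < EXPECTED_MAX_PPL
```

Thresholded perplexity is more robust than exact-match assertions because quantization and transforms can legitimately perturb individual sampled tokens without degrading model quality.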

Signed-off-by: Kyle Sayers <[email protected]>
@Rocketknight1 (Member) commented:

cc @MekkCyber

@kylesayrs kylesayrs marked this pull request as draft September 4, 2025 16:06
@kylesayrs (Contributor, Author) commented:

Putting in draft for now, need to do some more testing


github-actions bot commented Sep 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: compressed_tensors_integration

"""Models quantized using compressed tensors can be saved to disk"""
return True

def dequantize(self, model: "PreTrainedModel"):
@brian-dellabetta commented Sep 9, 2025

do we have to call this dequantize to match the huggingface API? If not, decompress would be more accurate, since it might involve something beyond quantization?
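One way to resolve the naming question above is to keep `dequantize` as a thin compatibility alias over a more accurately named `decompress`. The class and method bodies below are a hypothetical sketch, not the actual transformers quantizer:

```python
# Hypothetical sketch of the naming suggestion: keep `dequantize` as the
# externally-facing entry point, delegating to a more accurately named
# `decompress` that could undo transforms as well as quantization.
class CompressedTensorsHfQuantizer:
    def decompress(self, model):
        # Placeholder for: undo quantization AND any applied transforms.
        return model

    def dequantize(self, model):
        # Thin alias kept for API compatibility with callers that
        # expect the `dequantize` name.
        return self.decompress(model)
```

This keeps existing call sites working while letting documentation and new code use the more precise name.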
