enable smoothquant for int8 static tensor #3468

base: main

Conversation
Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow.

Test Plan:

To ensure BC:

```
pytest test/quantization/test_quant_api.py
```

To test the new Int8Tensor:

```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
```
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3468

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures as of commit 2586ab6 with merge base f99105a.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc @Xia-Weiwen and @cyxlily fyi
torchao/prototype/smoothquant/api.py (outdated review thread)
```python
qw = quant_mod.weight

# Add smoothing factor metadata
qw = to_weight_tensor_with_linear_activation_scale_metadata(
```
we should not be using this; please check the AWQ prototype for how this should be implemented in the new stack:

ao/torchao/prototype/awq/api.py, lines 108 to 113 in 08e5e20:

```python
assert isinstance(qw, SupportsActivationPreScaling), (
    "weight must support activation scaling through implementing `SupportsActivationPreScaling`"
)
# since we want to do `act` * `act_pre_scale` during runtime for speed, we'll save the
# reciprocal of the `equalization_scale`
qw.act_pre_scale = 1.0 / equalization_scale
```
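To spell out the reasoning about the reciprocal: a per-forward-pass division becomes a multiplication. A minimal sketch, assuming a quantized weight that exposes `act_pre_scale` and a `dequantize()` method (the helper name and both attributes are assumptions for illustration, not the PR's actual kernel path):

```python
import torch
import torch.nn.functional as F

def smoothed_linear(act: torch.Tensor, qweight, bias=None):
    # act_pre_scale holds 1.0 / equalization_scale, so applying the
    # SmoothQuant equalization at runtime is a cheap multiply, not a divide
    act = act * qweight.act_pre_scale
    return F.linear(act, qweight.dequantize(), bias)
```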
| """ | ||
|
|
||
| scale: torch.Tensor | ||
| scale: torch.Tensor = None |
nit: Optional[torch.Tensor]
also, maybe `static_scale` might be more descriptive, I feel
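Combining both suggestions, the field might read like this (a sketch; `StaticQuantKwargs` is a hypothetical stand-in for the class in the diff above):

```python
from dataclasses import dataclass
from typing import Optional

import torch

@dataclass
class StaticQuantKwargs:  # hypothetical stand-in for the class in the diff
    # optional static activation scale; named per the suggestion above
    static_scale: Optional[torch.Tensor] = None
```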
@jcaip Our customer needs activation quantization PerTensor and weight quantization PerRow. Will you implement it, or may I create a new PR to do it?

@cyxlily feel free to open a new PR for activation per tensor x weight per row, it's not something I'm planning to do currently. Thank you for your smoothquant PR btw, I used it to implement this.
Force-pushed from 0c23589 to f389a94
```python
    sqnr_static_compile
    == sqnr_static_eager
    == sqnr_dynamic_compile
    == sqnr_dynamic_eager
```
are we trying to say `static_out_compile == static_out_eager == dynamic_out_compile == dynamic_out_eager` here? If so, I think it would be clearer to just assert that all of these are equal to each other.
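For illustration, one way the flattened assertion could read (a sketch; the `sqnr_*` names come from the diff above):

```python
sqnrs = {
    "static_eager": sqnr_static_eager,
    "static_compile": sqnr_static_compile,
    "dynamic_eager": sqnr_dynamic_eager,
    "dynamic_compile": sqnr_dynamic_compile,
}
first = next(iter(sqnrs.values()))
# comparing every value against one reference gives a readable failure message
assert all(v == first for v in sqnrs.values()), f"SQNR mismatch: {sqnrs}"
```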
```python
else:
    raise ValueError(f"Unexpected step: {step}")

if isinstance(base_config, Int8StaticActivationInt8WeightConfig):
```
I think we shouldn't have a specific config here; maybe change this to a protocol similar to `SupportsActivationPreScaling`, but for configs?
I think figuring out how to do this generally will need a bit more design; we'd need to figure out how to map to the appropriate `QuantizeTensorToInt/FloatXKwargs` object. I agree we should be able to do this, though; can I address it in a later PR?
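For context, a hypothetical sketch of what such a config-side protocol could look like (the protocol name and field are assumptions, not from this PR):

```python
from typing import Optional, Protocol, runtime_checkable

import torch

@runtime_checkable
class SupportsStaticActivationScale(Protocol):
    """Hypothetical config-side analogue of SupportsActivationPreScaling."""

    # a config implementing this advertises that it carries a static
    # activation scale, so the smoothquant flow can dispatch on the
    # protocol instead of on one concrete config class
    static_scale: Optional[torch.Tensor]
```

The `isinstance(base_config, Int8StaticActivationInt8WeightConfig)` check above could then become `isinstance(base_config, SupportsStaticActivationScale)`.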
Force-pushed from f389a94 to 2586ab6
```python
    block_size,
    self.dtype,
    act_quant_kwargs=self.act_quant_kwargs,
    act_scale=self.act_scale,
```
I guess slice didn't work for static quant int8 before; can you add a test for that?
```python
    old_int8_tensor.scale[index],
    old_int8_tensor.block_size[1:],
    old_int8_tensor.dtype,
    old_int8_tensor.act_scale,
```
same for this one; it seems like the select op broke with static quant before
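A rough sketch of what such tests could assert, assuming a fixture `int8_weight` that is an `Int8Tensor` built through the static-quant flow (the fixture and test names are assumptions):

```python
def test_slice_and_select_preserve_act_scale(int8_weight):
    # slicing should carry the static activation scale through
    sliced = int8_weight[0:8]
    assert sliced.act_scale is not None

    # indexing a single row exercises the select op
    selected = int8_weight[0]
    assert selected.act_scale is not None
```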
This PR hooks up the static quant workflow added in #3442 to the prototype smoothquant API.
You can use the new flow as follows:
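The example below sketches that flow, inferred from the diffs in this PR (`SmoothQuantConfig` taking a `base_config` and a `step`, plus `Int8StaticActivationInt8WeightConfig`); exact import paths and signatures may differ:

```python
from torchao.quantization import quantize_
from torchao.prototype.smoothquant import SmoothQuantConfig

# config name taken from this PR's diffs; import path is an assumption
from torchao.quantization import Int8StaticActivationInt8WeightConfig

# `model` and `calibration_data` are assumed to be defined by the caller
base_config = Int8StaticActivationInt8WeightConfig()

# step 1: insert observers to collect activation statistics
quantize_(model, SmoothQuantConfig(base_config, step="prepare"))

# step 2: calibrate on representative inputs
for batch in calibration_data:
    model(batch)

# step 3: compute smoothing factors and swap in static int8 weights
quantize_(model, SmoothQuantConfig(base_config, step="convert"))
```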