Output understanding after export #3972


Open
QuangDucc opened this issue Apr 8, 2025 · 1 comment

@QuangDucc

Hi,

After calling "sim.export(path=logdir, filename_prefix='quant_model')", I get four files: quant_model.encodings, quant_model_torch.encodings, quant_model.onnx, and quant_model.pth.

  1. I read issue How to get a real int8 quanted ONNX model? #2816 and understand that the workflow does not yet support exporting real int8 quantized values.

  2. But why are there two encodings files?

I found that "param_encodings" is the same in both files, but "activation_encodings" looks quite different.

Here is the "quant_model.encodings" file:
[screenshot of quant_model.encodings contents]

and here is the "quant_model_torch.encodings" file:
[screenshot of quant_model_torch.encodings contents]

  3. Can I get the real int8 values by computing them from (max, min, offset, scale)?

Thank you.

@quic-klhsieh
Contributor

Hi @QuangDucc , answers to your questions below:

  1. The two encodings files exported by quantsim serve different purposes.
  • File ending in '_torch.encodings':

The file ending with '_torch.encodings' maps layers in the PyTorch quantsim model to activation and parameter encodings. For activation encodings, the top-level dictionary contains layer names as keys, with nested dictionaries for the inputs or outputs of each layer and then the indices of those inputs or outputs, ultimately mapping to the encodings for a particular layer's input or output tensor. Parameter encodings map torch parameter names to their corresponding encodings.

This file is mainly used for saving encodings that can later be loaded into another quantsim object quantizing the same model. This is useful if, for example, you calibrated the model in the past and want to load the encodings into a new quantsim without going through calibration again.

  • File ending in '.encodings' without '_torch':

This file is used in conjunction with the exported .onnx file to take the model onto target through our QAIRT stack. The names in this encodings file correspond to tensor names in the exported .onnx graph.

If you compare the two encodings files, you will find the same encodings in both, but mapped under different names depending on whether they refer to PyTorch layer inputs/outputs or the equivalent ONNX tensors. The sketch below shows one way to inspect this.
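
As a minimal sketch for inspecting the two files side by side: this assumes the files were exported to `logdir` with filename_prefix='quant_model' as in the original post, and that both are JSON with top-level "activation_encodings" and "param_encodings" keys as described above.

```python
import json
import os

logdir = "./logdir"  # hypothetical path; adjust to where sim.export() wrote the files

with open(os.path.join(logdir, "quant_model_torch.encodings")) as f:
    torch_encodings = json.load(f)
with open(os.path.join(logdir, "quant_model.encodings")) as f:
    onnx_encodings = json.load(f)

# '_torch.encodings': activation encodings keyed by PyTorch layer name,
# then nested by input/output and tensor index.
first_layer = next(iter(torch_encodings["activation_encodings"]))
print(first_layer, torch_encodings["activation_encodings"][first_layer])

# '.encodings': activation encodings keyed by ONNX tensor name.
first_tensor = next(iter(onnx_encodings["activation_encodings"]))
print(first_tensor, onnx_encodings["activation_encodings"][first_tensor])

# param_encodings should carry the same values in both files.
print(sorted(torch_encodings["param_encodings"])[:3])
print(sorted(onnx_encodings["param_encodings"])[:3])
```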

  2. Given encodings for a floating-point tensor, you can compute the quantized int8 tensor by performing

quantized_tensor = round(clamp(fp_tensor, min, max) / scale) - offset
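
As a minimal NumPy sketch of that computation: the encoding values below are hypothetical, and whether the resulting integer grid is signed or unsigned depends on the offset stored in the encodings file.

```python
import numpy as np

def quantize(fp_tensor, enc):
    """Apply the formula above: round(clamp(x, min, max) / scale) - offset.

    `enc` is assumed to be one encoding entry from the exported
    .encodings file, carrying 'min', 'max', 'scale', and 'offset' keys.
    """
    clamped = np.clip(fp_tensor, enc["min"], enc["max"])
    return np.round(clamped / enc["scale"]) - enc["offset"]

def dequantize(q_tensor, enc):
    """Approximate inverse of quantize(): x ~= (q + offset) * scale."""
    return (np.asarray(q_tensor, dtype=np.float32) + enc["offset"]) * enc["scale"]

# Example with hypothetical 8-bit encoding values (scale = 1/128):
enc = {"min": -1.0, "max": 0.9921875, "scale": 0.0078125, "offset": -128}
x = np.array([-0.5, 0.0, 0.25], dtype=np.float32)
q = quantize(x, enc)
print(q)                   # [ 64. 128. 160.]
print(dequantize(q, enc))  # [-0.5  0.    0.25], close to the original x
```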
