Output understanding after export #3972


Open
QuangDucc opened this issue Apr 8, 2025 · 1 comment

@QuangDucc

Hi,

After calling "sim.export(path=logdir, filename_prefix='quant_model')", I get four files: quant_model.encodings, quant_model_torch.encodings, quant_model.onnx, and quant_model.pth.

  1. I read issue How to get a real int8 quanted ONNX model? #2816 and understand that the workflow does not yet support exporting real int8 quantized values.

  2. But why are there two encodings files?

I found that "param_encodings" is the same in both files, but "activation_encodings" looks quite different.

Here is the "quant_model.encodings" file:
[screenshot of quant_model.encodings contents]

and here is the "quant_model_torch.encodings" file:
[screenshot of quant_model_torch.encodings contents]

  3. Can I get the real int8 values by computing them from (max, min, offset, scale)?

Thank you.

@quic-klhsieh
Contributor

Hi @QuangDucc , answers to your questions below:

  1. The two encodings files exported by quantsim serve different purposes.
  • File ending in '_torch.encodings':

The file ending with '_torch.encodings' maps layers in the PyTorch quantsim model to activation and parameter encodings. For activation encodings, the top-level dictionary contains layer names as keys, with nested dictionaries for the inputs or outputs of each layer and then the indices of those inputs or outputs, ultimately mapping to the encodings for a particular layer's input or output tensor. Parameter encodings map torch parameter names to their corresponding encodings.

This file is mainly used for saving encodings that can later be loaded into another quantsim object quantizing the same model. This is useful if, for example, you calibrated the model in the past and want to load the encodings into a new quantsim without going through calibration again.

  • File ending in '.encodings' without '_torch':

This file is used in conjunction with the exported .onnx file to take the model onto target through our QAIRT stack. The names in this encodings file correspond to tensor names in the exported .onnx graph.

If you compare the two encodings files, you will find the same encodings in both, but mapped under different names depending on whether they refer to PyTorch layer inputs/outputs or the equivalent ONNX tensors. The sketch below shows one way to inspect this.
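
As a minimal sketch for inspecting the two files side by side: this assumes the files were exported to `logdir` with filename_prefix='quant_model' as in the original post, and that both are JSON with top-level "activation_encodings" and "param_encodings" keys as described above.

```python
import json
import os

logdir = "./logdir"  # hypothetical path; adjust to where sim.export() wrote the files

with open(os.path.join(logdir, "quant_model_torch.encodings")) as f:
    torch_encodings = json.load(f)
with open(os.path.join(logdir, "quant_model.encodings")) as f:
    onnx_encodings = json.load(f)

# '_torch.encodings': activation encodings keyed by PyTorch layer name,
# then nested by input/output and tensor index.
first_layer = next(iter(torch_encodings["activation_encodings"]))
print(first_layer, torch_encodings["activation_encodings"][first_layer])

# '.encodings': activation encodings keyed by ONNX tensor name.
first_tensor = next(iter(onnx_encodings["activation_encodings"]))
print(first_tensor, onnx_encodings["activation_encodings"][first_tensor])

# param_encodings should carry the same values in both files.
print(sorted(torch_encodings["param_encodings"])[:3])
print(sorted(onnx_encodings["param_encodings"])[:3])
```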

  2. Given encodings for a floating-point tensor, you can compute the quantized int8 tensor by performing

quantized_tensor = round(clamp(fp_tensor, min, max) / scale) - offset
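
As a minimal NumPy sketch of that computation: the encoding values below are hypothetical, and whether the resulting integer grid is signed or unsigned depends on the offset stored in the encodings file.

```python
import numpy as np

def quantize(fp_tensor, enc):
    """Apply the formula above: round(clamp(x, min, max) / scale) - offset.

    `enc` is assumed to be one encoding entry from the exported
    .encodings file, carrying 'min', 'max', 'scale', and 'offset' keys.
    """
    clamped = np.clip(fp_tensor, enc["min"], enc["max"])
    return np.round(clamped / enc["scale"]) - enc["offset"]

def dequantize(q_tensor, enc):
    """Approximate inverse of quantize(): x ~= (q + offset) * scale."""
    return (np.asarray(q_tensor, dtype=np.float32) + enc["offset"]) * enc["scale"]

# Example with hypothetical 8-bit encoding values (scale = 1/128):
enc = {"min": -1.0, "max": 0.9921875, "scale": 0.0078125, "offset": -128}
x = np.array([-0.5, 0.0, 0.25], dtype=np.float32)
q = quantize(x, enc)
print(q)                   # [ 64. 128. 160.]
print(dequantize(q, enc))  # [-0.5  0.    0.25], close to the original x
```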
