The two encodings files exported during quantsim export serve different purposes.
File ending in '_torch.encodings':
The file ending with '_torch.encodings' maps layers in the PyTorch quantsim model to activation and parameter encodings. For activation encodings, the top-level dictionary contains layer names as keys. Nested dictionaries below that correspond to inputs or outputs of each layer, and finally to the indices of those inputs or outputs, ultimately mapping to the encodings for a particular layer's input or output tensor. Parameter encodings map torch parameter names to their corresponding encodings.
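To make the nesting concrete, here is a small sketch of how that structure can be navigated. The layer name, field names, and values below are illustrative only; real files are produced by sim.export and the exact schema may vary between AIMET versions:

```python
import json

# Illustrative structure of a '_torch.encodings' file (hypothetical layer
# name "conv1"; real files come from sim.export and may have more fields).
torch_encodings = {
    "activation_encodings": {
        "conv1": {                      # PyTorch layer name
            "output": {                 # 'input' or 'output' of the layer
                "0": {                  # index of the output tensor
                    "bitwidth": 8,
                    "dtype": "int",
                    "is_symmetric": "False",
                    "max": 4.87,
                    "min": 0.0,
                    "offset": 0,
                    "scale": 0.0191,
                }
            }
        }
    },
    "param_encodings": {
        "conv1.weight": [               # torch parameter name -> encodings
            {"bitwidth": 8, "is_symmetric": "True",
             "max": 0.49, "min": -0.5, "offset": -128, "scale": 0.0039}
        ]
    },
}

# Walk layer -> input/output -> tensor index to reach one activation encoding
enc = torch_encodings["activation_encodings"]["conv1"]["output"]["0"]
print(enc["scale"], enc["bitwidth"])
```

The same traversal works for any layer key in the file; parameter encodings are one level flatter, keyed directly by the torch parameter name.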
This file is mainly used for saving encodings so they can later be loaded into another quantsim object quantizing the same model. This is useful if, for example, you calibrated the model in the past and want to load the encodings into a new quantsim without going through calibration again.
File ending in '.encodings' without '_torch':
This file is used in conjunction with the exported .onnx file; the pair is taken on-target through our QAIRT stack. The names in this encodings file correspond to tensor names in the exported .onnx graph.
If you compare the two encodings files, you will find the same encodings in both, but mapped under different names depending on whether they refer to PyTorch layer inputs/outputs or the equivalent ONNX tensors.
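You can verify this yourself by loading both JSON files and diffing them. The snippet below uses tiny synthetic stand-ins for the two exported files (the tensor and layer names are made up); with your real 'quant_model.encodings' and 'quant_model_torch.encodings' you would just json.load the exported paths instead:

```python
import json
import os
import tempfile

# Synthetic stand-ins for the two exported files (real ones come from
# sim.export). The point: param_encodings match, while activation_encodings
# are keyed by ONNX tensor names in one file and PyTorch layers in the other.
onnx_style = {
    "activation_encodings": {"/conv1/Conv_output_0": [{"scale": 0.0191}]},
    "param_encodings": {"conv1.weight": [{"scale": 0.0039}]},
}
torch_style = {
    "activation_encodings": {"conv1": {"output": {"0": {"scale": 0.0191}}}},
    "param_encodings": {"conv1.weight": [{"scale": 0.0039}]},
}

with tempfile.TemporaryDirectory() as d:
    for name, data in [("quant_model.encodings", onnx_style),
                       ("quant_model_torch.encodings", torch_style)]:
        with open(os.path.join(d, name), "w") as f:
            json.dump(data, f)
    with open(os.path.join(d, "quant_model.encodings")) as f:
        a = json.load(f)
    with open(os.path.join(d, "quant_model_torch.encodings")) as f:
        b = json.load(f)

same_params = a["param_encodings"] == b["param_encodings"]
same_act_keys = (a["activation_encodings"].keys()
                 == b["activation_encodings"].keys())
print(same_params, same_act_keys)
```

Running this shows identical param_encodings but disjoint activation keys, which is exactly the pattern reported in the question below.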
Given the encodings for a floating-point tensor, you can compute the quantized int8 tensor as x_q = clamp(round(x / scale) - offset, 0, 2^bitwidth - 1), where scale and offset come from the encoding. This follows the common convention in which dequantization is x = (x_q + offset) * scale; double-check the sign convention against your exported file.
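A minimal NumPy sketch of that round-trip, assuming the dequantization convention x = (x_q + offset) * scale described above (the example scale/offset values are made up):

```python
import numpy as np

def quantize(x, scale, offset, bitwidth=8):
    """Quantize a float tensor using encoding scale/offset.

    Assumes the convention where dequantization is
    x_float = (x_q + offset) * scale, so quantization is the inverse:
    x_q = clamp(round(x / scale) - offset, 0, 2**bitwidth - 1).
    Verify the sign convention against your exported encodings.
    """
    q = np.round(x / scale) - offset
    return np.clip(q, 0, 2 ** bitwidth - 1).astype(np.uint8)

def dequantize(q, scale, offset):
    """Recover the (approximate) float values from quantized ones."""
    return (q.astype(np.float64) + offset) * scale

# Example with a symmetric 8-bit encoding: scale=0.1, offset=-128
x = np.array([-12.8, 0.0, 0.1, 12.7])
q = quantize(x, scale=0.1, offset=-128)
print(q)        # quantized values in [0, 255]
x_hat = dequantize(q, scale=0.1, offset=-128)
print(x_hat)    # close to the original x, up to quantization error
```

For values inside the encoding's [min, max] range the round-trip error is at most half a scale step; values outside that range are clipped.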
Hi,
After calling "sim.export(path=logdir, filename_prefix='quant_model')", I get 4 files: quant_model.encodings, quant_model_torch.encodings, quant_model.onnx, and quant_model.pth.
I read issue "How to get a real int8 quanted ONNX model?" #2816 and understand that exporting real int8 quantized values is not supported by the workflow yet.
But why are there 2 encodings files?
I found that "param_encodings" is the same in both files, but "activation_encodings" looks quite different.
Here is the "quant_model.encodings" file:

And here is the "quant_model_torch.encodings" file:

Thank you.