Example using AIMET quantized model and onnxruntime #2880
Others have faced similar issues, no? A potential thing to test. Why? My train of thought: it's a library or binary. QNN does something similar to emulate/simulate the runtime, AFAIU.
Hi @escorciav
Thanks for chiming in @e-said! Do you mind sharing a simple Python script with a toy ONNX model showcasing that?
Hi @escorciav
PS: please note that your model should contain only INT8 QDQ nodes; otherwise it won't be converted to ONNX.
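
For reference, here is a minimal sketch of the kind of script that was requested (my own toy example, not one shared in the thread): it builds an ONNX model containing only an INT8 QuantizeLinear/DequantizeLinear pair and runs it with onnxruntime, preferring the CUDA provider when available. The scale/zero-point values and file name are arbitrary placeholders.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper
import onnxruntime as ort

# Scalar scale/zero-point initializers for the QDQ pair (arbitrary values).
scale = helper.make_tensor("scale", TensorProto.FLOAT, [], [0.02])
zero_point = helper.make_tensor("zp", TensorProto.INT8, [], [0])

# float input -> INT8 QuantizeLinear -> DequantizeLinear -> float output
q = helper.make_node("QuantizeLinear", ["x", "scale", "zp"], ["xq"])
dq = helper.make_node("DequantizeLinear", ["xq", "scale", "zp"], ["y"])

graph = helper.make_graph(
    [q, dq],
    "toy_qdq",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 4])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 4])],
    initializer=[scale, zero_point],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
onnx.save(model, "toy_qdq.onnx")

# Prefer the GPU provider, falling back to CPU if it is not in this build.
providers = [p for p in ("CUDAExecutionProvider", "CPUExecutionProvider")
             if p in ort.get_available_providers()]
sess = ort.InferenceSession("toy_qdq.onnx", providers=providers)

x = np.random.randn(1, 4).astype(np.float32)
print(sess.run(["y"], {"x": x})[0])  # fake-quantized round-trip of x
```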
No worries. I have to do QAT, so I have to use aimet_torch, as the Qualcomm AIMET maintainers suggested.
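
For context, the typical aimet_torch QAT-and-export flow looks roughly like this (a sketch based on the aimet_torch 1.x QuantizationSimModel API; the toy model, quant scheme, calibration callback, and file names are my assumptions, and details vary by AIMET version):

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

# Toy FP32 model standing in for a real trained network (placeholder).
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
dummy_input = torch.randn(1, 3, 224, 224)

# Wrap the model with fake-quantization ops (INT8 weights/activations).
sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.training_range_learning_with_tf_init,
    default_param_bw=8,
    default_output_bw=8,
)

def calibrate(sim_model, _):
    # Placeholder calibration pass: run representative data through the model.
    with torch.no_grad():
        sim_model(dummy_input)

sim.compute_encodings(forward_pass_callback=calibrate,
                      forward_pass_callback_args=None)

# QAT would happen here: fine-tune sim.model with an ordinary PyTorch
# training loop so the ranges/weights adapt to quantization.

# Export model.onnx plus an encodings JSON for downstream tooling.
sim.export(path="./", filename_prefix="model_qat", dummy_input=dummy_input)
```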
I'm having trouble verifying that a simulated quantized ONNX file offers decent performance.
Issue: after doing PTQ, I cannot use the quantized model in ONNX Runtime (preferably on GPU)!
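
One way to narrow this down (my suggestion, not something posted in the thread): inspect the exported graph's op domains. If the export contains custom-domain ops rather than plain INT8 QDQ nodes, a stock onnxruntime build will refuse to load it, which matches the constraint noted above. The file name below is a placeholder.

```python
# Hypothetical diagnostic: count ops per (domain, op_type) in the exported
# model. A pure INT8 QDQ model should show only default-domain ("ai.onnx")
# ops such as QuantizeLinear/DequantizeLinear; any other domain signals
# custom ops that stock onnxruntime cannot execute.
from collections import Counter
import onnx

model = onnx.load("model_qat.onnx")  # placeholder path
ops = Counter((node.domain or "ai.onnx", node.op_type)
              for node in model.graph.node)
for (domain, op_type), count in sorted(ops.items()):
    print(f"{domain}.{op_type}: {count}")
```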