Example using AIMET quantized model and onnxruntime #2880
Others have faced similar issues, no? A potential thing to test. Why? My train of thought: it's a library or binary. QNN does something similar to emulate/simulate the runtime, AFAIU.
Hi @escorciav
Thanks for chiming in @e-said! Do you mind sharing a simple Python script with a toy ONNX model showcasing that?
Hi @escorciav
PS: please note that your model should contain only INT8 QDQ nodes; otherwise it won't be converted to ONNX.
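
For reference, here is a minimal sketch of the kind of script that was requested (my own toy example, not one shared in the thread): it builds an ONNX model containing only an INT8 QuantizeLinear/DequantizeLinear pair and runs it with onnxruntime, preferring the CUDA provider when available. The scale/zero-point values and file name are arbitrary placeholders.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper
import onnxruntime as ort

# Scalar scale/zero-point initializers for the QDQ pair (arbitrary values).
scale = helper.make_tensor("scale", TensorProto.FLOAT, [], [0.02])
zero_point = helper.make_tensor("zp", TensorProto.INT8, [], [0])

# float input -> INT8 QuantizeLinear -> DequantizeLinear -> float output
q = helper.make_node("QuantizeLinear", ["x", "scale", "zp"], ["xq"])
dq = helper.make_node("DequantizeLinear", ["xq", "scale", "zp"], ["y"])

graph = helper.make_graph(
    [q, dq],
    "toy_qdq",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 4])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 4])],
    initializer=[scale, zero_point],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
onnx.save(model, "toy_qdq.onnx")

# Prefer the GPU provider, falling back to CPU if it is not in this build.
providers = [p for p in ("CUDAExecutionProvider", "CPUExecutionProvider")
             if p in ort.get_available_providers()]
sess = ort.InferenceSession("toy_qdq.onnx", providers=providers)

x = np.random.randn(1, 4).astype(np.float32)
print(sess.run(["y"], {"x": x})[0])  # fake-quantized round-trip of x
```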
No worries. I have to do QAT, so I have to use aimet_torch, as the Qualcomm AIMET maintainers suggested.
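
For context, the typical aimet_torch QAT-and-export flow looks roughly like this (a sketch based on the aimet_torch 1.x QuantizationSimModel API; the toy model, quant scheme, calibration callback, and file names are my assumptions, and details vary by AIMET version):

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

# Toy FP32 model standing in for a real trained network (placeholder).
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
dummy_input = torch.randn(1, 3, 224, 224)

# Wrap the model with fake-quantization ops (INT8 weights/activations).
sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.training_range_learning_with_tf_init,
    default_param_bw=8,
    default_output_bw=8,
)

def calibrate(sim_model, _):
    # Placeholder calibration pass: run representative data through the model.
    with torch.no_grad():
        sim_model(dummy_input)

sim.compute_encodings(forward_pass_callback=calibrate,
                      forward_pass_callback_args=None)

# QAT would happen here: fine-tune sim.model with an ordinary PyTorch
# training loop so the ranges/weights adapt to quantization.

# Export model.onnx plus an encodings JSON for downstream tooling.
sim.export(path="./", filename_prefix="model_qat", dummy_input=dummy_input)
```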
I'm having trouble verifying that a simulated quantized ONNX file offers decent performance.
Issue: after doing PTQ, I cannot use the quantized model in ONNX Runtime (preferably on GPU)!
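
One way to narrow this down (my suggestion, not something posted in the thread): inspect the exported graph's op domains. If the export contains custom-domain ops rather than plain INT8 QDQ nodes, a stock onnxruntime build will refuse to load it, which matches the constraint noted above. The file name below is a placeholder.

```python
# Hypothetical diagnostic: count ops per (domain, op_type) in the exported
# model. A pure INT8 QDQ model should show only default-domain ("ai.onnx")
# ops such as QuantizeLinear/DequantizeLinear; any other domain signals
# custom ops that stock onnxruntime cannot execute.
from collections import Counter
import onnx

model = onnx.load("model_qat.onnx")  # placeholder path
ops = Counter((node.domain or "ai.onnx", node.op_type)
              for node in model.graph.node)
for (domain, op_type), count in sorted(ops.items()):
    print(f"{domain}.{op_type}: {count}")
```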