I am trying to implement INT8 quantization for a submodule (RelPositionalMHA) of the WeNet base network (Conformer), and I have some questions about how to build a custom INT8-quantized TensorRT plugin.
About the input: reading the FasterTransformer code and the WeNet TensorRT plugin, I see that you use invokeQuantization. Does that mean you modify the model, put the quantization op into the plugin weights, and read it during inference initialization?
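(For context, this is roughly what I understand a per-tensor INT8 quantization kernel like invokeQuantization to compute. The NumPy sketch below is my own illustration; the idea of a scale calibrated offline and stored next to the plugin weights is my assumption, not something I found in the FasterTransformer code.)

```python
import numpy as np

def quantize_int8(x: np.ndarray, scale: float) -> np.ndarray:
    """Symmetric per-tensor INT8 quantization: q = clip(round(x / scale), -127, 127).

    `scale` would be calibrated offline (e.g. amax / 127) and shipped with the
    plugin weights -- that storage scheme is my assumption.
    """
    q = np.round(x / scale)
    return np.clip(q, -127, 127).astype(np.int8)

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Map back to float for the next floating-point op."""
    return q.astype(np.float32) * scale

# toy usage: calibrate a scale from sample activations, then quantize a new input
calib = np.random.randn(4, 256).astype(np.float32)
scale = float(np.abs(calib).max()) / 127.0
x = np.random.randn(4, 256).astype(np.float32)
x_int8 = quantize_int8(x, scale)
x_rec = dequantize_int8(x_int8, scale)
```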
About pos_emb: I followed the WeNet code to produce a NumPy version of RelPositionalMHA inference, but I could not find any CUDA code that handles pos_emb. Does that mean the pos_emb input is None during inference?
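(Here is the NumPy sketch I wrote for the relative-position score term, so you can see where I expect pos_emb to enter. The shapes and bias names follow my reading of the WeNet Python code and are assumptions on my part; the legacy rel_shift variant is omitted.)

```python
import numpy as np

def rel_position_scores(q, k, pos_emb, pos_bias_u, pos_bias_v):
    """Transformer-XL style scores as I understand wenet's RelPositionMultiHeadedAttention.

    q, k:                   (head, time, d_k)
    pos_emb:                (time, d_k)  relative positional embedding after linear_pos
    pos_bias_u, pos_bias_v: (head, d_k)
    """
    d_k = q.shape[-1]
    # content-content term: (q + u) @ k^T
    matrix_ac = (q + pos_bias_u[:, None, :]) @ k.transpose(0, 2, 1)
    # content-position term: (q + v) @ pos_emb^T
    matrix_bd = (q + pos_bias_v[:, None, :]) @ pos_emb.T
    return (matrix_ac + matrix_bd) / np.sqrt(d_k)

# toy usage with made-up shapes
h, t, d = 4, 16, 64
q = np.random.randn(h, t, d)
k = np.random.randn(h, t, d)
pos_emb = np.random.randn(t, d)
u = np.random.randn(h, d)
v = np.random.randn(h, d)
scores = rel_position_scores(q, k, pos_emb, u, v)   # (head, time, time)
```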
About PPQ: I used ONNX Runtime to quantize my submodule model in order to speed it up, but it is slower than converting the raw model to a TensorRT engine. The quantized model looks like this:
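(This is roughly how I produced the quantized model: static QDQ quantization with ONNX Runtime, using QDQ format so that the Q/DQ nodes can in principle be consumed by TensorRT. The file names and the random calibration reader below are placeholders for my real setup.)

```python
import numpy as np
import onnxruntime
from onnxruntime.quantization import (CalibrationDataReader, QuantFormat,
                                      QuantType, quantize_static)

class MHACalibReader(CalibrationDataReader):
    """Feeds a few calibration batches; random float32 data stands in for my
    real calibration set, and dynamic dims are fixed to 1 for illustration."""
    def __init__(self, model_path: str, num_batches: int = 8):
        sess = onnxruntime.InferenceSession(model_path)
        inputs = sess.get_inputs()
        self.data = iter([
            {i.name: np.random.randn(
                *[d if isinstance(d, int) else 1 for d in i.shape]
             ).astype(np.float32) for i in inputs}
            for _ in range(num_batches)
        ])

    def get_next(self):
        return next(self.data, None)

quantize_static(
    model_input="rel_mha.onnx",           # placeholder file names
    model_output="rel_mha_int8.onnx",
    calibration_data_reader=MHACalibReader("rel_mha.onnx"),
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
    per_channel=True,
)
```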
Looking forward to your reply.