I am attempting to perform W8A8 quantization using the int8FusedDequantizeCUDA operator, but the inference results are NaN. The code is as follows:
Modifications in qlinear.py:
```python
qint_x = shared_input.qint_x                            # qint_x shape: [M, K]
int_weight = self.int_weight                            # int_weight shape: [N, K]
scale_row = shared_input.meta[None, 0::2].contiguous()  # scale_row shape: [1, M]
zero_row = shared_input.meta[None, 1::2].contiguous()   # zero_row shape: [1, M]
weights_scales = self.weights_scales.transpose(0, 1)    # weights_scales shape: [1, N]
reduced_w = self.reduced_w                              # reduced_w shape: [1, N]
shift_value = 128.0

output = quik.asymmetric.int8FusedDequantize(
    qint_x, int_weight, scale_row, weights_scales,
    shift_value, zero_row, reduced_w, fp_result)
```
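As a first debugging step, I was thinking of running a check like the following before the fused call (a minimal sketch using the tensor names from the snippet above; the `check_tensor` helper is hypothetical, not part of QUIK) to confirm that none of the kernel inputs already contain NaN/Inf and that the scales are non-zero:

```python
import torch

def check_tensor(name, t):
    """Print NaN/Inf flags and basic stats for a tensor (debugging helper, hypothetical)."""
    t_f = t.float()
    print(f"{name}: shape={tuple(t.shape)}, dtype={t.dtype}, "
          f"nan={torch.isnan(t_f).any().item()}, inf={torch.isinf(t_f).any().item()}, "
          f"min={t_f.min().item():.4g}, max={t_f.max().item():.4g}")

# NaN in the output often traces back to NaN/Inf or zero scales in these operands.
for name, t in [("scale_row", scale_row), ("zero_row", zero_row),
                ("weights_scales", weights_scales), ("reduced_w", reduced_w),
                ("fp_result", fp_result)]:
    check_tensor(name, t)

# Zero scales would also produce Inf/NaN once the result is dequantized.
assert (scale_row.float().abs() > 0).all(), "scale_row contains zeros"
assert (weights_scales.float().abs() > 0).all(), "weights_scales contains zeros"
```

If all of these inputs look clean, the NaN would more likely originate inside the kernel itself (e.g. a layout or dtype assumption I am violating), which is why I am asking the question below.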
Is there an issue with the operator itself, or am I using it incorrectly? Could you please provide some suggestions? Thank you very much.