Describe the bug
Since CUTLASS 3.7, mixed-input-dtype GEMMs produce less accurate outputs than they did in CUTLASS 3.6. The loss of accuracy is substantial and makes mixed-input GEMMs impractical for real use cases.
Specifically, we have a collection of mixed-input GEMMs in FBGEMM that work well on CUTLASS 3.6. These kernels compile fine with newer versions of CUTLASS (after small API updates), but they produce garbage outputs.
Directly copying the BF16 x I4 GEMM from example 55 produces slightly better results, but the outputs are still much less accurate than the 3.6 equivalents.
Steps/Code to reproduce bug
We use this benchmarking script to measure the performance and accuracy of kernels. The script can be run with these sample arguments:
This will produce an output like this:
The sim metric is the L1 distance between the kernel's output and the BF16 output, so lower is better. After updating to CUTLASS 3.7, copying example 55, and running the same script, we get:
This output is clearly less accurate. The updated version of the kernel can be found at this PR.
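For readers unfamiliar with the sim metric, here is a minimal sketch of an L1 distance against a reference output. It is not the benchmarking script's code: the function name `l1_distance`, the averaging over elements, and the sample values are all assumptions made for illustration.

```python
def l1_distance(output, reference):
    """Mean absolute difference between two equal-length sequences of floats.

    Hypothetical helper: the real script may sum instead of average, and it
    operates on GPU tensors rather than Python lists.
    """
    if len(output) != len(reference):
        raise ValueError("output and reference must have the same length")
    if not reference:
        raise ValueError("inputs must not be empty")
    return sum(abs(o - r) for o, r in zip(output, reference)) / len(reference)


# Made-up values standing in for a flattened BF16 reference output.
reference = [1.0, -2.0, 0.5, 4.0]
# A kernel that matches the reference exactly.
accurate = [1.0, -2.0, 0.5, 4.0]
# A kernel whose output drifts from the reference.
degraded = [1.5, -2.0, 0.0, 4.0]

print(l1_distance(accurate, reference))
print(l1_distance(degraded, reference))
```

A value of zero means the mixed-input kernel reproduces the reference exactly, and larger values mean larger deviation.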
Expected behavior
The accuracy of mixed-input kernels should not change across CUTLASS version updates.
Environment details (please complete the following information):
CUDA 12.4, driver version 535.154.05, on a Linux system with 8x H100 GPUs.