How should I verify the speedup effect of the algorithm? #15
Hi, SparseGPT itself is just concerned with accurately sparsifying a model; the acceleration comes from other software/hardware that is able to exploit sparse models (such as 2:4 sparsity on Ampere GPUs). Our layer-wise 2:4 speedup measurements were produced directly with the prebuilt kernels available in NVIDIA's CUTLASS profiler. We compiled all the available kernels and then ran a benchmark sweep using this profiler (on an A100 GPU) for FP16/FP16 SpGEMMs of the appropriate matrix shapes; the results of this sweep are the numbers we report. Observing those speedups during full inference will require integrating the corresponding CUTLASS kernels into PyTorch. (Though, I think PyTorch is actually working on an official NVIDIA 2:4 integration, so hopefully actually running 2:4 models will be quite easy very soon.)
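For context, 2:4 (semi-structured) sparsity means that in every group of four consecutive weights, at most two are nonzero. A minimal, illustrative sketch of imposing such a pattern via simple magnitude pruning (note: this is just to show the pattern; SparseGPT itself selects the weights to keep far more carefully than by magnitude alone):

```python
# Illustrative 2:4 magnitude pruning: keep the 2 largest-magnitude
# entries in every contiguous group of 4 weights, zeroing the rest.
def prune_2_4(row):
    """Zero out the 2 smallest-magnitude weights in each group of 4."""
    assert len(row) % 4 == 0, "row length must be a multiple of 4"
    out = list(row)
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        # indices of the 2 smallest-magnitude entries within this group
        drop = sorted(range(4), key=lambda i: abs(group[i]))[:2]
        for i in drop:
            out[g + i] = 0.0
    return out

weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
print(prune_2_4(weights))  # → [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

A weight matrix whose every row satisfies this constraint is exactly what the Ampere sparse Tensor Cores (and the CUTLASS SpGEMM kernels mentioned above) can accelerate.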
Thank you for your kind reply~
@efrantar Hi, following your introduction, I prepared an environment for NVIDIA's CUTLASS profiler and compiled the kernels following the official guide. As for "Observing those speedups during full inference will require integrating the corresponding CUTLASS kernels into PyTorch" mentioned above, I'm confused about how to make it work. Would it be convenient for you to offer some code for speedup testing? Links to related NVIDIA demos would be fine too. Thanks again!
Hi, I'm also someone who wants to validate the speedup of 2:4-sparsified models over dense models.
I look forward to hearing from you. Thank you.
As shown in the paper, the CUTLASS library is used for the speedup measurements, but I did not find any code relying on it in this repository. How should I verify that SparseGPT is faster than dense models during inference? Even if end-to-end speedups were slightly lower, that would be fine. Thanks a lot for your excellent work~