Fixed the fused QKV projection causing excess memory usage by the KV …
Disable cache in ppl.py. More benchmarks in README. Fixed bug in gene…
Updated to support latest transformers. Added a quantize.py script.
Add requirements warning to README.md
More improvements to the tuning of the Triton kernel. Fused the QKV c…
Improved tuning of the Triton kernel, giving a nice boost in performa…
Triton kernel now unpacks zeros itself. Performance of Triton kernel …
Merge branch 'main' of github-gptq-triton:fpgaminer/GPTQ-triton