Commit 2a98fda

bitblas Readme
1 parent b16c018 commit 2a98fda

File tree

1 file changed: +3 -0 lines changed

Readme.md

@@ -106,6 +106,9 @@ prepare_for_inference(model, backend="torchao_int4")
 
 #Marlin backend: nbits=4, axis=1, compute_dtype=float16, group_size=None
 #prepare_for_inference(model, backend="marlin", allow_merge=True)
+
+#Bitblas backend: nbits=4/2/1, axis=1, compute_dtype=float16, group_size=None
+#prepare_for_inference(model, backend="bitblas")
 ```
 These backends only work with 4-bit quantization and `axis=1`. Additionally, for <a href="https://github.com/IST-DASLab/marlin.git">Marlin</a>, we only support `group_size=None`. Below you can find a comparison between the different backends. The torchao kernel reaches 195 tokens/sec (generation speed) on a 4090.
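
For reference, a minimal sketch of how the new bitblas backend might be used end to end. This is not part of the commit; the import paths, `BaseQuantizeConfig`, `AutoHQQHFModel`, and the model name are assumptions about the hqq API and may differ between versions.

```python
# Minimal sketch (assumptions, not part of this commit): quantize a model with
# HQQ, then swap the quantized layers to the BitBLAS kernels.
import torch
from transformers import AutoModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig        # assumed import path
from hqq.models.hf.base import AutoHQQHFModel           # assumed import path
from hqq.utils.patching import prepare_for_inference    # assumed import path

# Example model name; any HF causal LM would do.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)

# BitBLAS constraints from the README snippet above:
# nbits=4/2/1, axis=1, compute_dtype=float16, group_size=None.
quant_config = BaseQuantizeConfig(nbits=4, group_size=None, axis=1)
AutoHQQHFModel.quantize_model(
    model, quant_config=quant_config, compute_dtype=torch.float16, device="cuda"
)

# Replace the quantized linear layers with the BitBLAS backend for inference.
prepare_for_inference(model, backend="bitblas")
```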
