add support for QuantizedCache #5

SimJeg · 2024-11-21T12:00:16Z

Transformers supports KV cache quantization through the QuantizedCache class (see their blog post). I propose updating BasePress and pipeline.py to support it.

Note that this requires several additional installations, which I did not include in pyproject.toml, following their philosophy of not installing additional kernels. I also ran into issues during installation, as mentioned here.
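
For readers unfamiliar with the idea, here is a minimal sketch of what quantizing cached values to a few bits involves. It is plain Python and is not the Transformers implementation: the `quantize` and `dequantize` helpers and the sample values are hypothetical, chosen only to show the round trip and the error it introduces.

```python
def quantize(values, nbits=4):
    """Affine quantization of floats to integer codes in [0, 2**nbits - 1].

    Illustrative only: real KV cache quantization works on tensors,
    typically per group or per channel.
    """
    qmax = 2 ** nbits - 1
    lo, hi = min(values), max(values)
    # Guard against constant input, where hi == lo would give a zero scale.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, zero):
    """Map integer codes back to approximate float values."""
    return [c * scale + zero for c in codes]


values = [-1.0, -0.25, 0.0, 0.5, 2.0]  # stand-in for a slice of cached keys
codes, scale, zero = quantize(values, nbits=4)
restored = dequantize(codes, scale, zero)

# The reconstruction error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(codes, max_error)
```

Fewer bits mean a smaller cache but a larger step `scale`, and therefore a larger worst-case error per stored value.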

maxjeblick

lgtm thanks a lot!

add support for QuantizedCache

90ca972

SimJeg assigned maxjeblick Nov 21, 2024

SimJeg added 3 commits November 21, 2024 13:59

update README

58ed978

Merge branch 'main' into kv-cache-quantization

cc7ff3b

upgrade version

bf6cda2

maxjeblick approved these changes Nov 21, 2024

update README

b62ebb3

maxjeblick approved these changes Nov 21, 2024

SimJeg merged commit 64b3c17 into main Nov 21, 2024
2 checks passed

SimJeg deleted the kv-cache-quantization branch November 21, 2024 15:50
