Adding cached KVs #266

gkroiz · 2023-05-13T01:14:35Z

This PR adds the option to cache KVs for inference.

This is the cached-KVs portion of splitting #240 into two PRs.

Main changes:

- Added KV caching.
- Moved the RoPE cache from each attention layer to the model as a whole.
- Added a mask cache used for attention masking with cached KVs.
- Fixed ops and tensors that would otherwise cause recompilation.
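The following is a minimal sketch of how KV caching and a mask cache can fit together. It assumes PyTorch 2.x, and `KVCache`, `build_mask_cache`, and `cached_attention` are hypothetical names used only for illustration, not the PR's actual code. The cache is preallocated to a fixed `block_size`. New keys and values are written in place with `index_copy_` at the positions `input_pos`, and the attention-mask rows for those positions are sliced from a causal mask that is built once. Keeping tensor shapes fixed across decoding steps is one way to avoid the recompilation mentioned in the last item. A model-level RoPE cache could presumably be indexed by the same `input_pos` (not shown).

```python
import torch
import torch.nn.functional as F


def build_mask_cache(block_size: int) -> torch.Tensor:
    # Causal (lower-triangular) boolean mask, built once: shape (1, 1, block_size, block_size).
    ones = torch.ones(block_size, block_size, dtype=torch.bool)
    return torch.tril(ones).view(1, 1, block_size, block_size)


class KVCache:
    """Hypothetical preallocated KV cache whose shapes stay fixed across decoding steps."""

    def __init__(self, batch: int, n_head: int, block_size: int, head_size: int) -> None:
        shape = (batch, n_head, block_size, head_size)
        self.k = torch.zeros(shape)
        self.v = torch.zeros(shape)

    def update(self, input_pos: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
        # Write the new keys/values in place at `input_pos` along the sequence dim (dim 2).
        self.k.index_copy_(2, input_pos, k)
        self.v.index_copy_(2, input_pos, v)
        return self.k, self.v


def cached_attention(q, k, v, cache: KVCache, mask_cache: torch.Tensor, input_pos: torch.Tensor):
    k_all, v_all = cache.update(input_pos, k, v)
    # Mask rows for the current positions, shape (1, 1, T, block_size); True means "may attend".
    # Unwritten (zero) cache slots lie after every current position, so they are always masked out.
    mask = mask_cache.index_select(2, input_pos)
    return F.scaled_dot_product_attention(q, k_all, v_all, attn_mask=mask)


# Usage: prefill 3 tokens, then decode 2 more one at a time, and compare with full causal attention.
torch.manual_seed(0)
B, H, T, D, block_size = 1, 2, 5, 4, 8
q, k, v = (torch.randn(B, H, T, D) for _ in range(3))
full = F.scaled_dot_product_attention(q, k, v, is_causal=True)

cache = KVCache(B, H, block_size, D)
mask_cache = build_mask_cache(block_size)
pos = torch.arange(3)
outputs = [cached_attention(q[:, :, :3], k[:, :, :3], v[:, :, :3], cache, mask_cache, pos)]
for t in range(3, T):
    pos = torch.tensor([t])
    outputs.append(
        cached_attention(q[:, :, t:t + 1], k[:, :, t:t + 1], v[:, :, t:t + 1], cache, mask_cache, pos)
    )
incremental = torch.cat(outputs, dim=2)
print(torch.allclose(incremental, full, atol=1e-5))
```

Because only the current query rows are computed at each step, the per-token cost of decoding no longer requires recomputing keys and values for the whole prefix. In this sketch the cached and full-attention outputs should agree up to floating-point tolerance.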

Similar to Lightning-AI/litgpt#51
Fixes #197
Fixes #309

TODO: Update tests

gkroiz · 2023-05-13T01:14:43Z

TODO: update tests

lantiga · 2023-05-16T12:12:25Z

For visibility: updates to lit-parrot caching
Lightning-AI/litgpt#60

carmocca

Implementation looks good!

generate.py

gkroiz · 2023-05-22T20:05:20Z

@carmocca what is the difference between generate.py and generate/full.py?

carmocca · 2023-05-22T20:11:11Z

@gkroiz It's basically a duplicate. See the description in #255.

carmocca · 2023-05-22T20:14:46Z

I opened #313 to track it. I wouldn't try to merge them here.

generate/full.py

carmocca

I think we're done here! The howto update can be done separately

lantiga

Awesome job!

Added cached KVs

f75e80d

TODO: Update tests

gkroiz requested review from awaelchli, carmocca and lantiga as code owners May 13, 2023 01:14

carmocca mentioned this pull request May 22, 2023

when will kv caching landing? #309

Closed

carmocca reviewed May 22, 2023


generate.py

carmocca added 9 commits May 22, 2023 11:16

Merge branch 'main' into cached_kvs

97abd7b

Types

976d1d5

This was removed in lit-parrot

734d131

Fixing tests

44c3289

Update adapter

629b71d

Formatting

3c88506

Fix rope test

296497e

Changes from Lightning-AI/litgpt@e09ad1d

73b2aab

Generate args

48c8a28

carmocca and others added 4 commits May 22, 2023 22:28

Merge branch 'main' into cached_kvs

6a589f7

adapter v2

0ba4204

update generate.py in generate

30a6ef1

full.py reuses generate as the other scripts

0fe1b89

carmocca reviewed May 22, 2023


generate/full.py (outdated)

gkroiz and others added 2 commits May 22, 2023 21:27

reverting checkpoint paths

4368b82

Fixing generation scripts

6b60cfd

carmocca approved these changes May 22, 2023


carmocca self-assigned this May 22, 2023

carmocca added the enhancement New feature or request label May 22, 2023

carmocca added the inference label May 22, 2023

lantiga approved these changes May 22, 2023


lantiga merged commit a24fc5e into Lightning-AI:main May 22, 2023

gkroiz deleted the cached_kvs branch May 23, 2023 02:52

gkroiz mentioned this pull request May 23, 2023

tpu howto timing adjustments #318

Merged

gkroiz commented May 13, 2023 • edited by carmocca