Paper | Installation | Eviction | Quantization
We provide three implementations: ThinK_eager
contains the code for eager attention, ThinK_flash
utilizes FlashAttention, and ThinK_KIVI
integrates with KV cache quantization. Please note that the current implementations may not be fully optimized; we are actively working on improving their efficiency. We use LongBench to evaluate performance.
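The core idea behind all three implementations is query-driven pruning of key-cache channels. As a rough sketch (not the paper's exact scoring criterion), one can score each channel of the head dimension by how much it contributes to the Q·K^T attention logits and keep only the highest-scoring fraction; the function name and the magnitude-product heuristic below are illustrative assumptions:

```python
import numpy as np

def prune_key_channels(query, key, pruning_ratio=0.4):
    """Drop the lowest-scoring key channels for one attention head.

    query: (q_len, head_dim)  recent query states
    key:   (kv_len, head_dim) cached key states
    Returns the pruned key cache and the indices of the kept channels.
    """
    head_dim = key.shape[-1]
    num_keep = head_dim - int(head_dim * pruning_ratio)
    # Score each channel by its contribution to the Q.K^T logits,
    # approximated here by the product of per-channel magnitudes.
    scores = np.abs(query).sum(axis=0) * np.abs(key).sum(axis=0)  # (head_dim,)
    # Keep the top-scoring channels, preserving their original order.
    keep_idx = np.sort(np.argsort(scores)[-num_keep:])
    return key[:, keep_idx], keep_idx
```

With `pruning_ratio=0.4`, 40% of the key channels are dropped, shrinking the key cache proportionally while the query is sliced to the same channels at attention time.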
- Support More Models
- Support Multi-GPUs
- Optimize Efficiency
Step 1: Clone this repository
Step 2: Setup Environments
conda create -n think python=3.10
conda activate think
pip install -r requirements.txt
Evaluate on LongBench: first modify the hyperparameters in scripts/scripts_longBench/eval.sh (e.g., pruning_ratio), then run:
cd ThinK_flash
sh ./scripts/scripts_longBench/eval.sh
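As a back-of-the-envelope guide to choosing pruning_ratio, the key-cache memory saved scales linearly with the fraction of channels dropped. The sketch below is an illustrative estimate (assuming 16-bit keys and LLaMA-2-7B-like shapes), not a measurement from this repo:

```python
def key_cache_bytes(num_layers, num_heads, head_dim, seq_len,
                    pruning_ratio=0.0, bytes_per_elem=2):
    """Approximate key-cache size after dropping a fraction of channels."""
    kept_dim = head_dim - int(head_dim * pruning_ratio)
    return num_layers * num_heads * seq_len * kept_dim * bytes_per_elem

# LLaMA-2-7B-like shapes: 32 layers, 32 heads, head_dim 128, 4k context
full = key_cache_bytes(32, 32, 128, 4096)                     # ~1 GiB
pruned = key_cache_bytes(32, 32, 128, 4096, pruning_ratio=0.4)
```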
Results:
sh ./scripts/scripts_longBench/metrics.sh
cd ThinK_kivi
Set up the environment as per the instructions from KIVI, adding an additional argument, pruning_ratio. Currently, only LLaMA-2 is supported.
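Conceptually, combining pruning with quantization means quantizing only the channels that survive pruning. The sketch below is a simplified illustration (min-max asymmetric quantization per channel); KIVI's actual scheme is group-wise per-channel quantization with a full-precision residual window, and the function name is an assumption:

```python
import numpy as np

def prune_then_quantize(key, keep_idx, num_bits=2):
    """Quantize only the surviving key channels to num_bits integers.

    key:      (kv_len, head_dim) cached key states
    keep_idx: indices of channels kept by pruning
    Returns quantized values plus the per-channel scale and zero point
    needed for dequantization.
    """
    k = key[:, keep_idx]
    qmax = 2 ** num_bits - 1
    kmin = k.min(axis=0, keepdims=True)
    scale = (k.max(axis=0, keepdims=True) - kmin) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard constant channels
    q = np.round((k - kmin) / scale).astype(np.uint8)
    return q, scale, kmin
```

Dequantization is `q * scale + kmin`, so the memory cost per kept channel falls from 16 bits to num_bits plus the shared scale/zero-point metadata.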
Users need to make their own assessment regarding any obligations or responsibilities under the corresponding licenses or terms and conditions pertaining to the original datasets and data. This repository is being released for research purposes only.
@article{xu2024think,
title={ThinK: Thinner Key Cache by Query-Driven Pruning},
author={Xu, Yuhui and Jie, Zhanming and Dong, Hanze and Wang, Lei and Lu, Xudong and Zhou, Aojun and Saha, Amrita and Xiong, Caiming and Sahoo, Doyen},
journal={arXiv preprint arXiv:2407.21018},
year={2024}
}
This repo builds on the SnapKV, PyramidKV, and KIVI repos.
This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.