CASP: Compression of Large Multimodal Models Based on Attention Sparsity

Mohsen Gholami, Mohammad Akbari, Kevin Cannons, Yong Zhang

Huawei Technologies Canada

arXiv paper: https://arxiv.org/abs/2503.05936


Highlights

  • CASP proposes a 2-bit compression method for VLMs that is compatible with any quantization technique and enhances state-of-the-art 2-bit quantization methods (AQLM and QuIP#) by an average of 21% on image- and video-language benchmarks

Installation:

Install the requirements via pip install -r requirements.txt.
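A minimal setup sketch, assuming a recent Python 3; the virtual environment (and its name) below is optional and arbitrary:

    # optional: create and activate a fresh virtual environment
    python -m venv casp-env
    source casp-env/bin/activate
    # install the dependencies listed in this repository
    pip install -r requirements.txt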

QuIP#:

  • Build and install the CUDA inference kernels: cd quip-sharp/quiptools && python setup.py install && cd ../
  • Install the fast-hadamard-transform package from its GitHub repo (see the consolidated sketch after this list).
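Taken together, the QuIP# setup looks like the following sketch; the PyPI package name for fast-hadamard-transform is an assumption, and installing from its GitHub repo is the documented route:

    # build and install the QuIP# CUDA inference kernels
    cd quip-sharp/quiptools && python setup.py install && cd ../
    # assumption: the package is published on PyPI under this name;
    # otherwise clone the fast-hadamard-transform GitHub repo and run `pip install .`
    pip install fast-hadamard-transform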

AQLM:

pip install aqlm[gpu,cpu]

Quantization:

CASPQuIP#:

Follow the steps below to prepare CASPQuIP# for LLaVA-1.5-7B. To quantize LLaVA-1.5-13B or LLaVA-Next instead, set --model accordingly in the scripts. To quantize LLaMA-7B, use svd_llama.sh, hfize_llama.sh, and quantize_finetune_llama.sh in the steps below. A consolidated run of all three stages follows the steps.

  1. To prepare LLaVA-1.5-7B with low-rank-compressed Wq and Wk:

    bash SVD/scripts/svd_llava.sh
    
  2. To prepare Hessians for QuIP#:

    bash quip-sharp/scripts/hfize_llava.sh 
    
  3. Quantization:

    bash quip-sharp/scripts/quantize_finetune_llava.sh 
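A consolidated sketch of all three stages; as noted above, switching to LLaVA-1.5-13B or LLaVA-Next is assumed to be done by editing the --model value inside each script rather than on the command line:

    # 1) low-rank compress Wq and Wk of LLaVA-1.5-7B
    bash SVD/scripts/svd_llava.sh
    # 2) generate the Hessians needed by QuIP#
    bash quip-sharp/scripts/hfize_llava.sh
    # 3) 2-bit QuIP# quantization with finetuning
    bash quip-sharp/scripts/quantize_finetune_llava.sh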
    

CASPAQLM:

Follow the steps below to prepare CASPAQLM for LLaVA-1.5-7B. To quantize LLaVA-1.5-13B or LLaVA-Next instead, set --model accordingly in the scripts. To quantize LLaMA-7B, use svd_llama.sh and quantize_llama.sh in the steps below; a LLaMA-7B sketch follows the steps.

  1. To prepare LLaVA with low-rank-compressed Wq and Wk:

    bash SVD/scripts/svd_llava.sh
    
  2. Quantization:

    bash AQLM/scripts/quantize_llava.sh 
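For the LLaMA-7B variant mentioned above, the same two stages use the llama scripts; the exact script paths below mirror the llava ones and are an assumption:

    # 1) low-rank compress Wq and Wk of LLaMA-7B
    bash SVD/scripts/svd_llama.sh
    # 2) 2-bit AQLM quantization
    bash AQLM/scripts/quantize_llama.sh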
    

CASPGPTQ:

Follow the steps below to prepare CASPGPTQ for LLaVA-1.5-7B. To quantize LLaVA-1.5-13B or LLaVA-Next instead, set --model accordingly in the scripts. To quantize LLaMA-7B, use svd_llama.sh and quantize_llama.sh in the steps below; a chained sketch follows the steps.

  1. To prepare LLaVA with low-rank-compressed Wq and Wk:

    bash SVD/scripts/svd_llava.sh
    
  2. Quantization:

    bash GPTQ/scripts/quantize_llava.sh
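The two GPTQ stages chain directly; as above, switching models is assumed to require editing the --model value inside the scripts first:

    # 1) low-rank compress Wq and Wk
    bash SVD/scripts/svd_llava.sh
    # 2) 2-bit GPTQ quantization
    bash GPTQ/scripts/quantize_llava.sh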
    

📚 Citation

If you find CASP useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

@misc{gholami2025caspcompressionlargemultimodal,
      title={CASP: Compression of Large Multimodal Models Based on Attention Sparsity}, 
      author={Mohsen Gholami and Mohammad Akbari and Kevin Cannons and Yong Zhang},
      year={2025},
      eprint={2503.05936},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.05936}, 
}
