🌰 SEED Multimodal

Powered by CV Center, Tencent AI Lab, and ARC Lab, Tencent PCG.

This repository provides the official implementation of SEED and SEED-LLaMA. For any inquiries, please email [email protected].

News

🍻 We are actively looking for self-motivated interns. Please feel free to reach out if you are interested. 🍻

2023-11-03 🤗 We have released the demo of seed-llama-v2-1, in which the quality of generated images has been greatly improved. Feel free to try it yourself.
2023-10-23 🤗 We have optimized the memory overhead. Through 8-bit quantization and dynamic loading, SEED-LLaMA-8B/14B can run on a single 16GB/24GB GPU.
2023-10-23 🤗 All model weights will be downloaded automatically when starting the demo.
2023-10-20 🤗 We release the checkpoints and code of the SEED-2 tokenizer, and SEED-LLaMA-8B/14B.
2023-10-20 👾 We release an online Gradio demo. Feel free to try it yourself.
2023-10-02 📎 We release the technical report of SEED-LLaMA on arXiv, which is empowered by the improved SEED-2 tokenizer.
2023-07-29 We release the checkpoint of the SEED tokenizer and its inference code. Check it out via SEED-1.
2023-07-16 📎 We release the technical report of SEED on arXiv.

Stay tuned for updates!

Brief Introduction

It is recommended to check out our papers for technical details.

💬 What can SEED-LLaMA do?

SEED-LLaMA is capable of both multimodal comprehension and generation, exhibiting compositional emergent abilities such as multi-turn in-context multimodal generation, acting like your AI assistant. [Compare to SOTA] [More examples on X]

💡 How does SEED-LLaMA achieve it?

The core of SEED-LLaMA is the tailored SEED tokenizer, which quantizes visual signals into discrete visual tokens that capture the necessary semantics while being produced under 1D causal dependence. [SEED-2 vs. SEED-1]
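
To make "quantizing visual signals into discrete visual tokens" concrete, here is a minimal sketch of nearest-neighbor codebook lookup, the basic step of vector quantization. It is not the SEED tokenizer: the function name `quantize`, the toy codebook, and the feature values are all hypothetical, whereas the real tokenizer learns its codebook and produces its features under 1D causal dependence.

```python
from typing import List, Sequence


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Squared Euclidean distance between two vectors of equal length.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            for f in features]


# Toy 2-D codebook with three entries (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# A 1D sequence of continuous features, one per position.
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.6, 0.1)]

codes = quantize(features, codebook)
print(codes)  # [0, 1, 2, 1]
```

The resulting list of integer codes is the kind of discrete sequence a language model can consume alongside text tokens.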

Usage

Dependencies

Python >= 3.8 (Anaconda is recommended)
PyTorch >= 1.11.0
NVIDIA GPU + CUDA
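
The requirements above can be checked before installation with a short script. This is a sketch rather than part of the repository: the helper names `parse_version` and `check_environment` are hypothetical, and the version parsing is deliberately simple.

```python
import importlib.util
import sys


def parse_version(text: str) -> tuple:
    """Turn a string such as '2.1.0+cu118' into (2, 1, 0)."""
    core = text.split("+")[0]  # drop a local build suffix like '+cu118'
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def check_environment() -> list:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python >= 3.8 is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch
        if parse_version(torch.__version__) < (1, 11, 0):
            problems.append("PyTorch >= 1.11.0 is required")
        if not torch.cuda.is_available():
            problems.append("No CUDA-capable GPU is visible to PyTorch")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script prints one warning per unmet requirement and nothing when the environment looks fine.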

Installation

Clone the repo and install the dependencies:

git clone https://github.com/AILab-CVC/SEED.git
cd SEED
pip install -r requirements.txt

Model Weights

We release the pretrained SEED Tokenizer and De-Tokenizer, as well as the pretrained and instruction-tuned SEED-LLaMA-8B and SEED-LLaMA-14B, on SEED Hugging Face.

Check the SEED tokenizer weights in AILab-CVC/seed-tokenizer-2
Check the SEED LLaMA(8B) weights in AILab-CVC/seed-llama-8b-sft
Check the SEED LLaMA(14B) weights in AILab-CVC/seed-llama-14b-sft

The weights of the unCLIP SD-UNet, which is used to reconstruct images, are downloaded automatically.
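
If you prefer to fetch the checkpoints ahead of time, one option is `snapshot_download` from the `huggingface_hub` package. The following is a sketch, not part of the repository: the `pretrained/` target directory and the helpers `local_dir_for` and `download` are assumptions, and the inference scripts may expect a different layout, so check the configs before relying on it.

```python
from pathlib import Path

# Repository ids as listed in this section.
REPO_IDS = {
    "tokenizer": "AILab-CVC/seed-tokenizer-2",
    "llama-8b": "AILab-CVC/seed-llama-8b-sft",
    "llama-14b": "AILab-CVC/seed-llama-14b-sft",
}


def local_dir_for(name: str, root: str = "pretrained") -> Path:
    """Assumed layout: one sub-directory per repository name under `root`."""
    return Path(root) / REPO_IDS[name].split("/")[-1]


def download(name: str, root: str = "pretrained") -> Path:
    """Download one checkpoint repository (needs `pip install huggingface_hub`)."""
    # Imported lazily so the rest of the module works without the package.
    from huggingface_hub import snapshot_download

    target = local_dir_for(name, root)
    snapshot_download(repo_id=REPO_IDS[name], local_dir=str(target))
    return target


if __name__ == "__main__":
    # Only prints the planned location; call download("tokenizer") to fetch.
    print(local_dir_for("tokenizer"))
```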

Inference for visual tokenization and de-tokenization

To discretize an image to 1D visual codes with causal dependency, and reconstruct the image from the visual codes using the off-the-shelf unCLIP SD-UNet:

cd ..   # SEED/ 
python scripts/seed_tokenizer_inference.py

Inference for SEED-LLaMA

Given that SEED-LLaMA-8B is based on Vicuna-7B and SEED-LLaMA-14B is based on LLaMA2-Chat-13B, we use Vicuna-7B's ("USER:", "ASSISTANT:") and LLaMA2-Chat-13B's ([INST] [/INST]) prompt formats for their respective instruction tuning.

# Inference for SEED-LLaMA-8B
python scripts/seed_llama_inference_8B.py

# Inference for SEED-LLaMA-14B
python scripts/seed_llama_inference_14B.py
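
The two prompt conventions mentioned above can be sketched as follows. Only the role markers come from this README; the exact spacing, any system prompt, and the way image tokens are interleaved are assumptions here, so treat the inference scripts as the reference. The function names are hypothetical.

```python
def vicuna_prompt(user_message: str) -> str:
    """Vicuna-style turn, as used for SEED-LLaMA-8B (spacing is assumed)."""
    return f"USER: {user_message} ASSISTANT:"


def llama2_chat_prompt(user_message: str) -> str:
    """LLaMA2-Chat-style turn, as used for SEED-LLaMA-14B (spacing is assumed)."""
    return f"[INST] {user_message} [/INST]"


# Pick the builder that matches the model size being run.
PROMPT_BUILDERS = {"8b": vicuna_prompt, "14b": llama2_chat_prompt}

question = "What is in this image?"
for size, build in PROMPT_BUILDERS.items():
    print(size, "->", build(question))
```

Mixing the formats (for example, sending an `[INST]` prompt to the 8B model) departs from what the model saw during instruction tuning and may degrade its answers.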

Launching Gradio Demo of SEED-LLaMA-14B Locally

Building the local demo of SEED-LLaMA-14B currently requires a single 24GB GPU.

# SEED/
# in first terminal
bash scripts/start_backend_14b.sh
# in second terminal
bash scripts/start_frontend_14b.sh

Building the local demo of SEED-LLaMA-8B currently requires a single 16GB GPU.

# SEED/
# in first terminal
bash scripts/start_backend_8b.sh
# in second terminal
bash scripts/start_frontend_8b.sh

Then the demo can be accessed through http://127.0.0.1:80
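
The 16GB and 24GB figures quoted in this section are roughly consistent with a back-of-the-envelope estimate for 8-bit weights. The sketch below counts the language model weights only and takes the nominal parameter counts (8 billion and 14 billion) from the model names, which is an assumption. Activations, the tokenizer, and the unCLIP SD-UNet need additional memory.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Nominal parameter counts inferred from the model names (assumption).
for name, params in [("SEED-LLaMA-8B", 8e9), ("SEED-LLaMA-14B", 14e9)]:
    fp16 = weight_memory_gib(params, 16)
    int8 = weight_memory_gib(params, 8)
    print(f"{name}: fp16 ~{fp16:.1f} GiB, int8 ~{int8:.1f} GiB")
```

At 8 bits per parameter the weights take about 7.5 GiB and 13 GiB respectively, which leaves headroom on 16GB and 24GB cards, whereas 16-bit weights for the 14B model alone would exceed 24GB.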

Citation

If you find the work helpful, please consider citing:

@article{ge2023making,
  title={Making LLaMA SEE and Draw with SEED Tokenizer},
  author={Ge, Yuying and Zhao, Sijie and Zeng, Ziyun and Ge, Yixiao and Li, Chen and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2310.01218},
  year={2023}
}

@article{ge2023planting,
  title={Planting a seed of vision in large language model},
  author={Ge, Yuying and Ge, Yixiao and Zeng, Ziyun and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2307.08041},
  year={2023}
}

The project is still in progress.

License

SEED is released under the Apache License, Version 2.0.

SEED-LLaMA is released under the original license of LLaMA2.

Acknowledgement

We thank the authors of unCLIP SD and BLIP2 for their great work.
