GitHub - ahsannawazch/Multimodal-RAG: A Chainlit-powered multimodal PDF chatbot utilizing the ColQwen2-v1.0 visual retriever and Qwen2-VL-2B-Instruct model for efficient document retrieval and question answering.

🚀 Multimodal RAG App 🚀

The Multimodal RAG App extracts and explains information from complex PDF documents that contain images, charts, tables, and graphs. 📄🔍

This application uses ColQwen2-v1.0, a state-of-the-art visual retriever built on Qwen2-VL-2B-Instruct that applies the ColBERT late-interaction strategy. ColQwen2 treats each document page as an image. It generates ColBERT-style multi-vector representations that capture both textual and visual cues, which preserves the page's layout and context.
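
Late interaction works as follows. A query is encoded as one vector per token and a page as one vector per image patch. Relevance is then the ColBERT "MaxSim" score: each query vector keeps its best-matching page vector, and those maxima are summed. The sketch below shows only that scoring rule, in pure Python with toy 2-D unit vectors. It is not ColQwen2's or Byaldi's actual implementation, and `maxsim_score` is a hypothetical helper name.

```python
from typing import List, Sequence

Vector = Sequence[float]

def dot(a: Vector, b: Vector) -> float:
    """Inner product; equals cosine similarity when both vectors are L2-normalized."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """ColBERT-style late interaction (MaxSim).

    For every query-token vector, keep its best match among the page's
    patch vectors, then sum those maxima over all query tokens.
    """
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

# Toy unit-length 2-D vectors; real ColQwen2 embeddings have many more dimensions.
query = [[1.0, 0.0], [0.0, 1.0]]   # two query tokens
page_a = [[1.0, 0.0], [0.6, 0.8]]  # two image patches
page_b = [[0.8, 0.6]]              # one image patch

print(round(maxsim_score(query, page_a), 3))  # 1.8
print(round(maxsim_score(query, page_b), 3))  # 1.4
```

In this toy case, page_a ranks above page_b because it has a patch that matches each query token closely. A single averaged page vector would blur that distinction.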

To simplify working with ColQwen2, we use the Byaldi library. It exposes late-interaction multimodal models such as ColQwen2 through a familiar API for indexing and searching documents.

📑 PDF Processing

- Interactive PDF upload in the Chainlit app.
- Automatic PDF indexing and caching.
- Understanding of both visual and textual context.
- Display of the relevant pages alongside each answer.
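
This README does not describe how the app implements caching. One common approach is to key each on-disk index by a hash of the PDF's bytes, so that re-uploading the same file skips re-indexing. The sketch below assumes that scheme. The helper names `pdf_index_name` and `needs_indexing`, the `pdf_` prefix, and the directory-per-index layout under `index_root` are all hypothetical, not taken from this repository.

```python
import hashlib
from pathlib import Path
from typing import Tuple, Union

PathLike = Union[str, Path]

def pdf_index_name(pdf_path: PathLike, prefix: str = "pdf") -> str:
    """Derive a stable index name from the PDF's content (hypothetical scheme)."""
    digest = hashlib.sha256()
    with open(pdf_path, "rb") as fh:
        # Hash in 1 MiB chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return f"{prefix}_{digest.hexdigest()[:16]}"

def needs_indexing(pdf_path: PathLike, index_root: PathLike) -> Tuple[str, bool]:
    """Return (index_name, True) when no cached index directory exists for this content."""
    name = pdf_index_name(pdf_path)
    return name, not (Path(index_root) / name).exists()
```

Because the name depends only on file content, renaming a PDF still hits the cache, while editing it produces a new index.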

🚀 Multi-GPU Support

- Automatically detects the available GPUs.
- Distributes the visual retriever (ColQwen2) and the VL model (Qwen2-VL-2B-Instruct) across them.
- Falls back to a single GPU when only one is available.
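
The exact placement logic lives in the app's code and is not spelled out here. A minimal sketch of one plausible policy follows: with two or more GPUs, the retriever and the VL model each get their own device; with one GPU, both share it. The function `assign_devices` and its dictionary keys are hypothetical. In a real run, `num_gpus` would come from `torch.cuda.device_count()`.

```python
from typing import Dict

def assign_devices(num_gpus: int) -> Dict[str, str]:
    """Hypothetical placement policy for the retriever and the VL model."""
    if num_gpus < 1:
        # The requirements call for a GPU, so no CPU fallback is sketched here.
        raise RuntimeError("No CUDA GPU detected; this app expects at least one GPU.")
    if num_gpus == 1:
        # Single-GPU fallback: both models share cuda:0.
        return {"retriever": "cuda:0", "vl_model": "cuda:0"}
    # Two or more GPUs: give each model its own device.
    return {"retriever": "cuda:0", "vl_model": "cuda:1"}

print(assign_devices(1))  # {'retriever': 'cuda:0', 'vl_model': 'cuda:0'}
print(assign_devices(4))  # {'retriever': 'cuda:0', 'vl_model': 'cuda:1'}
```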

⚡ Hardware Acceleration

Automatically detects whether the GPU supports Flash Attention:
- FlashAttention-2 on GPUs with compute capability ≥ 8.0 (e.g., A100, H100, L4, A10G, A6000).
- Falls back to PyTorch's scaled dot-product attention (SDPA) on older GPUs.
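
The rule above maps onto the `attn_implementation` argument that Hugging Face Transformers accepts when loading a model (`"flash_attention_2"` or `"sdpa"`). The sketch below is one way to express that choice. `pick_attn_implementation` is a hypothetical helper. The added requirement that the `flash_attn` package be installed is an assumption on top of the README's rule. In practice, the capability tuple would come from `torch.cuda.get_device_capability(device)`.

```python
import importlib.util
from typing import Optional, Tuple

def pick_attn_implementation(
    capability: Tuple[int, int],
    flash_attn_installed: Optional[bool] = None,
) -> str:
    """Choose an attention backend from a (major, minor) CUDA compute capability."""
    if flash_attn_installed is None:
        # Assumption: FlashAttention-2 is only usable if the flash_attn package is importable.
        flash_attn_installed = importlib.util.find_spec("flash_attn") is not None
    if tuple(capability) >= (8, 0) and flash_attn_installed:
        return "flash_attention_2"  # Ampere or newer, e.g. A100 (8.0), A10G (8.6), L4 (8.9), H100 (9.0)
    return "sdpa"  # older GPUs, or flash_attn missing

print(pick_attn_implementation((8, 6), flash_attn_installed=True))  # flash_attention_2
print(pick_attn_implementation((7, 5), flash_attn_installed=True))  # sdpa
```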

📋 Requirements

Before you begin, ensure you have the following installed:

- Python 3.10 or higher 🐍
- Poppler (used for PDF processing) 📄
- A GPU with at least 16 GB of memory (VRAM) 💾

📥 Installing Poppler

For Linux (Ubuntu) 🐧

```
sudo apt-get install -y poppler-utils
```

For macOS 🍎

```
brew install poppler
```

For Windows 🖥️

Download Poppler for Windows from this source.
Extract the archive and add the bin folder to your system's PATH variable.
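
After installing Poppler on any platform, you can confirm that its command-line tools are on PATH from Python. The check below assumes PDF pages are rasterized the way `pdf2image` does it, which calls Poppler's `pdftoppm` and `pdfinfo` executables. The helper `missing_poppler_tools` is hypothetical. `shutil.which` also resolves `.exe` names on Windows.

```python
import shutil
from typing import Callable, List, Optional

# Poppler executables that pdf2image-style PDF rasterization relies on (assumption).
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")

def missing_poppler_tools(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the Poppler executables that cannot be found on PATH."""
    return [tool for tool in POPPLER_TOOLS if which(tool) is None]

missing = missing_poppler_tools()
if missing:
    print("Poppler not found on PATH; missing:", ", ".join(missing))
else:
    print("Poppler found:", shutil.which("pdftoppm"))
```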

🛠️ Installation

Clone the repository:

```
git clone https://github.com/ahsannawazch/Multimodal-RAG.git
cd Multimodal-RAG
```

Install the required Python packages:
```
pip install -r requirements.txt
```

🚀 Optional: Faster Model Downloads

If you have high bandwidth and want to download models quickly from Hugging Face, you can enable accelerated downloads:

Set the environment variable:
```
export HF_HUB_ENABLE_HF_TRANSFER=1
```
Install the hf_transfer package:
```
pip install hf_transfer
```
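
Both steps are needed. Recent `huggingface_hub` versions raise an error at download time if the variable is set but `hf_transfer` is not installed. The sketch below checks that combination before launching the app. The helper `hf_transfer_status` is hypothetical. The set of accepted "on" values mirrors what `huggingface_hub` usually treats as true, which is an assumption; `1` is the value this README uses.

```python
import importlib.util
import os
from typing import Mapping, Optional

# Values treated as "on" (assumed to match huggingface_hub's truthy set).
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}

def hf_transfer_status(
    env: Optional[Mapping[str, str]] = None,
    installed: Optional[bool] = None,
) -> str:
    """Report whether accelerated Hugging Face downloads are correctly configured."""
    env = os.environ if env is None else env
    enabled = env.get("HF_HUB_ENABLE_HF_TRANSFER", "").upper() in _TRUE_VALUES
    if installed is None:
        installed = importlib.util.find_spec("hf_transfer") is not None
    if enabled and installed:
        return "enabled"
    if enabled:
        return "misconfigured: HF_HUB_ENABLE_HF_TRANSFER is set but hf_transfer is not installed"
    return "disabled"

print(hf_transfer_status())
```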

🚀 Usage

Run the app:
```
chainlit run app.py
```
- Upload a PDF: When prompted, upload your PDF file. The app then indexes it on disk.
- Ask questions: Once the PDF is uploaded and indexed, ask questions about its content. The app retrieves the relevant pages and displays them, including their images and text, along with its answer.
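
Conceptually, answering a question involves three steps. Every indexed page is scored against the question, the best few pages are kept, and those page images are shown in the chat and presumably passed to Qwen2-VL-2B-Instruct as context. The sketch below covers only the ranking step. It assumes one relevance score per page is already available, for example a ColBERT-style MaxSim score. The helper `top_k_pages` and the default `k=3` are hypothetical, not the app's actual settings.

```python
import heapq
from typing import List, Sequence, Tuple

def top_k_pages(page_scores: Sequence[float], k: int = 3) -> List[Tuple[int, float]]:
    """Return up to k (page_number, score) pairs, best first, with 1-based page numbers."""
    return heapq.nlargest(k, enumerate(page_scores, start=1), key=lambda item: item[1])

scores = [0.2, 1.8, 1.4, 0.9]   # one relevance score per page, in page order
print(top_k_pages(scores, k=2))  # [(2, 1.8), (3, 1.4)]
```

`heapq.nlargest` avoids sorting every page, which matters little for a short PDF but keeps selection cheap for long documents.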

Enjoy exploring your documents with the Multimodal RAG App! 🎉📚
