Welcome to PDF-RAG, a simple pipeline that lets you upload and interact with your PDFs. This repository provides an easy-to-use framework for building a conversational interface for document interaction.
- Set up your environment:
python3 -m venv my_new_env
source my_new_env/bin/activate
- Follow the notebook: open rag_demo.ipynb for step-by-step instructions. A test set of questions, answers, and source documents is included in cbo_questions.xlsx for evaluating the pipeline, and sample Congressional Budget Office documents are provided in the cbo_documents folder for testing. Before running, paste your Hugging Face token into the keys.txt file so the models can be accessed.
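As a sketch of the keys.txt step, the snippet below reads the token and exposes it through the HF_TOKEN environment variable, which huggingface_hub picks up automatically. It assumes keys.txt contains nothing but the token on a single line, which may not match the repository's exact format.

```python
import os
from pathlib import Path

def load_hf_token(path: str = "keys.txt") -> str:
    """Read a Hugging Face token from a file and export it as HF_TOKEN.

    Assumes the file holds the token on a single line (an assumption;
    check the repository's keys.txt format).
    """
    token = Path(path).read_text().strip()
    os.environ["HF_TOKEN"] = token  # huggingface_hub reads this variable
    return token

# Demonstration only: write a dummy token file, then load it.
Path("keys.txt").write_text("hf_dummy_token\n")
print(load_hf_token())  # -> hf_dummy_token
```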
- Processing PDFs: documents are chunked into manageable pieces using LangChain.
- Embeddings:
- We tested FinLang (for financial documents) and sentence-t5-base (for general use).
- Embeddings are managed using Faiss, which is optimized for fast similarity searches.
- Generation: Llama-2-7b-chat powers the conversational interface. We load a quantized version of the model via bitsandbytes, so a single decent GPU can handle it.
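To make the chunk-embed-search mechanics above concrete, here is a toy sketch in plain NumPy. It is not the pipeline's actual code: the real pipeline uses LangChain's text splitters, sentence-t5-base or FinLang embeddings, and Faiss for fast similarity search; this replaces them with a sliding-window chunker, a bag-of-words "embedding", and brute-force cosine search just to show the flow.

```python
import numpy as np

def chunk_text(text, chunk_size=200, overlap=50):
    """Sliding-window character chunking (LangChain's splitters are a
    smarter version of this idea)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts, vocab):
    """Toy bag-of-words vectors; the real pipeline uses sentence-t5-base
    or FinLang sentence embeddings instead."""
    mat = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for j, w in enumerate(vocab):
            mat[i, j] = t.lower().count(w)
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    return mat / np.where(norms == 0, 1, norms)  # unit vectors -> dot == cosine

def search(query_vec, chunk_vecs, k=1):
    """Brute-force cosine search; Faiss does the same thing, but fast."""
    scores = chunk_vecs @ query_vec
    return np.argsort(-scores)[:k]

print(len(chunk_text("a" * 500)))  # -> 3 overlapping chunks

chunks = ["The deficit grew in 2023.", "Cats sleep most of the day."]
vocab = ["deficit", "cats", "2023", "sleep"]
vecs = embed(chunks, vocab)
q = embed(["What happened to the deficit?"], vocab)[0]
print(chunks[search(q, vecs)[0]])  # -> The deficit grew in 2023.
```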
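On the generation side, the retrieved chunks have to be stuffed into the model's prompt. The sketch below (function name and system text are our own, and the notebook's actual prompt may differ) follows the standard Llama-2 chat template with its [INST] ... [/INST] markers.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Assemble a Llama-2-chat style prompt that grounds the answer in the
    retrieved chunks. Illustrative only; the notebook's template may differ."""
    context = "\n\n".join(retrieved_chunks)
    system = ("You are a helpful assistant. Answer using only the context "
              "below; say so if the answer is not in the context.")
    return (f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
            f"Context:\n{context}\n\nQuestion: {question} [/INST]")

prompt = build_rag_prompt("What grew in 2023?", ["The deficit grew in 2023."])
print(prompt.startswith("<s>[INST]"))  # -> True
```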
Interact with your PDFs using the following commands. You can also try the pipeline on Google Colab.
- Streamlit:
streamlit run streamlit_UI.py
- Gradio:
python gradio_UI.py
The pipeline works well for text-heavy documents but is not yet a good fit for documents with complex multi-modal content. We have since addressed this with a multi-modal RAG pipeline powered by vision LLMs: see MultiModel-RAG-ColPali-Qdrant-Qwen for all the materials. We'd love to hear your feedback or questions. Cheers!
Erdi: [email protected]
Furkan: [email protected]