
An AI document assistant that answers questions about your PDFs with citations and highlights them directly in the document.
Ragadoc is a privacy-first Streamlit application that lets you chat with your documents using locally-run AI models. Ask questions, get grounded answers with citations, and see exactly where the information comes from with automatic PDF highlighting.
- 🤖 AI Document Q&A - Ask natural language questions about your PDFs
- 📍 Citation Grounding - Every answer includes specific citations from your document
- 🎯 PDF Highlighting - Citations are automatically highlighted in the original PDF
- 🔒 Complete Privacy - Uses only local AI models, your documents never leave your computer
- ⚡ Fast Processing - Optimized document parsing and retrieval system
- 🌐 Easy Web Interface - Simple Streamlit app, no technical knowledge required
⚠️ Warning: Proof of Concept, Early Development. This application is in early development and should be considered a proof of concept. Features may be incomplete, unstable, or subject to significant change. Use at your own discretion and expect bugs or breaking changes in future updates.
Choose models based on your system capabilities:
| Model Type | Model Name | Size | RAM Required | Use Case |
|---|---|---|---|---|
| Embedding | nomic-embed-text | ~274MB | 1GB | Recommended - general purpose |
| Embedding | all-minilm | ~23MB | 512MB | Lightweight alternative |
| Chat | qwen3:14b | ~8.5GB | 16GB | Recommended |
| Chat | llama3.1:8b | ~4.7GB | 8GB | Balanced option |
| Chat | mistral:latest | ~4.1GB | 8GB | Quick responses |
| Chat | phi3:mini | ~2.3GB | 4GB | Low-resource systems |
1. Install Ollama (for local AI models):

   ```bash
   # macOS
   brew install ollama

   # Or download from https://ollama.com
   ```
2. Start Ollama and install the required models:

   ```bash
   ollama serve

   # Install the embedding model (required)
   ollama pull nomic-embed-text

   # Install a chat model (see recommendations above)
   ollama pull qwen3:14b
   ```
Choose your preferred installation method:
Local Installation:

Additional Prerequisites:

- Python 3.8+

Installation Steps:

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/ragadoc.git
   cd ragadoc
   ```

2. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Or with conda:

   ```bash
   conda env create -f environment.yml
   conda activate ragadoc
   ```

3. Launch the application:

   ```bash
   streamlit run app.py
   ```

4. Open your browser to http://localhost:8501
Docker Installation:

Additional Prerequisites:

- Docker and Docker Compose

Installation Steps:

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/ragadoc.git
   cd ragadoc
   ```

2. Start with Docker Compose:

   ```bash
   docker-compose up
   ```

3. Open your browser to http://localhost:8501
Usage:

1. Upload a PDF - Drag and drop or browse for your document
2. Select an AI Model - Choose from your locally installed Ollama models
3. Start Chatting - Ask questions about your document in natural language
4. View Citations - See highlighted text in the PDF that supports each answer
5. Explore - Continue the conversation to dive deeper into your document
Ragadoc uses a modern RAG (Retrieval-Augmented Generation) architecture:
```
PDF Upload → Text Extraction → Chunking → Vector Embeddings
                                               ↓
User Question → Semantic Search → Context Retrieval → AI Response
                                               ↓
                                     Citation Highlighting
```
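The last step, citation highlighting, amounts to locating the cited passage in the original PDF and annotating it. Here is a minimal sketch using PyMuPDF (illustrative only; the function and file names are hypothetical, not Ragadoc's actual code):

```python
import fitz  # PyMuPDF


def highlight_citation(pdf_path: str, citation: str, out_path: str) -> None:
    """Highlight every occurrence of a cited passage in a PDF."""
    doc = fitz.open(pdf_path)
    for page in doc:
        # search_for returns a bounding rectangle for each match on the page
        for rect in page.search_for(citation):
            page.add_highlight_annot(rect)
    doc.save(out_path)


# Example: mark a sentence the AI quoted in its answer
highlight_citation("report.pdf", "revenue grew 12% year over year", "report_highlighted.pdf")
```

Since this searches for the text of the citation itself, the approach works best when answers quote the document closely.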
Tech Stack:
- Frontend: Streamlit web interface
- AI Models: Ollama (local LLMs)
- Vector DB: ChromaDB for semantic search
- PDF Processing: PyMuPDF4LLM for structure-aware extraction
- Embeddings: nomic-embed-text model
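To make the flow above concrete, here is a minimal retrieval sketch built from those same pieces, using the `ollama` and `chromadb` Python clients (a simplified illustration with placeholder chunks and a sample question, not Ragadoc's actual implementation):

```python
import ollama
import chromadb

EMBED_MODEL = "nomic-embed-text"
CHAT_MODEL = "qwen3:14b"  # any chat model from the table above


def embed(text):
    # One embedding vector per chunk, computed locally by Ollama
    return ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"]


# Index: store each extracted chunk with its embedding in ChromaDB
chunks = ["First chunk of extracted PDF text...", "Second chunk..."]
collection = chromadb.Client().get_or_create_collection("docs")
collection.add(
    ids=[str(i) for i in range(len(chunks))],
    documents=chunks,
    embeddings=[embed(c) for c in chunks],
)

# Query: retrieve the most relevant chunks and ground the answer in them
question = "What does the document say about revenue?"
hits = collection.query(query_embeddings=[embed(question)], n_results=3)
context = "\n\n".join(hits["documents"][0])
reply = ollama.chat(
    model=CHAT_MODEL,
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    }],
)
print(reply["message"]["content"])
```

Everything here runs locally: Ollama serves both the embedding and chat models, and ChromaDB holds the vectors in memory, so no document content leaves your machine.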
Troubleshooting:

Ollama Connection Error

```bash
# Verify Ollama is running
curl http://localhost:11434/api/version

# If using Docker, ensure external access to Ollama
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
Slow Performance

- Try the next smaller model from the table above
- Reduce the chunk size in the expert RAG settings
- Ensure sufficient RAM is available
This project is licensed under the GPL - see the LICENSE file for details.
Acknowledgments:

- Ollama for making local AI accessible
- Streamlit for the amazing web framework
- PyMuPDF for PDF processing
- ChromaDB for vector storage
⭐ Star this repo if Ragadoc helps you work with your documents more effectively!