Beyond Standard RAG: A system that thinks before it searches, understands relationships, and self-corrects.
Features • Architecture • Installation • Usage • How It Works
Inspired by Apple's CLaRa (Continuous Latent Reasoning) research, this project proves you don't need a massive GPU cluster to build state-of-the-art AI. By combining Symbolic AI (Knowledge Graphs) with Neural AI (LLMs) and orchestrating them with LangGraph, we built a system that outperforms standard RAG pipelines, running entirely on a standard laptop.
| Feature | Description |
|---|---|
| HyDE Search | Hypothetical Document Embeddings: the AI "imagines" an answer first to find better matches |
| Graph Memory | NetworkX-powered knowledge graph that understands entity relationships |
| Contextual Memory | Remembers conversation history and rewrites follow-up questions |
| Web Fallback | Automatically searches the internet when local knowledge is insufficient |
| 100% Local | Runs on Ollama with Llama 3.1: no API keys, no cloud, complete privacy |
| Pro UI | Glassmorphism design with real-time "thought process" visualization |
graph TD
subgraph "Agent Brain"
User[User Query] --> Context[Contextualize Node]
Context -->|Rewritten Query| Retrieve[Hybrid Retrieval]
subgraph "Dual Memory System"
Retrieve --> Vector[(ChromaDB<br/>Vector Store)]
Retrieve --> Graph[(NetworkX<br/>Knowledge Graph)]
end
Vector --> Merge[Merge Results]
Graph --> Merge
Merge --> Grade{Is Context<br/>Relevant?}
Grade -->|Yes| Generate[Generate Answer]
Grade -->|No| Web[Web Search]
Web --> Generate
Generate --> Response[Final Response]
end
style User fill:#3b82f6,color:#fff
style Response fill:#10b981,color:#fff
style Web fill:#a855f7,color:#fff
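The control flow in the diagram can be sketched in plain Python. This is a minimal sketch, not the project's LangGraph code: `AgentState`, `run_agent`, and the five callables passed into it are hypothetical stand-ins, chosen so the routing logic can be read and tested without an LLM or a vector store.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    """State handed from node to node (field names are illustrative assumptions)."""
    question: str
    history: List[str] = field(default_factory=list)
    rewritten: str = ""
    context: List[str] = field(default_factory=list)
    source: str = "local"  # becomes "web" when the fallback branch is taken
    answer: str = ""


def run_agent(
    state: AgentState,
    contextualize: Callable[[str, List[str]], str],
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, List[str]], bool],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> AgentState:
    # 1. Contextualize: rewrite the query using the conversation history.
    state.rewritten = contextualize(state.question, state.history)
    # 2. Hybrid retrieval: the callable returns vector and graph hits already merged.
    state.context = retrieve(state.rewritten)
    # 3. Grade: if the local context is judged irrelevant, fall back to the web.
    if not grade(state.rewritten, state.context):
        state.context = web_search(state.rewritten)
        state.source = "web"
    # 4. Generate the final answer from whichever context was selected.
    state.answer = generate(state.rewritten, state.context)
    return state
```

In a LangGraph implementation the `if not grade(...)` branch would typically be expressed as a conditional edge out of the grading node; here it is an ordinary `if` to keep the sketch dependency-free.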
| Component | Technology | Purpose |
|---|---|---|
| LLM Engine | Ollama + Llama 3.1 | Local inference, privacy-first |
| Prompt Optimizer | DSPy | Compilable signatures, no brittle prompts |
| Orchestrator | LangGraph | Stateful agent loops with decision branches |
| Vector Memory | ChromaDB | Semantic similarity search |
| Graph Memory | NetworkX | Entity relationship storage |
| Web Search | DuckDuckGo | Real-time internet fallback |
| Interface | FastAPI + Gradio | Professional glassmorphism UI |
- Python 3.11+
- Ollama with Llama 3.1 model installed
- Git
git clone https://github.com/Rcidshacker/local-clara-agent.git
cd local-clara-agent

python -m venv venv
# Windows
.\venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

pip install -r requirements.txt

# Install Ollama from https://ollama.ai
ollama pull llama3.1
ollama serve

Place your PDF files in the data/ folder:
ultimate-rag-agent/
└── data/
    └── your-document.pdf

python run_ingest.py

This will:
- Extract text from your PDFs
- Create vector embeddings in ChromaDB
- Build a knowledge graph in NetworkX
python app.py

Open your browser and navigate to:
http://127.0.0.1:8000
| Query Type | Example |
|---|---|
| Factual | "What is a Qubit?" |
| Code | "Show me Python code to create a quantum circuit" |
| Relationship | "Who is the author of the book?" |
| Follow-up | "Explain that code in detail" |
| Live Info | "What is the latest quantum computing news from 2024?" |
ultimate-rag-agent/
├── agent/
│   ├── __init__.py
│   └── workflows.py             # LangGraph state machine & nodes
├── core/
│   ├── __init__.py
│   ├── settings.py              # LLM & path configuration
│   ├── ingest.py                # PDF → Vector + Graph pipeline
│   └── retrieval.py             # HyDE + Graph hybrid search
├── data/
│   └── [your PDFs here]
├── storage/
│   ├── chroma_db/               # Vector embeddings
│   └── knowledge_graph.gpickle  # Graph data
├── app.py                       # FastAPI + Gradio UI
├── run_ingest.py                # Ingestion runner
└── requirements.txt
Goal: Don't just read the PDF; understand it.
We created a Dual-Path Ingestion Engine:
- Path A (Vector): Standard text chunks → ChromaDB for similarity search
- Path B (Graph): DSPy's EntityExtractor → NetworkX for relationship storage
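The two paths can be sketched with the standard library alone. This is a rough illustration under stated assumptions: the chunk size, the overlap, and the names `chunk_text`, `add_triple`, and `neighbors` are hypothetical, a plain dictionary stands in for the NetworkX graph, and no embedding or DSPy extraction happens here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]           # (subject, relation, object)
Graph = Dict[str, List[Tuple[str, str]]]  # subject -> [(relation, object), ...]


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Path A: split text into overlapping character windows for embedding."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def add_triple(graph: Graph, triple: Triple) -> None:
    """Path B: store an extracted triple as a labeled edge subject -> object."""
    subject, relation, obj = triple
    graph[subject].append((relation, obj))


def neighbors(graph: Graph, entity: str) -> List[Tuple[str, str]]:
    """Return the (relation, object) pairs recorded for an entity."""
    return graph.get(entity, [])


graph: Graph = defaultdict(list)
add_triple(graph, ("Santanu Pattanayak", "Author", "Quantum Machine Learning"))
print(neighbors(graph, "Santanu Pattanayak"))
# [('Author', 'Quantum Machine Learning')]
```

With NetworkX, the same edge could be stored on a `networkx.DiGraph` via `G.add_edge(subject, obj, relation=relation)`; the dictionary version is used here only to keep the sketch dependency-free.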
# Example extraction
Input: "Santanu Pattanayak is the author of Quantum Machine Learning"
Output: ('Santanu Pattanayak', 'Author', 'Quantum Machine Learning')

Goal: Fix the problem where "Explain that code" fails in vector search.
We implemented Hypothetical Document Embeddings:
- User asks: "How do I make a qubit?"
- Agent hallucinates: "To create a qubit in Python, use cirq.GridQubit..."
- Search: The hypothetical answer shares vocabulary with the technical docs, so its embedding matches them more closely than the original question does.
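The three steps above can be sketched with a toy bag-of-words similarity. This is not the project's retrieval code: `embed`, `cosine`, `search`, and `hyde_search` are hypothetical names, word counts stand in for a neural embedding model, and `draft_answer` is any callable that plays the role of the LLM.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy 'embedding': token counts. A real system would use a neural encoder."""
    return Counter(re.findall(r"[a-z0-9_]+(?:\.[a-z0-9_]+)*", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query_text: str, docs: List[str], top_k: int = 1) -> List[str]:
    """Rank documents by similarity to the query text."""
    query_vec = embed(query_text)
    return sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)[:top_k]


def hyde_search(question: str, draft_answer: Callable[[str], str],
                docs: List[str], top_k: int = 1) -> List[str]:
    """HyDE: search with a hypothetical answer instead of the raw question."""
    return search(draft_answer(question), docs, top_k)


docs = [
    "Use cirq.GridQubit(0, 0) to create a qubit on a grid in Python.",
    "How do I make a cake? Mix flour and sugar.",
]
question = "How do I make a qubit?"
fake_llm = lambda _q: "To create a qubit in Python, use cirq.GridQubit"

print(search(question, docs))                 # raw question
print(hyde_search(question, fake_llm, docs))  # hypothetical answer
```

In this toy setup the raw question ranks the cake document first because it shares the phrasing "How do I make a", while the hypothetical answer ranks the Cirq document first. Word counts exaggerate the effect compared with a real embedding model, but the mechanism is the same: the search text looks like an answer rather than a question.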
Goal: Understand follow-up questions like "Explain that code."
The QueryRewriter node transforms ambiguous queries:
Before: "Explain that code"
After: "Explain the QuantumLayer class Python code from the previous response"
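The rewriting step can be sketched as prompt construction plus one model call. The prompt wording, the `max_turns` window, and the function names `build_rewrite_prompt` and `rewrite_query` are assumptions for illustration; the project's QueryRewriter may be structured differently.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user message, assistant reply)


def build_rewrite_prompt(history: List[Turn], question: str, max_turns: int = 3) -> str:
    """Format the most recent turns and the follow-up into a rewriting instruction."""
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = ["Rewrite the follow-up as a standalone question.", ""]
    for user_msg, assistant_msg in recent:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {assistant_msg}")
    lines.append(f"Follow-up: {question}")
    lines.append("Standalone question:")
    return "\n".join(lines)


def rewrite_query(history: List[Turn], question: str, llm: Callable[[str], str]) -> str:
    """Return a standalone query; `llm` is any callable mapping prompt -> text."""
    if not history:
        return question  # first turn: there is nothing to resolve "that" against
    return llm(build_rewrite_prompt(history, question)).strip()
```

Skipping the model call on the first turn avoids an unnecessary round trip to the LLM when there is no history to draw on.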
Goal: Handle questions outside the PDF's knowledge.
if local_relevance < threshold:
    switch_to_web_search()  # DuckDuckGo

This transforms the system from a "Static Librarian" into a "Live Researcher."
The Gradio interface includes:
- Dark Glassmorphism Theme: Modern, professional aesthetic
- Chat History: Full conversation memory
- Reasoning Panel: See exactly what the agent retrieved and how it rewrote queries
- Source Badges: Clear indication of Local vs Web knowledge
| Innovation | Standard RAG | CLaRa Agent |
|---|---|---|
| Search Method | Raw query embedding | HyDE (Hypothetical Documents) |
| Memory | Vector only | Vector + Knowledge Graph |
| Context | Stateless | Stateful with query rewriting |
| Failure Mode | Hallucinate | Fall back to web search |
| Transparency | Black box | Full reasoning visualization |
Tested on a standard laptop (no GPU required):
- Ingestion Speed: ~50 pages/minute
- Query Latency: 3-8 seconds (depends on Ollama model size)
- Memory Usage: ~2GB RAM
- Storage: ~100MB per 1000 pages
Contributions are welcome! Here are some ideas:
- Add support for more document types (Word, HTML)
- Implement streaming responses
- Add conversation export feature
- Create Docker deployment
- Add more sophisticated graph reasoning
This project is licensed under the MIT License - see the LICENSE file for details.
- Apple Research: For the CLaRa paper inspiration
- DSPy Team: For revolutionizing prompt engineering
- LangChain/LangGraph: For the agentic framework
- Ollama: For making local LLMs accessible
Built with DSPy, LangGraph, ChromaDB & NetworkX
