Rcidshacker/local-clara-agent

🧠 Ultimate CLaRa Agent

A Local Neuro-Symbolic RAG System with HyDE, Knowledge Graphs & Web Search

Beyond Standard RAG: a system that thinks before it searches, understands relationships, and self-corrects.

Features • Architecture • Installation • Usage • How It Works


🌟 The Vision

Inspired by Apple's CLaRa (Continuous Latent Reasoning) research, this project proves you don't need a massive GPU cluster to build state-of-the-art AI. By combining Symbolic AI (Knowledge Graphs) with Neural AI (LLMs) and orchestrating them with LangGraph, we built a system that outperforms standard RAG pipelines while running entirely on a standard laptop.


🖥️ Interface Preview

CLaRa Agent Interface

The glassmorphism UI with chat and real-time reasoning panel


✨ Features

| Feature | Description |
| --- | --- |
| 🔍 HyDE Search | Hypothetical Document Embeddings: the AI "imagines" an answer first to find better matches |
| 🕸️ Graph Memory | NetworkX-powered knowledge graph that understands entity relationships |
| 💭 Contextual Memory | Remembers conversation history and rewrites follow-up questions |
| 🌐 Web Fallback | Automatically searches the internet when local knowledge is insufficient |
| ⚡ 100% Local | Runs on Ollama with Llama 3.1: no API keys, no cloud, complete privacy |
| 🎨 Pro UI | Glassmorphism design with real-time "thought process" visualization |

🏗️ Architecture

graph TD
    subgraph "🧠 Agent Brain"
        User[👤 User Query] --> Context[🔄 Contextualize Node]
        Context -->|Rewritten Query| Retrieve[📚 Hybrid Retrieval]
        
        subgraph "Dual Memory System"
            Retrieve --> Vector[(🔷 ChromaDB<br/>Vector Store)]
            Retrieve --> Graph[(🕸️ NetworkX<br/>Knowledge Graph)]
        end
        
        Vector --> Merge[Merge Results]
        Graph --> Merge
        
        Merge --> Grade{⚖️ Is Context<br/>Relevant?}
        
        Grade -->|✅ Yes| Generate[✍️ Generate Answer]
        Grade -->|❌ No| Web[🌐 Web Search]
        
        Web --> Generate
        Generate --> Response[💬 Final Response]
    end
    
    style User fill:#3b82f6,color:#fff
    style Response fill:#10b981,color:#fff
    style Web fill:#a855f7,color:#fff
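The control flow in the diagram can be sketched in plain Python. This is a minimal illustration only, not the project's LangGraph code: the `AgentState` fields, the `run_agent` function, and every callable passed into it are hypothetical names chosen for this example, and the real nodes live in agent/workflows.py.

```python
from typing import Callable, List, TypedDict


class AgentState(TypedDict, total=False):
    question: str        # raw user query
    rewritten: str       # query after the contextualize step
    context: List[str]   # passages handed to the generator
    source: str          # "local" or "web"
    answer: str


def run_agent(
    question: str,
    contextualize: Callable[[str], str],
    retrieve_vector: Callable[[str], List[str]],
    retrieve_graph: Callable[[str], List[str]],
    is_relevant: Callable[[str, List[str]], bool],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> AgentState:
    """Walk the diagram: contextualize, retrieve, grade, optionally fall back, generate."""
    state: AgentState = {"question": question}
    state["rewritten"] = contextualize(question)
    # Hybrid retrieval: merge results from both memories.
    merged = retrieve_vector(state["rewritten"]) + retrieve_graph(state["rewritten"])
    if is_relevant(state["rewritten"], merged):
        state["context"], state["source"] = merged, "local"
    else:
        state["context"], state["source"] = web_search(state["rewritten"]), "web"
    state["answer"] = generate(state["rewritten"], state["context"])
    return state


# Stub callables stand in for the real nodes; both memories return nothing here.
state = run_agent(
    "What is a qubit?",
    contextualize=lambda q: q,
    retrieve_vector=lambda q: [],
    retrieve_graph=lambda q: [],
    is_relevant=lambda q, ctx: len(ctx) > 0,
    web_search=lambda q: ["web snippet about qubits"],
    generate=lambda q, ctx: f"Answer based on {len(ctx)} passage(s)",
)
print(state["source"])  # web
```

Passing the nodes in as callables keeps the routing logic testable without Ollama, ChromaDB, or a network connection.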

🛠️ Tech Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| LLM Engine | Ollama + Llama 3.1 | Local inference, privacy-first |
| Prompt Optimizer | DSPy | Compilable signatures, no brittle prompts |
| Orchestrator | LangGraph | Stateful agent loops with decision branches |
| Vector Memory | ChromaDB | Semantic similarity search |
| Graph Memory | NetworkX | Entity relationship storage |
| Web Search | DuckDuckGo | Real-time internet fallback |
| Interface | FastAPI + Gradio | Professional glassmorphism UI |

📦 Installation

Prerequisites

  • Python 3.11+
  • Ollama with Llama 3.1 model installed
  • Git

Step 1: Clone the Repository

git clone https://github.com/Rcidshacker/local-clara-agent.git
cd local-clara-agent

Step 2: Create Virtual Environment

python -m venv venv

# Windows
.\venv\Scripts\activate

# macOS/Linux
source venv/bin/activate

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Install & Start Ollama

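# Install Ollama from https://ollama.ai
# Start the server first (skip this if the Ollama desktop app is already running)
ollama serve
# In a second terminal, download the model
ollama pull llama3.1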

Step 5: Add Your Data

Place your PDF files in the data/ folder:

local-clara-agent/
└── data/
    └── your-document.pdf

Step 6: Ingest Your Documents

python run_ingest.py

This will:

  • Extract text from your PDFs
  • Create vector embeddings in ChromaDB
  • Build a knowledge graph in NetworkX
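Before text can be embedded it has to be split into chunks. The sketch below shows one common way to do that, using overlapping character windows. The `chunk_text` function and its `chunk_size` and `overlap` defaults are assumptions made for illustration; this README does not state how core/ingest.py actually splits text.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated storage.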

🚀 Usage

Start the Application

python app.py

Access the UI

Open your browser and navigate to:

http://127.0.0.1:8000

Example Queries

| Query Type | Example |
| --- | --- |
| Factual | "What is a Qubit?" |
| Code | "Show me Python code to create a quantum circuit" |
| Relationship | "Who is the author of the book?" |
| Follow-up | "Explain that code in detail" |
| Live Info | "What are the latest quantum computing news from 2024?" |

📁 Project Structure

local-clara-agent/
├── 📂 agent/
│   ├── __init__.py
│   └── workflows.py       # LangGraph state machine & nodes
├── 📂 core/
│   ├── __init__.py
│   ├── settings.py        # LLM & path configuration
│   ├── ingest.py          # PDF → Vector + Graph pipeline
│   └── retrieval.py       # HyDE + Graph hybrid search
├── 📂 data/
│   └── [your PDFs here]
├── 📂 storage/
│   ├── chroma_db/         # Vector embeddings
│   └── knowledge_graph.gpickle  # Graph data
├── app.py                 # FastAPI + Gradio UI
├── run_ingest.py          # Ingestion runner
└── requirements.txt

🔬 How It Works

Phase 1: Hybrid Memory (Ingestion)

Goal: Don't just read the PDF; understand it.

We created a Dual-Path Ingestion Engine:

  • Path A (Vector): Standard text chunks → ChromaDB for similarity search
  • Path B (Graph): DSPy's EntityExtractor → NetworkX for relationship storage

# Example extraction
Input:  "Santanu Pattanayak is the author of Quantum Machine Learning"
Output: ('Santanu Pattanayak', 'Author', 'Quantum Machine Learning')
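To show how a triple like the one above can be stored and queried, here is a small standard-library sketch. `TripleGraph` is a hypothetical stand-in written for this example, not the project's code. With NetworkX itself, the equivalent storage step would be a call along the lines of `DiGraph.add_edge(subject, obj, relation=relation)`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


class TripleGraph:
    """Tiny directed, labelled graph; a stand-in for a NetworkX DiGraph."""

    def __init__(self) -> None:
        self._out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        self._in: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_triple(self, triple: Triple) -> None:
        subject, relation, obj = triple
        self._out[subject].append((relation, obj))
        self._in[obj].append((relation, subject))

    def facts_about(self, entity: str) -> List[str]:
        """Return every stored fact that mentions the entity, in either direction."""
        facts = [f"{entity} -[{rel}]-> {obj}" for rel, obj in self._out.get(entity, [])]
        facts += [f"{subj} -[{rel}]-> {entity}" for rel, subj in self._in.get(entity, [])]
        return facts


graph = TripleGraph()
graph.add_triple(("Santanu Pattanayak", "Author", "Quantum Machine Learning"))
print(graph.facts_about("Quantum Machine Learning"))
# ['Santanu Pattanayak -[Author]-> Quantum Machine Learning']
```

Indexing incoming edges as well as outgoing ones is what lets a question about the book surface its author, which a pure similarity search over chunks may miss.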

Phase 2: HyDE Retrieval (CLaRa-Style)

Goal: Fix the problem where a short or vague query (for example "Explain that code") fails in vector search because it looks nothing like the passages that answer it.

We implemented Hypothetical Document Embeddings:

  1. User asks: "How do I make a qubit?"
  2. Agent hallucinates: "To create a qubit in Python, use cirq.GridQubit..."
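  3. Search: The hallucinated answer, not the original question, is embedded and matched against the vector store, so it lands close to technical docs written in the same vocabulary.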
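The three steps above can be sketched with a toy example. Everything here is an assumption made for illustration: `embed` is a bag-of-words counter standing in for a real embedding model, `fake_llm` is a stub standing in for the Ollama call, and `hyde_search` is a hypothetical name, not the function in core/retrieval.py.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hyde_search(question: str, docs: List[str], draft_answer: Callable[[str], str]) -> str:
    """Embed a hypothetical answer instead of the question, then rank docs against it."""
    hypothetical = draft_answer(question)   # step 2: the model drafts an answer
    query_vec = embed(hypothetical)         # step 3: search with the draft
    return max(docs, key=lambda d: cosine(query_vec, embed(d)))


def fake_llm(question: str) -> str:
    """Hypothetical stub standing in for the local LLM call."""
    return "To create a qubit in Python, use cirq.GridQubit"


docs = [
    "Chapter 1 covers the history of computing and early machines.",
    "Use cirq.GridQubit(0, 0) to create a qubit on a grid in Python.",
]
best = hyde_search("How do I make a qubit?", docs, fake_llm)
print(best)  # Use cirq.GridQubit(0, 0) to create a qubit on a grid in Python.
```

The draft answer does not need to be correct. It only needs to use the vocabulary of the passages that are.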

Phase 3: Contextual Memory

Goal: Understand follow-up questions like "Explain that code."

The QueryRewriter node transforms ambiguous queries:

Before: "Explain that code"
After:  "Explain the QuantumLayer class Python code from the previous response"
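One way to get a rewrite like this is to hand the model the recent conversation along with the follow-up. The sketch below only builds the prompt string. `build_rewrite_prompt`, the `Turn` type, and the prompt wording are hypothetical, and the project's QueryRewriter may be implemented quite differently (for instance as a DSPy signature).

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (role, text), where role is "user" or "assistant"


def build_rewrite_prompt(history: List[Turn], question: str, max_turns: int = 4) -> str:
    """Build a prompt asking the model to make a follow-up question standalone."""
    recent = history[-max_turns:]  # keep the prompt short: only the latest turns
    transcript = "\n".join(f"{role}: {text}" for role, text in recent)
    return (
        "Rewrite the final question so it can be understood without the conversation.\n"
        "Return only the rewritten question.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Final question: {question}\n"
        "Standalone question:"
    )


history = [
    ("user", "Show me Python code for a quantum layer"),
    ("assistant", "Here is a QuantumLayer class: ..."),
]
prompt = build_rewrite_prompt(history, "Explain that code")
print(prompt)
```

The rewritten question returned by the model, not the original follow-up, is then what goes to retrieval.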

Phase 4: Web Fallback ("God Mode")

Goal: Handle questions outside the PDF's knowledge.

if local_relevance < threshold:
    switch_to_web_search()  # DuckDuckGo

This transforms the system from a "Static Librarian" into a "Live Researcher."
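A runnable version of that routing decision might look like the following. It assumes each retrieved passage carries a numeric relevance score and that 0.5 is the cut-off. Both are assumptions for illustration, since this README does not say how the grader scores context or what threshold it uses.

```python
from typing import List, Tuple

RELEVANCE_THRESHOLD = 0.5  # assumed value; the real threshold is not stated


def route(scored_passages: List[Tuple[str, float]], threshold: float = RELEVANCE_THRESHOLD) -> str:
    """Return "local" if any passage clears the threshold, otherwise "web"."""
    best_score = max((score for _, score in scored_passages), default=0.0)
    return "local" if best_score >= threshold else "web"


print(route([("Qubits are two-level quantum systems.", 0.82)]))  # local
print(route([("Unrelated paragraph.", 0.12)]))                   # web
print(route([]))                                                 # web
```

An empty result set routes to the web as well, so a question the PDFs never mention does not reach the generator with no context at all.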


🎨 UI Features

The Gradio interface includes:

  • 🌙 Dark Glassmorphism Theme: Modern, professional aesthetic
  • 💬 Chat History: Full conversation memory
  • ⚙️ Reasoning Panel: See exactly what the agent retrieved and how it rewrote queries
  • 🏷️ Source Badges: Clear indication of Local vs Web knowledge

🧪 Key Innovations

| Innovation | Standard RAG | CLaRa Agent |
| --- | --- | --- |
| Search Method | Keyword matching | HyDE (Hypothetical Documents) |
| Memory | Vector only | Vector + Knowledge Graph |
| Context | Stateless | Stateful with query rewriting |
| Failure Mode | Hallucinate | Fall back to web search |
| Transparency | Black box | Full reasoning visualization |

📊 Performance

Tested on a standard laptop (no GPU required):

  • Ingestion Speed: ~50 pages/minute
  • Query Latency: 3-8 seconds (depends on Ollama model size)
  • Memory Usage: ~2GB RAM
  • Storage: ~100MB per 1000 pages
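As a rough planning aid, the figures above can be turned into a back-of-the-envelope estimate. This sketch assumes both rates scale linearly with page count, which the README does not claim, and `estimate_ingestion` is a hypothetical helper.

```python
PAGES_PER_MINUTE = 50      # "~50 pages/minute" from the list above
MB_PER_1000_PAGES = 100    # "~100MB per 1000 pages" from the list above


def estimate_ingestion(pages: int) -> tuple:
    """Return (minutes, megabytes) for a corpus of the given page count."""
    minutes = pages / PAGES_PER_MINUTE
    megabytes = pages * MB_PER_1000_PAGES / 1000
    return minutes, megabytes


minutes, megabytes = estimate_ingestion(500)
print(f"{minutes:.0f} min, {megabytes:.0f} MB")  # 10 min, 50 MB
```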

🤝 Contributing

Contributions are welcome! Here are some ideas:

  • Add support for more document types (Word, HTML)
  • Implement streaming responses
  • Add conversation export feature
  • Create Docker deployment
  • Add more sophisticated graph reasoning

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Apple Research: For the CLaRa paper inspiration
  • DSPy Team: For revolutionizing prompt engineering
  • LangChain/LangGraph: For the agentic framework
  • Ollama: For making local LLMs accessible

Built with 💙 using DSPy, LangGraph, ChromaDB & NetworkX

About

A local Neuro-Symbolic RAG agent inspired by Apple's CLaRa. Built with the "2026 Stack" (DSPy, LangGraph, Ollama), it features Hybrid Memory (Vector + Knowledge Graph), HyDE retrieval, and self-correcting web search fallback.
