Rishiraj-Pathak-27/LLM-Hallucination-Detection-Correction-Using-RAG

🎯 LLM Hallucination Detection & Correction Using RAG

A real-time hallucination detection system that verifies LLM responses against live web sources and auto-corrects using Retrieval-Augmented Generation (RAG).


📸 Sample Output

[Screenshot: light theme UI, welcome screen]

[Screenshot: query in progress]

[Screenshot: hallucination detected + RAG correction]

[Screenshot: not-hallucinated result]

[Screenshots: MySQL Workbench, chat_logs table]


🧠 How It Works

[Flow diagram image]


🤖 Model Information

Hallucination Detection Model

  • Model: Shreyash03Chimote/Hallucination_Detection
  • Type: CrossEncoder (NLI: Natural Language Inference)
  • Hosted on: HuggingFace 🤗 (no manual download needed; loaded automatically via sentence-transformers)
  • Task: Given a (context, claim) pair → predicts Entailment / Contradiction / Neutral
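A minimal sketch of how the (context, claim) scoring could look with sentence-transformers. The label order in LABELS and the helper names are assumptions for illustration, not taken from backend/server.py; check the model's config for the real label mapping before relying on it.

```python
from typing import Sequence

# ASSUMPTION: label order is illustrative only; read the real mapping
# from the model's config (id2label) before using it.
LABELS = ("contradiction", "entailment", "neutral")


def nli_label(scores: Sequence[float], labels: Sequence[str] = LABELS) -> str:
    """Return the label with the highest score for one (context, claim) pair."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


def score_claims(model, context: str, claims: Sequence[str]) -> list[str]:
    """Score each claim against the same context with a CrossEncoder-style model."""
    pairs = [(context, claim) for claim in claims]
    # CrossEncoder.predict takes a list of sentence pairs and returns one
    # row of class scores per pair for multi-class models.
    return [nli_label(list(row)) for row in model.predict(pairs)]


def load_model():
    """Load the detector; imported lazily so the helpers work without it."""
    from sentence_transformers import CrossEncoder

    return CrossEncoder("Shreyash03Chimote/Hallucination_Detection")
```

With the model loaded, score_claims(load_model(), web_context, sentences) would return one label per sentence of the LLM response, which could then be aggregated into an overall hallucination decision.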

LLM (Chat + RAG)

  • Chat Model: smollm2:360m via Ollama
  • RAG Model: llama3.2:latest via Ollama
  • Local inference β€” no API key required for the LLM

Embeddings

  • Model: nomic-embed-text via Ollama
  • Stored in: Pinecone vector database

⚠️ No model weights need to be downloaded manually. The CrossEncoder is fetched from HuggingFace on first run, and the Ollama models are pulled by name (see step 3 of the manual setup).


📊 Dataset

Live RAG Pipeline (Runtime)

This project uses a live RAG pipeline for query processing:

| Component | Source |
| --- | --- |
| Web context | SerpAPI (real-time Google search results) |
| Web content | Scraped via LangChain's WebBaseLoader |
| Vector index | Pinecone, rebuilt per query (ephemeral namespace) |
| Chat logs | MySQL (rag_app.chat_logs) |
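The "ephemeral namespace" idea can be sketched as a context manager that creates a unique namespace per query and clears it afterwards. This is an illustration under assumptions: the helper name, the namespace naming scheme, and the InMemoryIndex stand-in are hypothetical, and backend/server.py may handle cleanup differently. The delete(delete_all=True, namespace=...) call mirrors the Pinecone Python client's Index.delete.

```python
import uuid
from contextlib import contextmanager


class InMemoryIndex:
    """Hypothetical stand-in exposing only the two index calls used below."""

    def __init__(self):
        self.store = {}

    def upsert(self, vectors, namespace):
        # vectors is a list of (id, values) tuples.
        self.store.setdefault(namespace, {}).update({v[0]: v[1] for v in vectors})

    def delete(self, delete_all=False, namespace=""):
        if delete_all:
            self.store.pop(namespace, None)


@contextmanager
def ephemeral_namespace(index, prefix: str = "query"):
    """Yield a unique namespace name and clear it when the block exits."""
    namespace = f"{prefix}-{uuid.uuid4().hex}"
    try:
        yield namespace
    finally:
        # Runs even if retrieval fails, so stale vectors do not accumulate.
        index.delete(delete_all=True, namespace=namespace)


index = InMemoryIndex()
with ephemeral_namespace(index) as ns:
    index.upsert([("chunk-1", [0.1, 0.2, 0.3])], namespace=ns)
    # ... similarity search against ns would happen here ...
```

Scoping each query to its own namespace keeps results from one search from leaking into the next.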

HalluRAG Dataset (Training & Testing)

For training and testing hallucination detection classifiers, this project uses the HalluRAG Dataset (pickle format):

Dataset Details:

  • Name: HalluRAG - Detecting Closed-Domain Hallucinations in RAG Applications
  • Size: 19,731 validly annotated sentences
  • Source: Wikipedia articles (content added after the Feb 22, 2024 cutoff)
  • Models Used: LLaMA-2-7B, LLaMA-2-13B, Mistral-7B with quantizations (float8, int8, int4)
  • Contents:
    • ✅ RAG prompts (answerable & unanswerable questions)
    • ✅ LLM-generated responses
    • ✅ Internal states (contextualized embedding vectors, intermediate activation values)
    • ✅ Hallucination labels (binary classification)
  • Download: DOI: 10.17879/84958668505
  • Code: GitHub: F4biian/HalluRAG

Usage in this Project: The HalluRAG dataset (pickle format) can be used to train MLP classifiers for sentence-level hallucination detection. The trained classifiers read an LLM's internal states to predict whether a generated sentence is hallucinated; the HalluRAG paper reports test accuracies of up to 75% (Mistral-7B).

Citation:

Ridder, F., & Schilling, M. (2024). 
The HalluRAG Dataset: Detecting Closed-Domain Hallucinations in RAG Applications 
Using an LLM's Internal States. arXiv preprint arXiv:2412.17056v1

If you want to test with a fixed dataset, you can pre-populate the Pinecone index manually using the vector store utilities in backend/server.py.


🗂️ Project Structure

hallucination-rag/
├── backend/                       # Python Flask AI backend
│   ├── server.py                  # Main Flask app + RAG + hallucination detection
│   ├── config.py                  # Model/threshold configuration
│   └── requirements.txt           # Python dependencies
│
├── api/                           # Node.js MySQL REST API
│   ├── server.js                  # Express server (port 3001)
│   └── db.js                      # MySQL connection
│
├── frontend/                      # Static HTML/JS UI
│   ├── index.html                 # Main chat app + integrated welcome hero
│   ├── welcome.html               # Welcome page design (reference)
│   └── public/                    # Favicons, web manifest
│
├── docs/                          # Documentation & architecture
│   ├── README.md                  # Project documentation
│   ├── plan.md                    # Technical planning notes
│   ├── DockerPlan.md              # Docker setup (archived)
│   ├── Flow_of_rag/               # Architecture diagrams & flow charts
│   └── screenshots/               # UI screenshots & demos
│
├── scripts/                       # Helper shell scripts
│   ├── start_backend.sh           # Start Flask server
│   ├── cleanup.sh                 # Stop & cleanup processes
│   └── init-ollama.sh             # Initialize Ollama models
│
├── model/                         # Embeddings & model storage
│   └── (generated on first run)
│
├── START.sh                       # Main startup script (all services)
├── QUICK_START.md                 # Setup & execution guide
├── TEST_CASES_0_HALLUCINATION.md  # Test suite (0% hallucination cases)
├── .env                           # Environment variables (user-created from .env.example)
├── .env.example                   # Template for API keys & configuration
├── .gitignore                     # Git ignore rules
├── package.json                   # Node.js dependencies
├── package-lock.json              # Locked dependency versions
├── README.md                      # This file
└── image-*.png                    # Screenshots for documentation

⚙️ Prerequisites

| Tool | Version | Purpose |
| --- | --- | --- |
| Python | 3.10+ | Flask backend |
| Node.js | 18+ | MySQL API |
| Ollama | Latest | Local LLM + embeddings |
| MySQL | 8.0+ | Chat log persistence |
| SerpAPI key | - | Web search |
| Pinecone key | - | Vector database |
| HuggingFace token | - | CrossEncoder model |

🚀 Setup & Execution

👉 For the fastest setup, see QUICK_START.md or run:

chmod +x START.sh
./START.sh

This starts all services (Ollama, Flask backend, Node.js API, frontend) in tmux, or prints instructions for starting them manually in separate terminals.


Manual Setup (if preferred):

1. Clone the repository

git clone https://github.com/<your-username>/hallucination-rag.git
cd hallucination-rag

2. Configure environment variables

cp .env.example .env
# Fill in your API keys (see .env.example for all required keys)

3. Install Ollama models

ollama pull smollm2:360m
ollama pull llama3.2:latest
ollama pull nomic-embed-text

4. Set up MySQL database

CREATE DATABASE rag_app;
USE rag_app;

CREATE TABLE chat_logs (
  id                  INT AUTO_INCREMENT PRIMARY KEY,
  query               TEXT NOT NULL,
  llm_response        TEXT,
  rag_response        TEXT,
  is_hallucinated     TINYINT(1)   DEFAULT 0,
  hallucination_score FLOAT,
  classification      VARCHAR(50),
  sentence_count      INT          DEFAULT 0,
  sources_count       INT          DEFAULT 0,
  sources             JSON,
  model_id            VARCHAR(150),
  response_time_ms    INT,
  created_at          TIMESTAMP    DEFAULT CURRENT_TIMESTAMP
);

5. Start the Python Flask backend

python -m venv venv
source venv/bin/activate
pip install -r backend/requirements.txt

python backend/server.py
# → Running at http://127.0.0.1:8080

6. Start the Node.js MySQL API (optional)

npm install
node api/server.js
# → Running at http://127.0.0.1:3001

7. Open the UI

Open frontend/index.html in VS Code with Live Server, or visit:

http://127.0.0.1:5500

🌐 API Endpoints

Flask (:8080)

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/health | Health check |
| GET | /api/chat/stream?q=<query> | SSE stream (LLM + RAG + hallucination) |
| GET | /api/models | Currently configured model names |
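A small client sketch for the streaming endpoint, using only the standard library. The README does not document the event payload format, so this sketch only splits the stream into raw data strings following the general Server-Sent Events framing; the function names are hypothetical.

```python
from typing import Iterable, Iterator
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8080"  # Flask address from the setup steps


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event (a blank line ends an event)."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # SSE framing strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
        # Comment lines (starting with ":") and other fields are ignored.
    if buffer:
        # Lenient choice: flush a trailing event that lacks a final blank line.
        yield "\n".join(buffer)


def stream_chat(query: str) -> Iterator[str]:
    """Open /api/chat/stream for a query and yield each event's data string."""
    url = f"{BASE_URL}/api/chat/stream?q={quote(query)}"
    with urlopen(url) as response:
        yield from iter_sse_data(line.decode("utf-8") for line in response)
```

With the backend running, `for chunk in stream_chat("Who wrote Hamlet?"): print(chunk)` would print each event as it arrives.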

Node.js (:3001)

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/save | Save a chat log to MySQL |
| GET | /api/history | Fetch last 100 chat logs |
| DELETE | /api/history/:id | Delete a specific log |
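As a rough illustration of calling the logging API, the sketch below builds a POST /api/save request with the standard library. The JSON field names are an assumption based on the chat_logs columns; the request body that api/server.js actually expects is not documented here, so verify it against the server code.

```python
import json
from urllib.request import Request

API_URL = "http://127.0.0.1:3001"  # Node.js API address from the setup steps


def build_save_request(log: dict) -> Request:
    """Build (but do not send) a JSON POST request for /api/save."""
    body = json.dumps(log).encode("utf-8")
    return Request(
        f"{API_URL}/api/save",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# ASSUMPTION: keys mirror the chat_logs column names.
example_log = {
    "query": "Who wrote Hamlet?",
    "llm_response": "Christopher Marlowe wrote Hamlet.",
    "rag_response": "William Shakespeare wrote Hamlet.",
    "is_hallucinated": True,
    "sources": ["https://example.com/hamlet"],
}
request = build_save_request(example_log)
# To send it: urllib.request.urlopen(request)
```

Separating request construction from sending makes the payload easy to inspect before the API is running.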

🔑 Environment Variables

See .env.example for the full list. Required:

SERPAPI_API_KEY=your_serpapi_key
PINECONE_API_KEY=your_pinecone_key
HF_API_TOKEN=your_huggingface_token
OLLAMA_BASE_URL=http://localhost:11434
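A startup check along these lines can catch a missing key before the first query fails. This is a sketch: the missing_vars helper is hypothetical, and it assumes the variables have already been loaded into the process environment (for example from .env).

```python
import os
from typing import Mapping

REQUIRED = (
    "SERPAPI_API_KEY",
    "PINECONE_API_KEY",
    "HF_API_TOKEN",
    "OLLAMA_BASE_URL",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```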

📦 Tech Stack

| Layer | Technology |
| --- | --- |
| LLM (Chat) | Ollama (smollm2:360m) |
| LLM (RAG) | Ollama (llama3.2:latest) |
| Embeddings | Ollama (nomic-embed-text) |
| Vector DB | Pinecone |
| Web Search | SerpAPI |
| Hallucination Detection | HuggingFace CrossEncoder (NLI) |
| Backend (AI) | Python / Flask / LangChain |
| Backend (Logging) | Node.js / Express |
| Database | MySQL 8 |
| Frontend | Vanilla HTML / CSS / JS |

Thank you!!
