This repository implements a Retrieval-Augmented Generation (RAG) chatbot powered by a Qdrant vector database and an NVIDIA-hosted LLM (Meta LLaMA-3.1-405B). The pipeline covers document ingestion, chunking, embedding, vector store creation, and a Streamlit front‑end for interactive question answering.
```
Raw documents (PDF, TXT)
 └─> preprocess.py (load + split into chunks)
      └─> 2,000-char chunks w/ 200-char overlap
           └─> HuggingFaceEmbeddings (all-MiniLM-L6-v2)
                └─> Qdrant vector store (collection: amlgo-docs)
                     └─> rag_pipeline.py (RetrievalQA chain)
                          └─> ChatNVIDIA LLM (llama-3.1-405b-instruct)
                               └─> app.py (Streamlit UI)
```
```bash
git clone <repo-url>
cd Code
pip install -r requirements.txt
```

- Place your PDF/TXT files in `../data/`
- Run chunking and ingestion:

```bash
python preprocess.py
```
- Splits documents into 2,000-char chunks
- Generates embeddings via `sentence-transformers/all-MiniLM-L6-v2`
- Creates or overwrites the Qdrant collection `amlgo-docs`
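The splitting step can be sketched in plain Python as a sliding window with overlap. This is a minimal illustration of the 2,000-char / 200-char-overlap scheme, not the actual code in `preprocess.py` (which presumably uses a library text splitter); the function name is hypothetical.

```python
def split_into_chunks(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap` characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap  # advance 1,800 chars per chunk with the defaults
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks

chunks = split_into_chunks("x" * 5000)
```

With a 5,000-character input this yields three chunks, each starting 1,800 characters after the previous one.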
No separate step is required: the first call to the Streamlit app initializes the retriever + LLM chain.
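Initialize-on-first-call behavior like this is commonly done with a cached factory. The sketch below uses a placeholder `RagChain` class to stand in for the retriever + LLM chain built in `rag_pipeline.py`; all names here are illustrative assumptions.

```python
from functools import lru_cache

class RagChain:
    """Placeholder for the retriever + LLM chain assembled in rag_pipeline.py."""
    def __init__(self):
        # The real app would connect to Qdrant and build the RetrievalQA chain here.
        self.ready = True

@lru_cache(maxsize=1)
def get_chain() -> RagChain:
    # Built once on the first query; subsequent calls reuse the cached instance.
    return RagChain()
```

Because `lru_cache(maxsize=1)` memoizes the factory, repeated calls return the same chain object instead of reconnecting to the vector store on every query.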
```bash
streamlit run app.py
```

- Opens the UI at `http://localhost:8501`
- Set your NVIDIA API key as an environment variable, or edit `rag_pipeline.py`
- Responses stream in real time; a history panel shows previous Q&A
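Reading the key from the environment might look like the sketch below. The variable name `NVIDIA_API_KEY` is an assumption; check `rag_pipeline.py` for the exact name it expects.

```python
import os

def load_nvidia_api_key() -> str:
    """Return the NVIDIA API key from the environment, failing fast if it is unset."""
    key = os.environ.get("NVIDIA_API_KEY")  # assumed variable name
    if not key:
        raise RuntimeError("Set NVIDIA_API_KEY before launching the app.")
    return key
```

Failing fast at startup gives a clearer error than letting the first LLM call reject an empty credential.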
- Embedding model: `sentence-transformers/all-MiniLM-L6-v2` (384-dim, cosine)
- LLM: `llama-3.1-405b-instruct` via `ChatNVIDIA` (streaming enabled)
- Vector DB: Qdrant (self-hosted or cloud, COSINE distance)
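Cosine similarity over the 384-dimensional MiniLM embeddings is just the dot product of the vectors divided by the product of their norms; Qdrant computes this internally, but a pure-Python version makes the metric concrete:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0 and orthogonal vectors score 0.0, which is why nearest-neighbor retrieval with this metric surfaces the most semantically similar chunks.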
- Query 1
- Query 2
- Query 3
- Query 4
- Conversation history
- Clear button
- Streaming response video