GroqLLM_DocChat is an AI-powered document chat system that extracts text from PDFs, generates vector embeddings, stores them in a vector database, and allows users to query information using natural language. It leverages LangChain, FAIS and google's embeddings to provide an efficient document retrieval and question-answering system.
- 📄 Extracts text from PDFs, PPTs, etc.
- 🔍 Stores embeddings in a vector database (FAISS)
- 🤖 Uses AI embeddings for semantic search
- 💬 Allows users to ask natural language questions about the document
- 🚀 Fast and scalable retrieval-augmented generation (RAG)
- ⚡ Uses Groq API for fast inferencing
- Python (Core language)
- LangChain (Document processing and retrieval)
- FAISS (Vector database)
- google/models/text-embedding-004 (Embeddings model)
- Docling (Document text extraction)
- Streamlit (API or UI for interaction)
Clone the repository:
git clone https://github.com/SPYLoveC2/GroqLLM_DocChat.git
cd GroqLLM_DocChatCreate a virtual environment and install dependencies:
conda env create -f environment.ymlStart the chatbot interface:
streamlit run app.py # If using Streamlit UIAsk questions about the document in the UI or API.
Set your API keys (if using OpenAI embeddings) in an .env file:
GOOGLE_API_KEY=your_api_key
GROQ=your_api_key- ✅ Basic Doc processing & embedding storage
- ✅ Q&A system with retrieval
- 🔜 Multi-document support
- 🔜 Fine-tuned LLM integration
Feel free to submit issues or pull requests to improve the project.