LangChain AI Doc Assistant is an intelligent document processing application built with Streamlit and LangChain. It lets users upload research documents (PDFs), process them, and ask questions about their content. The system uses NVIDIA AI Endpoints (the Llama-3.3-70B-instruct model) for natural language processing and provides concise, factual responses.
- Upload and analyze PDF documents.
- Extract relevant document content using LangChain's PDFPlumberLoader.
- Chunk text for efficient processing.
- Use In-Memory Vector Store for document embeddings and similarity search.
- Generate intelligent responses using NVIDIA's Llama-3.3-70B-instruct model.
- Interactive chat UI built with Streamlit.
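The chunking step in the features above can be sketched in miniature. The real app uses LangChain's RecursiveCharacterTextSplitter; this is a hedged, dependency-free illustration of the same idea, and the chunk size and overlap values are illustrative, not the app's actual settings:

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks with overlap, so context that
    straddles a chunk boundary appears in both neighboring chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

# Each chunk is at most 100 characters; consecutive chunks share 20 characters.
chunks = chunk_text("some long document text " * 20, chunk_size=100, overlap=20)
```

Overlap matters because a sentence split across two chunks would otherwise be invisible to similarity search in both halves.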
Ensure you have an NVIDIA API key to access the Llama-3.3-70B-instruct model. If you don't have one, create one at https://build.nvidia.com/explore/discover.
- Clone this repository:
  ```shell
  git clone https://github.com/yourusername/langchain-ai-doc-assistant.git
  cd langchain-ai-doc-assistant
  ```
- Create a virtual environment and activate it:
  ```shell
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```
- Install dependencies:
  ```shell
  pip install -r requirements.txt
  ```
- Set up environment variables:
  - Create a `.env` file in the root directory.
  - Add your NVIDIA API key:
    ```
    NVIDIA_API_KEY=your_nvidia_api_key
    ```
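Assuming the app loads the `.env` file with a helper like python-dotenv (a common pattern; the exact mechanism depends on the app's code), the effect is equivalent to this minimal sketch:

```python
import os

def load_dotenv_minimal(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ,
    skipping blank lines and comments. Existing variables win."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

After loading, `os.environ["NVIDIA_API_KEY"]` is available to the NVIDIA client without hard-coding the key in source.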
- Run the application:
  ```shell
  streamlit run app.py
  ```
- Open the app in your browser (default: http://localhost:8501).
- Upload a PDF document.
- Wait for the document to be processed.
- Ask questions related to the document's content.
- View AI-generated responses based on the document context.
- LangChain: Framework for building LLM-powered applications.
- Streamlit: Web-based UI for document interaction.
- NVIDIA AI Endpoints: Llama-3.3-70B-instruct model for question-answering.
- PDFPlumber: Extracting text from PDFs.
- RecursiveCharacterTextSplitter: Chunking document text for processing.
- InMemoryVectorStore: Storing and retrieving document embeddings.
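At its core, an in-memory vector store like the one listed above keeps embedding vectors alongside their text and ranks stored chunks by cosine similarity to a query. This is a hedged, dependency-free sketch of that idea, not LangChain's actual InMemoryVectorStore API; the real store works with embeddings produced by a model rather than hand-written vectors:

```python
import math

class ToyVectorStore:
    """Minimal in-memory store: keeps (vector, text) pairs and returns
    the texts whose vectors are most similar to a query vector."""

    def __init__(self):
        self.items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self.items.append((vector, text))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def similarity_search(self, query: list[float], k: int = 1) -> list[str]:
        # Rank all stored chunks by similarity to the query, highest first.
        ranked = sorted(self.items, key=lambda it: self._cosine(query, it[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore()
store.add([1.0, 0.0], "chunk about GPUs")
store.add([0.0, 1.0], "chunk about PDFs")
store.similarity_search([0.9, 0.1], k=1)  # → ["chunk about GPUs"]
```

The retrieved chunks are what get stuffed into the LLM prompt as context, which is why chunking quality directly affects answer quality.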
- Implement database support for storing document embeddings.
- Enable multi-document uploads and cross-document querying.
- Support additional LLM models and APIs.
Contributions are welcome! Feel free to fork the repository, create a new branch, and submit a pull request.