# Documentation Helper

A Streamlit-based chatbot that helps users navigate and understand LangChain documentation by providing intelligent answers to questions using RAG (Retrieval-Augmented Generation) with a Pinecone vector database.
## Features

- Intelligent Q&A: Ask questions about LangChain documentation and get accurate answers
- Source Citations: Each answer includes links to the relevant documentation sources
- Chat Interface: Interactive chat-like interface built with Streamlit
- Vector Search: Uses Pinecone vector database for efficient document retrieval
- OpenAI Integration: Powered by OpenAI's GPT models for natural language understanding
## Tech Stack

- Frontend: Streamlit
- Backend: Python, LangChain
- Vector Database: Pinecone
- LLM: OpenAI GPT
- Document Processing: BeautifulSoup4, LangChain document loaders
## Prerequisites

Before running this project, you'll need:

- Python 3.11+ installed on your system
- An OpenAI API key (get one from the OpenAI Platform)
- A Pinecone API key (get one from the Pinecone Console)
- Your Pinecone environment name (e.g., `us-east-1-aws`)
## Installation

### Using pip

1. Clone the repository

   ```bash
   git clone <your-repo-url>
   cd documentation-helper
   ```

2. Create a virtual environment

   ```bash
   python -m venv venv

   # On Windows
   venv\Scripts\activate

   # On macOS/Linux
   source venv/bin/activate
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```
### Using pipenv (alternative)

1. Clone the repository

   ```bash
   git clone <your-repo-url>
   cd documentation-helper
   ```

2. Install dependencies

   ```bash
   pipenv install
   pipenv shell
   ```
### Configuration

1. Create an environment file

   Create a `.env` file in the project root:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   PINECONE_API_KEY=your_pinecone_api_key_here
   PINECONE_ENVIRONMENT=your_pinecone_environment_here
   ```
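As a quick sanity check, you can verify at startup that the variables the `.env` file must provide are actually set. This is a hypothetical helper, not part of the project; it assumes something like python-dotenv (or Pipenv's automatic `.env` loading) has already populated the environment:

```python
import os

# Hypothetical helper (not part of this repo): check that the keys the
# .env file is expected to provide are present before the app starts.
REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENVIRONMENT"]

def missing_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_keys()` at startup and failing fast with a clear message is friendlier than letting an API client raise an authentication error mid-request.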
2. Set up a Pinecone index

   - Create a new index in your Pinecone console
   - Use dimension `1536` (for text-embedding-3-small)
   - Use metric `cosine`
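The index settings above can also be captured as a small config sketch. The index name below is an assumption (use whatever name `ingestion.py` expects), and the commented-out client calls are only a rough illustration of the `pinecone` Python client:

```python
# Hypothetical sketch: the index settings described above, expressed as the
# arguments you would pass to the Pinecone client's create_index() call.
index_config = {
    "name": "langchain-doc-index",  # assumed name; match your own setup
    "dimension": 1536,              # matches text-embedding-3-small vectors
    "metric": "cosine",
}

# With the `pinecone` client this would look roughly like:
#   from pinecone import Pinecone, ServerlessSpec
#   pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
#   pc.create_index(spec=ServerlessSpec(cloud="aws", region="us-east-1"),
#                   **index_config)
```

The dimension must match the embedding model exactly; a mismatch causes upsert errors at ingestion time.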
## Data Ingestion

Before using the chatbot, you need to ingest the LangChain documentation:

1. Download the documentation (if not already present)

   The ingestion script will download the LangChain docs automatically.

2. Run the ingestion script

   ```bash
   python ingestion.py
   ```

   This will:

   - Download the LangChain documentation
   - Process and chunk the documents
   - Upload embeddings to Pinecone
   - Create the vector index
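The chunking step above can be sketched in pure Python. This is a simplified, hypothetical stand-in for the LangChain text splitters that `ingestion.py` would use; the size and overlap values are assumptions, not the project's actual settings:

```python
def chunk_text(text, size=600, overlap=100):
    """Split text into overlapping fixed-size chunks (a simplified
    stand-in for LangChain's text splitters; real splitters also try
    to break on separators like paragraphs and sentences)."""
    chunks = []
    step = size - overlap  # each chunk starts `step` chars after the last
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks
```

Overlap matters for retrieval quality: a sentence that straddles a chunk boundary still appears whole in at least one chunk.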
## Usage

1. Start the Streamlit app

   ```bash
   streamlit run main.py
   ```

2. Open your browser

   Navigate to `http://localhost:8501`.

3. Ask questions

   - Type your question about LangChain in the input field
   - Press Enter or click the button
   - Get answers with source citations
## Project Structure

```
documentation-helper/
├── backend/
│   ├── __init__.py
│   └── core.py          # Main LLM logic and RAG implementation
├── langchain-docs/      # Downloaded documentation (gitignored)
├── venv/                # Virtual environment (gitignored)
├── .env                 # Environment variables (gitignored)
├── .gitignore           # Git ignore rules
├── LICENSE              # Apache 2.0 license
├── Pipfile              # Dependencies (pipenv)
├── Pipfile.lock         # Locked dependencies
├── requirements.txt     # Dependencies (pip)
├── README.md            # This file
├── ingestion.py         # Data ingestion script
└── main.py              # Streamlit application
```
## Key Components

### `main.py`

- Streamlit web interface
- Chat history management
- User interaction handling

### `backend/core.py`

- RAG implementation using LangChain
- Pinecone vector store integration
- OpenAI LLM configuration

### `ingestion.py`

- Document downloading and processing
- Text chunking and embedding
- Vector database population
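To illustrate what the vector-store side of the RAG pipeline does conceptually: Pinecone performs this similarity search server-side at scale, but a minimal pure-Python sketch of cosine-similarity top-k retrieval (all names here are illustrative, not the project's API) looks like:

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, docs, k=3):
    """docs: list of (doc_id, vector) pairs. Return the k most
    similar doc ids, best first."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

The retrieved chunks (plus their source URLs) are then stuffed into the LLM prompt, which is what makes the source citations in the answers possible.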
## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
## Acknowledgments

- LangChain for the amazing framework
- OpenAI for the language models
- Pinecone for the vector database
- Streamlit for the web framework
## Support

If you encounter any issues or have questions:
- Check the Issues page
- Create a new issue with detailed information
- Include error messages and steps to reproduce
Happy coding! 🚀