A Python-based chatbot that lets users upload PDF documents, process them, and interact with their content, using OpenAI's GPT-4 for answers and FAISS for efficient retrieval.
- Document Upload: Upload new PDF documents for processing.
- Text Chunking: Split documents into manageable text chunks.
- Caching: Store processed chunks and embeddings to avoid re-processing.
- Metadata Storage: Save document chunks along with timestamps and keywords to an SQLite database.
- Efficient Retrieval: Use FAISS for fast document retrieval.
- Interactive Chatbot: Ask questions about the uploaded documents and get answers.
Before you begin, ensure you have met the following requirements:
- Python 3.7 or higher
- pip (the Python package installer)
- Clone the repository:

  ```bash
  git clone https://github.com/karlbernaldez/Eazeye-AI
  cd Eazeye-AI
  ```
- Create a virtual environment and activate it:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```
- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up your OpenAI API key:

  ```bash
  export OPENAI_API_KEY="your-openai-api-key"  # On Windows use `set OPENAI_API_KEY=your-openai-api-key`
  ```
- Run the script:

  ```bash
  python main.py
  ```
- Enter the path to the new PDF document when prompted.
- Provide keywords for the document when prompted.
- Interact with the chatbot by asking questions about the document.
- Running the Script: When you run `python main.py`, the script initializes and checks for existing processed data.
- Uploading a Document: The script prompts you to enter the path to a new PDF document.
- Processing the Document: The document is split into text chunks, which are then stored in an SQLite database along with metadata.
- Interacting with the Chatbot: You can ask the chatbot questions about the document; it retrieves the relevant chunks and answers using GPT-4 (see the sketch below).
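The walkthrough above amounts to a small prompt loop. Below is a hypothetical sketch of that flow, not the actual code in `main.py`; both helper functions are placeholders for the real pipeline.

```python
# Hypothetical sketch of the interaction flow; both helpers are placeholders.
def process_document(pdf_path: str, keywords: str) -> None:
    """Placeholder: chunk the PDF, embed the chunks, and store metadata."""

def answer_question(question: str) -> str:
    """Placeholder: retrieve relevant chunks and ask GPT-4."""
    return "(answer)"

pdf_path = input("Enter the path to the new PDF document: ")
keywords = input("Enter keywords for the document: ")
process_document(pdf_path, keywords)

while True:
    question = input("Ask a question (or type 'exit' to quit): ")
    if question.strip().lower() == "exit":
        break
    print(answer_question(question))
```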
The architecture of this project includes the following components:
- Document Loader: Loads PDF documents for processing.
- Text Splitter: Splits documents into manageable text chunks.
- Embeddings Model: Uses HuggingFace models to generate embeddings for the text chunks.
- FAISS Index: Stores and retrieves embeddings efficiently.
- SQLite Database: Stores text chunks along with metadata (timestamp and keywords).
- Chatbot Interface: Uses OpenAI's GPT-4 to provide interactive Q&A based on document content. A pipeline sketch of these components follows below.
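As a rough illustration of how these components might fit together, here is a minimal sketch using LangChain's community integrations. The file path and embedding model name are placeholders; the actual choices in `main.py` may differ.

```python
# Minimal pipeline sketch; "example.pdf" and the model name are placeholders.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter

pages = PyPDFLoader("example.pdf").load()                 # Document Loader
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = splitter.split_documents(pages)                    # Text Splitter
embeddings = HuggingFaceEmbeddings(                       # Embeddings Model
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
index = FAISS.from_documents(docs, embeddings)            # FAISS Index
hits = index.similarity_search("What is this document about?", k=4)
```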
At a high level, the pipeline works as follows:
- Document Processing: The script processes the uploaded PDF, splits it into text chunks, and stores them in an SQLite database along with metadata.
- Embedding and Indexing: The text chunks are embedded using a HuggingFace model and stored in a FAISS index for efficient retrieval.
- Chatbot Interface: The chatbot uses OpenAI's GPT-4 to answer questions based on the retrieved document chunks (a sketch follows below).
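Continuing from the pipeline sketch above, the Q&A step could be wired up with a retrieval chain along these lines (again a sketch, assuming LangChain's OpenAI integration, not necessarily the exact code in `main.py`):

```python
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# `index` is the FAISS index built in the pipeline sketch above.
llm = ChatOpenAI(model="gpt-4")  # reads OPENAI_API_KEY from the environment
qa = RetrievalQA.from_chain_type(llm=llm, retriever=index.as_retriever())
result = qa.invoke({"query": "What is the main conclusion of the document?"})
print(result["result"])
```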
A few implementation details:
- Text Chunking: Documents are split into chunks of 1000 characters with a 200-character overlap to preserve context across chunk boundaries (as in the pipeline sketch above).
- Caching: Processed text chunks and their embeddings are cached with `pickle` to avoid reprocessing on future runs (see the first sketch below).
- Metadata Storage: Each chunk is stored in an SQLite database with a timestamp and user-provided keywords to support organized retrieval and querying (see the second sketch below).
- Retrieval: The FAISS index enables fast, efficient retrieval of relevant text chunks for user queries.
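The `pickle` caching mentioned above could look like the following sketch; the cache file name and helper function are hypothetical:

```python
import os
import pickle

CACHE_PATH = "chunks_cache.pkl"  # hypothetical cache file name

def load_or_build_chunks(pdf_path, build_fn):
    """Return cached chunks if a cache file exists; otherwise build and cache them."""
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, "rb") as f:
            return pickle.load(f)
    chunks = build_fn(pdf_path)
    with open(CACHE_PATH, "wb") as f:
        pickle.dump(chunks, f)
    return chunks

# Example: build_fn would normally run the loader/splitter pipeline.
chunks = load_or_build_chunks("example.pdf", lambda path: ["chunk one", "chunk two"])
```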
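And the SQLite metadata storage might look like this sketch; the database file, table, and column names are assumptions, not necessarily those used in `main.py`:

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect("documents.db")  # hypothetical database file name
conn.execute(
    """CREATE TABLE IF NOT EXISTS chunks (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           content TEXT NOT NULL,
           created_at TEXT NOT NULL,
           keywords TEXT
       )"""
)
rows = [
    (chunk, datetime.now().isoformat(), "user-provided, keywords")
    for chunk in ["chunk one", "chunk two"]  # chunks from the splitter
]
conn.executemany(
    "INSERT INTO chunks (content, created_at, keywords) VALUES (?, ?, ?)",
    rows,
)
conn.commit()
conn.close()
```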
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.