Discover how to build an intelligent conversation system that goes beyond text-based interactions. This project demonstrates how to enhance LLMs/vLLMs/STT models with Retrieval-Augmented Generation (RAG) techniques to create a multimodal chatbot capable of understanding and discussing your images and videos. Experience natural conversations about your content through an intuitive interface that bridges the gap between advanced AI technology and everyday visual and audio media.
- Do the following before installing the dependencies found in the `requirements.txt` file, because of current challenges installing `onnxruntime` through `pip install onnxruntime` (a quick sanity check for these prerequisites is sketched after this list).
  - For MacOS users, a workaround is to first install the `onnxruntime` dependency for `chromadb` using:

    ```bash
    conda install onnxruntime -c conda-forge
    ```

    See this thread for additional help if needed.
  - For Windows users, follow the guide here to install the Microsoft C++ Build Tools. Be sure to follow through to the last step to set the environment variable path.
- Now run this command to install the dependencies in the `requirements.txt` file:

  ```bash
  pip install -r requirements.txt
  ```
- Install the markdown dependencies with:

  ```bash
  pip install "unstructured[all-docs]"
  ```
- Install Tesseract for `unstructured`; follow the guide here for more information:

  ```bash
  sudo apt install tesseract-ocr
  sudo apt install libtesseract-dev
  ```
- We are going to use Llama 3, which is available on Hugging Face. You therefore need to request permission to use it and log in to Hugging Face before running. Replace `$HUGGINGFACE_TOKEN` with your token (see the loading sketch after this list):

  ```bash
  pip install -U "huggingface_hub[cli]"
  huggingface-cli login --token $HUGGINGFACE_TOKEN
  ```
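With the prerequisites in place, a quick check along the lines below can confirm that `onnxruntime`, `chromadb`, and the Tesseract binary are all reachable. This is a hypothetical helper script, not part of the repository:

```python
# sanity_check.py -- hypothetical helper, not part of this repo
import shutil

import chromadb
import onnxruntime

print("onnxruntime:", onnxruntime.__version__)
print("chromadb:", chromadb.__version__)
# unstructured calls the tesseract binary for OCR, so it must be on PATH
print("tesseract:", shutil.which("tesseract") or "NOT FOUND on PATH")
```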
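Once access to the gated Llama 3 repository has been granted and you are logged in, loading the model with `transformers` looks roughly like the sketch below. The checkpoint name, the `bfloat16` dtype, and the `device_map="auto"` placement (which requires `accelerate`) are assumptions for illustration; the project may load a different variant or use other settings.

```python
# minimal loading sketch -- adjust the checkpoint and dtype to your hardware
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed gated checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # lower precision to reduce VRAM usage
    device_map="auto",           # requires the `accelerate` package
)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```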
Several example files are located in the `data` directory. You can add your own custom data.
Create the Chroma DB:

```bash
export PYTHONPATH=$(pwd)
python backend/create_database.py
```
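The real ingestion logic lives in `backend/create_database.py`; the sketch below only illustrates the general flow such a script follows, assuming `unstructured` for parsing and Chroma's default embedding function (the part that depends on `onnxruntime`). The persistence path, collection name, and file handling here are illustrative assumptions.

```python
# illustrative ingestion sketch -- see backend/create_database.py for the real code
from pathlib import Path

import chromadb
from unstructured.partition.auto import partition

client = chromadb.PersistentClient(path="chroma")          # assumed persistence path
collection = client.get_or_create_collection(name="docs")  # assumed collection name

for doc_path in Path("data").rglob("*"):
    if not doc_path.is_file():
        continue
    # unstructured picks a parser based on the file type (markdown, PDF, images, ...)
    elements = partition(filename=str(doc_path))
    for i, element in enumerate(elements):
        if not element.text:
            continue
        # the collection's default embedding function vectorizes the text on add
        collection.add(
            documents=[element.text],
            metadatas=[{"source": str(doc_path)}],
            ids=[f"{doc_path.name}-{i}"],
        )
```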
Then start the chatbot:

```bash
python app.py
```
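At query time the app retrieves the most relevant chunks from the Chroma collection and passes them to the model as context. The snippet below sketches that retrieval step with assumed collection and key names; the actual `app.py` wiring (including the image, video, and speech handling) may differ:

```python
# illustrative retrieval step -- not the actual app.py logic
import chromadb

client = chromadb.PersistentClient(path="chroma")
collection = client.get_collection(name="docs")

question = "What is shown in the demo video?"
hits = collection.query(query_texts=[question], n_results=3)
context = "\n\n".join(hits["documents"][0])

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
# `prompt` would then go through the Llama 3 chat template and `model.generate`,
# as in the loading sketch above.
print(prompt)
```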
Please note that response time may vary depending on the resources available on your computer (at least 12 GB of VRAM is recommended).