
A chatbot that can process and analyze various forms of media, including text, images, videos, and other data types.


khoi03/Multimodal-ChatBot


ChatBot: LLM-Powered Conversations That See and Hear Your Content

Introduction

This project shows how to build a conversation system that goes beyond text-only interaction. It combines LLMs, vLLMs, and speech-to-text (STT) models with Retrieval-Augmented Generation (RAG) to create a multimodal chatbot that can understand and discuss your images and videos. You can hold natural conversations about your visual and audio content through a simple interface.

Install dependencies

  1. Before installing the dependencies in requirements.txt, complete the platform-specific step below, because installing onnxruntime with pip install onnxruntime can currently fail.

    • For macOS users, a workaround is to first install the onnxruntime dependency for chromadb using:
     conda install onnxruntime -c conda-forge

    See this thread for additional help if needed.

    • For Windows users, follow the guide here to install the Microsoft C++ Build Tools. Be sure to follow it through to the last step, which sets the environment variable path.
  2. Now run this command to install the dependencies in requirements.txt:

pip install -r requirements.txt
  3. Install the Markdown dependencies with:
pip install "unstructured[all-docs]"
  4. Install Tesseract for unstructured (the commands below are for Debian/Ubuntu; follow the guide here for more information):
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
  5. This project uses Llama 3, available on Hugging Face. Before running the app, request access to the model and log in to Hugging Face. Replace $HUGGINGFACE_TOKEN with your token.
pip install -U "huggingface_hub[cli]"
huggingface-cli login --token $HUGGINGFACE_TOKEN
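
Before moving on, it can help to confirm that the system-level pieces from the steps above are in place. The following is a minimal sketch, not part of the repository: the `preflight` helper is hypothetical, and it only checks that a `tesseract` binary is on the PATH and that the `HUGGINGFACE_TOKEN` environment variable is non-empty. It does not verify that the token is valid or that access to Llama 3 has been granted.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration, not part of this repository.
    """
    env = os.environ if env is None else env
    problems = []
    # The apt packages above install a system binary named `tesseract`.
    if which("tesseract") is None:
        problems.append("tesseract not found on PATH")
    # The login command above reads the token from this variable.
    if not env.get("HUGGINGFACE_TOKEN", "").strip():
        problems.append("HUGGINGFACE_TOKEN is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing:", problem)
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment; calling `preflight()` with no arguments inspects the current shell.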

Create database

Several example data files are located in the data directory. You can also add your own data.

Create the Chroma DB.

export PYTHONPATH=$(pwd)
python backend/create_database.py
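
The `export` line matters because Python, when given a script path, puts the script's own directory (here `backend/`) on `sys.path` rather than the repository root; setting `PYTHONPATH` to the root makes top-level packages importable. The `export` syntax is specific to POSIX shells, so the sketch below shows one possible cross-platform equivalent. The helper names `env_with_repo_on_path` and `create_database` are hypothetical, and the sketch assumes it is run from the repository root.

```python
import os
import subprocess
import sys
from pathlib import Path


def env_with_repo_on_path(repo_root, base_env=None):
    """Copy the environment and prepend repo_root to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    parts = [str(repo_root)] + ([existing] if existing else [])
    # os.pathsep is ':' on POSIX and ';' on Windows.
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


def create_database(repo_root="."):
    """Run backend/create_database.py with the repository root importable."""
    root = Path(repo_root).resolve()
    subprocess.run(
        [sys.executable, str(root / "backend" / "create_database.py")],
        cwd=root,
        env=env_with_repo_on_path(root),
        check=True,  # raise if the script exits with a non-zero status
    )
```

Calling `create_database()` from the repository root should behave like the two shell commands above, while leaving the parent shell's environment unchanged.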

Run chatbot app

python app.py

Response time depends on the resources available on your computer; at least 12 GB of VRAM is needed.
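
To check the VRAM requirement before launching, one option on NVIDIA hardware is to query `nvidia-smi`. This is a rough sketch under assumptions: it presumes an NVIDIA GPU with `nvidia-smi` installed, the helper names are hypothetical, and it treats 12 GB as 12 × 1024 MiB, which a card marketed as 12 GB may report slightly differently.

```python
import subprocess

REQUIRED_MIB = 12 * 1024  # 12 GB, the minimum stated above (assumed as MiB)


def parse_vram_mib(output):
    """Parse one total-memory value (MiB) per GPU from nvidia-smi output."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def query_vram_mib():
    """Ask nvidia-smi for each GPU's total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_vram_mib(result.stdout)


def has_enough_vram(vram_mib, required_mib=REQUIRED_MIB):
    """True if at least one GPU meets the requirement."""
    return any(total >= required_mib for total in vram_mib)
```

For example, `has_enough_vram(query_vram_mib())` reports whether any installed GPU meets the threshold; `query_vram_mib` raises if `nvidia-smi` is missing or fails.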
