A real-time AI system that infers nuanced emotional states such as lonely, curious, overwhelmed, or engaged. It combines scene understanding (image captioning), facial emotion detection, and LLM-based reasoning over both signals to produce human-like emotional insights.
- Live webcam feed processing
- Scene captioning via BLIP
- Facial emotion detection using ViT (FER2013-based)
- Reasoning using StableLM to infer high-level emotional states
- Customizable emotion vocabulary for contextual inference
- Modular design with support for extension (e.g., saving logs, emotion overlays)
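The customizable vocabulary is most useful if the LLM's free-text answer can be mapped back onto one allowed label. Below is a minimal sketch of one way to do that. It assumes the model replies in free text that mentions a vocabulary word; `match_state` and `DEFAULT_VOCABULARY` are hypothetical names and are not taken from `main.py`.

```python
import re
from typing import Optional, Sequence

# Hypothetical default vocabulary; in the project this list is user-customizable.
DEFAULT_VOCABULARY = ("lonely", "curious", "overwhelmed", "engaged",
                      "inspired", "playful", "overjoyed")


def match_state(llm_output: str,
                vocabulary: Sequence[str] = DEFAULT_VOCABULARY) -> Optional[str]:
    """Return the vocabulary term that appears earliest (as a whole word) in the
    LLM output, matching case-insensitively. Return None if no term appears."""
    text = llm_output.lower()
    best: Optional[str] = None
    best_pos = len(text) + 1
    for term in vocabulary:
        # \b ensures "engaged" does not match inside "engagedness".
        match = re.search(rf"\b{re.escape(term.lower())}\b", text)
        if match and match.start() < best_pos:
            best, best_pos = term, match.start()
    return best


print(match_state("The person seems overjoyed and inspired."))
```

Taking the earliest whole-word match keeps the label stable when the model lists several states; returning `None` lets the caller decide on a fallback (for example, showing the raw facial emotion instead).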
git clone https://github.com/Infinity-002/Multimodal-Emotion-Detection.git
cd Multimodal-Emotion-Detection
python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
pip install -r requirements.txt

You may also install using Conda:
conda create -n emotion-infer python=3.10
conda activate emotion-infer
pip install -r requirements.txt

This project is designed to run on systems with GPU acceleration for smooth real-time performance.
- GPU: NVIDIA GPU with at least 6GB VRAM (e.g., RTX 3060 or higher)
- CUDA: CUDA 11.7+ and cuDNN installed
- Python: 3.9 or 3.10
- OS: Linux or Windows
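Because only Python 3.9 and 3.10 are listed, a script could warn early when run on another interpreter. This is a hypothetical pre-flight check, not something the repository is described as including; `python_supported` is an illustrative name.

```python
import sys

# Supported major.minor versions, taken from the requirements list above.
SUPPORTED_PYTHON = {(3, 9), (3, 10)}


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's major.minor version is 3.9 or 3.10."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


if __name__ == "__main__":
    if not python_supported():
        print(f"Warning: Python {sys.version.split()[0]} is untested; use 3.9 or 3.10.")
```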
- CPU-only systems are supported, but inference (especially the LLM and image models) is significantly slower.
- To try CPU mode, set:
device = torch.device("cpu")
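Instead of editing the device line by hand, a script could pick the device at startup. A minimal sketch, assuming PyTorch is the backend (as the `torch.device` line suggests); `choose_device_name` and the `FORCE_CPU` environment variable are hypothetical and are not options of this project.

```python
import os


def choose_device_name() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    FORCE_CPU is a hypothetical environment variable: setting it to "1"
    forces CPU mode even when a GPU is available.
    """
    if os.environ.get("FORCE_CPU") == "1":
        return "cpu"
    try:
        import torch
    except ImportError:  # Without PyTorch there is no GPU path to use.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Usage (requires PyTorch):
# import torch
# device = torch.device(choose_device_name())
```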
- Webcam Capture: Captures frames in real time.
- Scene Description: Generates a caption for each frame using Salesforce/blip-image-captioning-base.
- Face Detection: Detects faces using OpenCV's Haar cascade.
- Facial Emotion Detection: Classifies the facial emotion using mo-thecreator/vit-Facial-Expression-Recognition.
- LLM Fusion: Feeds the scene caption, facial emotion, and its confidence into stabilityai/stablelm-2-zephyr-1_6b using a prompt template.
- Final Label: Displays the nuanced emotional state (e.g., lonely, inspired, playful).
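The steps above do not say which face is classified when the Haar cascade finds several. A minimal sketch, assuming the largest detection is used and padded by 10% before it is passed to the ViT classifier; `largest_face_crop` and the padding value are hypothetical. OpenCV's `CascadeClassifier.detectMultiScale` returns boxes as `(x, y, w, h)`.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), as returned by detectMultiScale


def largest_face_crop(boxes: Sequence[Box], frame_w: int, frame_h: int,
                      pad: float = 0.1) -> Optional[Tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) for the largest face box, padded by `pad` of its
    width/height and clamped to the frame. Return None if no face was found."""
    # len() works for both an empty tuple and a NumPy array of boxes;
    # `if not boxes` would be ambiguous for a non-empty NumPy array.
    if len(boxes) == 0:
        return None
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])  # largest area wins
    dx, dy = int(w * pad), int(h * pad)
    x0 = max(0, x - dx)
    y0 = max(0, y - dy)
    x1 = min(frame_w, x + w + dx)
    y1 = min(frame_h, y + h + dy)
    return x0, y0, x1, y1
```

With a NumPy frame from `cv2.VideoCapture`, the face crop would be `frame[y0:y1, x0:x1]` (rows index y, columns index x). Padding keeps some context around the face, which face-expression classifiers trained on loosely cropped images may tolerate better than a tight box, though this is a judgment call rather than a documented requirement.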
- Caption: "A student in a graduation gown raising her cap in the air."
- Facial Emotion: happy (Confidence: 0.98)
- Final Inferred State: overjoyed
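To show how the example's caption, emotion, and confidence could be combined for the LLM fusion step, here is a minimal sketch of a prompt template. The wording of `PROMPT_TEMPLATE` and the name `build_prompt` are hypothetical; the template actually used in `main.py` may differ.

```python
from typing import Sequence

# Hypothetical prompt template; the project's real template may differ.
PROMPT_TEMPLATE = (
    "Scene: {caption}\n"
    "Facial expression: {emotion} (confidence: {confidence:.2f})\n"
    "Choose the single word from this list that best describes the person's "
    "emotional state: {vocabulary}.\n"
    "Answer with one word."
)


def build_prompt(caption: str, emotion: str, confidence: float,
                 vocabulary: Sequence[str]) -> str:
    """Fill the template with one frame's scene caption and face-emotion result."""
    return PROMPT_TEMPLATE.format(
        caption=caption.strip(),
        emotion=emotion,
        confidence=confidence,
        vocabulary=", ".join(vocabulary),
    )


prompt = build_prompt(
    "A student in a graduation gown raising her cap in the air.",
    "happy", 0.98,
    ["lonely", "curious", "overwhelmed", "engaged", "overjoyed"],
)
print(prompt)
```

Listing the allowed labels in the prompt nudges the model toward a one-word answer from the vocabulary. With Hugging Face `transformers`, a chat-tuned model such as stablelm-2-zephyr-1_6b is usually given such text as a user message through `tokenizer.apply_chat_template`; whether `main.py` does this is not stated here.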
.
├── main.py # Main inference script (real-time)
├── requirements.txt # Python dependencies
├── README.md # Project documentation
This project is licensed under the MIT License.