Real-Time Multimodal Emotion Inference System

A real-time AI system that uses scene understanding, facial emotion detection, and LLM-based reasoning to infer nuanced emotional states like lonely, curious, overwhelmed, or engaged. It combines image captioning, face-based emotion recognition, and language modeling to deliver human-like emotional insights.


Features

  • Live webcam feed processing
  • Scene captioning via BLIP
  • Facial emotion detection using ViT (FER2013-based)
  • Reasoning using StableLM to infer high-level emotional states
  • Customizable emotion vocabulary for contextual inference
  • Modular design with support for extension (e.g., saving logs, emotion overlays)

Installation

1. Clone the repository

git clone https://github.com/Infinity-002/Multimodal-Emotion-Detection.git
cd Multimodal-Emotion-Detection

2. Create a virtual environment (recommended)

python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

You may also install using Conda:

conda create -n emotion-infer python=3.10
conda activate emotion-infer
pip install -r requirements.txt

Environment & Hardware Requirements

This project is designed to run on systems with GPU acceleration for smooth real-time performance.

Recommended:

  • GPU: NVIDIA GPU with at least 6GB VRAM (e.g., RTX 3060 or higher)
  • CUDA: CUDA 11.7+ and cuDNN installed
  • Python: 3.9 or 3.10
  • OS: Linux or Windows

Works on CPU?

  • Yes, but inference speed (especially for the LLM and image models) will be significantly reduced.
  • You can force CPU mode by setting:
    device = torch.device("cpu")

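A minimal device-selection sketch (assuming PyTorch is installed) that prefers the GPU and falls back to CPU automatically:

```python
import torch

# Use the GPU when available; otherwise fall back to CPU (slower, but works).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running inference on: {device}")
```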
How It Works

  1. Webcam Capture: Captures frames in real-time.
  2. Scene Description: Generates a caption for each frame using Salesforce/blip-image-captioning-base.
  3. Face Detection: Detects faces using OpenCV's Haar cascade.
  4. Facial Emotion Detection: Classifies face emotion using mo-thecreator/vit-Facial-Expression-Recognition.
  5. LLM Fusion: Feeds scene + emotion + confidence into stabilityai/stablelm-2-zephyr-1_6b using a prompt template.
  6. Final Label: Displays nuanced emotional state (e.g., lonely, inspired, playful).

Output Example

  • Caption: "A student in a graduation gown raising her cap in the air."
  • Facial Emotion: happy (Confidence: 0.98)
  • Final Inferred State: overjoyed
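The LLM fusion step (caption + facial emotion + confidence → nuanced label) can be illustrated with a hypothetical prompt template; the vocabulary and wording below are assumptions, not the exact template in `main.py`:

```python
# Hypothetical emotion vocabulary; the project lets you customize this list.
EMOTIONS = ["lonely", "curious", "overwhelmed", "engaged", "overjoyed", "playful"]

def build_prompt(caption: str, face_emotion: str, confidence: float) -> str:
    """Combine the scene caption and facial emotion into an LLM prompt (step 5)."""
    return (
        f"Scene: {caption}\n"
        f"Facial emotion: {face_emotion} (confidence {confidence:.2f})\n"
        f"Choose the single best word from [{', '.join(EMOTIONS)}] to describe "
        "this person's emotional state. Answer with one word."
    )

prompt = build_prompt(
    "A student in a graduation gown raising her cap in the air.", "happy", 0.98)
print(prompt)
```

The resulting string is sent to `stabilityai/stablelm-2-zephyr-1_6b`, whose one-word completion becomes the final inferred state.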

File Structure

.
├── main.py                # Main inference script (real-time)
├── requirements.txt       # Python dependencies
└── README.md              # Project documentation

📄 License

This project is licensed under the MIT License.

