Real-Time Multimodal Emotion Inference System

A real-time AI system that uses scene understanding, facial emotion detection, and LLM-based reasoning to infer nuanced emotional states like lonely, curious, overwhelmed, or engaged. It combines image captioning, face-based emotion recognition, and language modeling to deliver human-like emotional insights.


Features

  • Live webcam feed processing
  • Scene captioning via BLIP
  • Facial emotion detection using ViT (FER2013-based)
  • Reasoning using StableLM to infer high-level emotional states
  • Customizable emotion vocabulary for contextual inference
  • Modular design with support for extension (e.g., saving logs, emotion overlays)

Installation

1. Clone the repository

git clone https://github.com/Infinity-002/Multimodal-Emotion-Detection.git
cd Multimodal-Emotion-Detection

2. Create a virtual environment (recommended)

python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

You may also install using Conda:

conda create -n emotion-infer python=3.10
conda activate emotion-infer
pip install -r requirements.txt

Environment & Hardware Requirements

This project is designed to run on systems with GPU acceleration for smooth real-time performance.

Recommended:

  • GPU: NVIDIA GPU with at least 6GB VRAM (e.g., RTX 3060 or higher)
  • CUDA: CUDA 11.7+ and cuDNN installed
  • Python: 3.9 or 3.10
  • OS: Linux or Windows

Works on CPU?

  • Yes, but inference speed (especially for the LLM and image models) will be significantly reduced.
  • You can force CPU mode by setting:
    device = torch.device("cpu")

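A minimal device-selection sketch (assuming PyTorch is installed) that prefers the GPU and falls back to CPU automatically:

```python
import torch

# Use the GPU when available; otherwise fall back to CPU (slower, but works).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running inference on: {device}")
```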
How It Works

  1. Webcam Capture: Captures frames in real-time.
  2. Scene Description: Generates a caption for each frame using Salesforce/blip-image-captioning-base.
  3. Face Detection: Detects faces using OpenCV's Haar cascade.
  4. Facial Emotion Detection: Classifies face emotion using mo-thecreator/vit-Facial-Expression-Recognition.
  5. LLM Fusion: Feeds scene + emotion + confidence into stabilityai/stablelm-2-zephyr-1_6b using a prompt template.
  6. Final Label: Displays nuanced emotional state (e.g., lonely, inspired, playful).

Output Example

  • Caption: "A student in a graduation gown raising her cap in the air."
  • Facial Emotion: happy (Confidence: 0.98)
  • Final Inferred State: overjoyed
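The LLM fusion step (caption + facial emotion + confidence → nuanced label) can be illustrated with a hypothetical prompt template; the vocabulary and wording below are assumptions, not the exact template in `main.py`:

```python
# Hypothetical emotion vocabulary; the project lets you customize this list.
EMOTIONS = ["lonely", "curious", "overwhelmed", "engaged", "overjoyed", "playful"]

def build_prompt(caption: str, face_emotion: str, confidence: float) -> str:
    """Combine the scene caption and facial emotion into an LLM prompt (step 5)."""
    return (
        f"Scene: {caption}\n"
        f"Facial emotion: {face_emotion} (confidence {confidence:.2f})\n"
        f"Choose the single best word from [{', '.join(EMOTIONS)}] to describe "
        "this person's emotional state. Answer with one word."
    )

prompt = build_prompt(
    "A student in a graduation gown raising her cap in the air.", "happy", 0.98)
print(prompt)
```

The resulting string is sent to `stabilityai/stablelm-2-zephyr-1_6b`, whose one-word completion becomes the final inferred state.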

File Structure

.
├── main.py                # Main inference script (real-time)
├── requirements.txt       # Python dependencies
└── README.md              # Project documentation

📄 License

This project is licensed under the MIT License.

