Note: Visit the Moondream repository on GitHub and navigate to `/recipes` for the latest version of this demo.
⚠️ IMPORTANT: This project currently uses Moondream 2 (2025-01-09 release) via the Hugging Face Transformers library. We will migrate to the official Moondream client libraries once they become available for this version.
## Table of Contents

- Overview
- Sample Output
- Features
- Prerequisites
- Installation
- Usage
- Output
- Troubleshooting
- Performance Notes
- Dependencies
- Model Details
- License
## Overview

This project uses the Moondream 2 model to detect faces and their gaze directions in videos. It processes videos frame by frame, visualizing face detections and gaze directions with dynamic visual effects.
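The frame-by-frame flow can be sketched as follows. Note that `detect_faces` and `detect_gaze` here are hypothetical stand-ins for the Moondream 2 model calls, not the script's actual function names:

```python
def annotate_frames(frames, detect_faces, detect_gaze):
    """Run face and gaze detection on each frame, yielding annotations.

    detect_faces(frame) -> list of normalized (x_min, y_min, x_max, y_max) boxes
    detect_gaze(frame, box) -> normalized (x, y) gaze target, or None
    Both callables are placeholders for the model's detection calls.
    """
    for frame in frames:
        annotations = []
        for box in detect_faces(frame):
            gaze = detect_gaze(frame, box)  # one gaze query per detected face
            annotations.append({"face": box, "gaze": gaze})
        yield frame, annotations
```

Because multiple faces per frame each require their own gaze query, per-frame cost grows with the number of people in view.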
## Sample Output

| Input Video | Processed Output |
|---|---|
## Features

- Face detection in video frames
- Gaze direction tracking
- Real-time visualization with:
  - Colored bounding boxes for faces
  - Gradient lines showing gaze direction
  - Gaze target points
- Supports multiple faces per frame
- Processes all common video formats (.mp4, .avi, .mov, .mkv)
- Uses Moondream 2 (2025-01-09 release) via Hugging Face Transformers
  - Note: will be migrated to the official client libraries in future updates
  - No authentication required
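Drawing the bounding boxes and gaze lines requires converting the model's normalized coordinates into pixel positions. A minimal sketch (the function name is illustrative, and it assumes face boxes and gaze points are normalized to the 0–1 range):

```python
def gaze_line(face_box, gaze_point, frame_width, frame_height):
    """Compute pixel endpoints for a line from a face center to its gaze target.

    face_box: normalized (x_min, y_min, x_max, y_max)
    gaze_point: normalized (x, y) target the person is looking at
    Returns ((start_x, start_y), (end_x, end_y)) in pixel coordinates.
    """
    x_min, y_min, x_max, y_max = face_box
    start = (
        int((x_min + x_max) / 2 * frame_width),   # face center x
        int((y_min + y_max) / 2 * frame_height),  # face center y
    )
    end = (int(gaze_point[0] * frame_width), int(gaze_point[1] * frame_height))
    return start, end
```

The gradient effect can then be produced by drawing the segment in short sub-segments with interpolated colors.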
## Prerequisites

- Python 3.8 or later
- CUDA-capable GPU recommended (CPU mode works too)
- FFmpeg installed on your system
## Installation

1. Install system dependencies:

   ```bash
   # Ubuntu/Debian
   sudo apt-get update && sudo apt-get install -y libvips42 libvips-dev ffmpeg

   # CentOS/RHEL
   sudo yum install vips vips-devel ffmpeg

   # macOS
   brew install vips ffmpeg
   ```
2. Clone and set up the project:

   ```bash
   git clone [repository-url]
   cd gaze-detection-video
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```
### Windows Installation

Windows setup requires a few additional steps for proper GPU support and libvips installation.
1. Clone the repository:

   ```bash
   git clone https://github.com/parsakhaz/gaze-detection-video.git
   cd gaze-detection-video
   ```
2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   .\venv\Scripts\activate
   ```
3. Install PyTorch with CUDA support:

   ```bash
   # For NVIDIA GPUs
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
   ```
4. Install libvips. Download the version matching your system architecture:

   | Architecture | VIPS version to download |
   |---|---|
   | 32-bit x86 | vips-dev-w32-all-8.16.0.zip |
   | 64-bit x64 | vips-dev-w64-all-8.16.0.zip |

   - Extract the ZIP file
   - Copy all DLL files from `vips-dev-8.16\bin` to either:
     - your project's root directory (easier), OR
     - `C:\Windows\System32` (requires admin privileges)
   - Add to PATH:
     - Open System Properties → Advanced → Environment Variables
     - Under System Variables, find PATH
     - Add the full path to the `vips-dev-8.16\bin` directory
5. Install FFmpeg:

   - Download a build from https://ffmpeg.org/download.html#build-windows
   - Extract it and add its `bin` folder to your system PATH (as in step 4) or copy the binaries to the project root directory
6. Install the remaining dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

1. Place your input videos in the `input` directory:
   - Supported formats: .mp4, .avi, .mov, .mkv
   - The directory will be created automatically if it doesn't exist
2. Run the script:

   ```bash
   python gaze-detection-video.py
   ```
3. The script will:
   - Process all videos in the input directory
   - Show progress bars for each video
   - Save processed videos to the `output` directory with the prefix `processed_`
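The input-directory handling from step 1 can be sketched with `pathlib` (the helper name is illustrative; the script's own logic may differ):

```python
from pathlib import Path

SUPPORTED = {".mp4", ".avi", ".mov", ".mkv"}

def find_videos(input_dir="input"):
    """Return supported video files in input_dir, creating it if missing."""
    folder = Path(input_dir)
    folder.mkdir(exist_ok=True)  # created automatically if it doesn't exist
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in SUPPORTED)
```

Comparing lowercased suffixes makes the filter case-insensitive, so files like `clip.MOV` are still picked up.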
## Output

- Processed videos are saved as `output/processed_[original_name].[ext]`
- Each frame in the output video shows:
  - Colored boxes around detected faces
  - Lines indicating gaze direction
  - Points showing where each person is looking
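The naming scheme above amounts to prefixing the original filename; a sketch (helper name is illustrative):

```python
from pathlib import Path

def output_path(input_video, output_dir="output"):
    """Map e.g. input/clip.mp4 -> output/processed_clip.mp4."""
    src = Path(input_video)
    # keep the original name and extension, add the 'processed_' prefix
    return Path(output_dir) / f"processed_{src.name}"
```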
## Troubleshooting

1. CUDA/GPU issues:
   - Ensure CUDA is installed for GPU support
   - The script automatically falls back to CPU if no GPU is available
2. Memory issues:
   - Ensure you have enough RAM when processing large videos
   - Consider reducing video resolution if needed
3. libvips errors:
   - Make sure libvips is properly installed for your OS
   - Check that libvips is on your system PATH
4. Video format issues:
   - Ensure FFmpeg is installed and on your system PATH
   - Try converting problematic videos to MP4
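For the CPU fallback in item 1, the usual PyTorch pattern is a `torch.cuda.is_available()` check. A sketch of that pattern, not necessarily the script's exact code:

```python
def pick_device():
    """Prefer CUDA when available; otherwise fall back to CPU."""
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # torch missing entirely -> CPU-only path
    return "cpu"
```

The returned string can be passed to `model.to(device)` so the same code runs on both GPU and CPU machines.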
## Performance Notes

- GPU processing is significantly faster than CPU processing
- Processing time depends on:
  - Video resolution
  - Number of faces per frame
  - Frame rate
  - Available computing power
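As a rough back-of-envelope, total work scales with the number of frames the model must process (the helper below is purely illustrative):

```python
def estimated_frames(duration_seconds, fps):
    """Total frames the pipeline must run the model on for one video."""
    return int(duration_seconds * fps)

# A 60-second clip at 30 fps means 1800 model passes, and more in practice
# when several faces per frame each need a separate gaze query.
```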
## Dependencies

- transformers (for Moondream 2 model access)
- torch
- opencv-python
- pillow
- matplotlib
- numpy
- tqdm
- pyvips
- accelerate
- einops
## Model Details

⚠️ IMPORTANT: This project currently uses Moondream 2 (2025-01-09 release) via the Hugging Face Transformers library. We will migrate to the official Moondream client libraries once they become available for this version.