deathlegionteamlk/DeepFaceReal-Physics


🎬 What Is This?

HeyGen costs $24–240/month and runs in the cloud. SadTalker needs a GPU and still only hits 10 FPS. DeepFaceReal-Physics runs on your CPU, is free, and ships the full stack: 3D face reconstruction, audio-driven head motion, Wav2Lip lip sync, natural eye gaze, conversational gestures, and body physics.

No subscription. No GPU required. No waiting on cloud queues.


✨ Features

| Engine | What It Does |
|---|---|
| 🎯 3D Face | 468 MediaPipe landmarks, Delaunay triangulation, 6DoF head pose (pitch/yaw/roll/xyz), expression blendshapes |
| 🗣️ Talking Head | MFCC/pitch/energy extraction, audio-to-head-pose mapping, natural nodding/tilting patterns |
| 👄 Wav2Lip | Phoneme detection, lip shape prediction, temporal smoothing, real-time audio buffering |
| 👁️ Eye & Gaze | Saccades every 200–300 ms, natural blinks every 2–4 s, gaze target tracking, pupil rendering |
| Gestures | Speech-rhythm hand movement, shoulder/head micro-shifts, posture variation, 0.0–1.0 intensity knob |
| 🔄 Pipeline | Async multi-stage queue, frame skipping, cached inference, all CPU-optimized |
| 🧠 Body Physics | MediaPipe Holistic (543 landmarks), momentum/inertia, spring dynamics |
| 🖼️ Parallax BG | 3-layer depth parallax driven by head position, depth blur |
| 📱 Mobile Camera | IP Webcam integration for using an Android phone as a webcam |
| 💬 Characters | OpenRouter LLM with personality system prompts |
| 🖥️ UI | Streamlit v2 dashboard, HeyGen Mode preset, recording export |
| 🔌 API | FastAPI v2 with REST + WebSocket endpoints |
| 🪟 Windows EXE | PyInstaller standalone build |

📊 HeyGen vs SadTalker vs DeepFaceReal

| Capability | HeyGen ($24–240/mo) | SadTalker | DeepFaceReal-Physics |
|---|---|---|---|
| 3D Face Reconstruction | ✅ | ❌ | ✅ |
| Audio-Driven Head Motion | ✅ | ✅ | ✅ |
| Wav2Lip Lip Sync | ✅ | ❌ | ✅ |
| Natural Eye Gaze | ✅ | ❌ | ✅ |
| Conversational Gestures | ✅ | ❌ | ✅ |
| Real-Time ≥15 FPS | ❌ Cloud | ⚠️ 10 FPS GPU | ✅ 15–20 FPS CPU |
| Face Swap | ❌ | ❌ | ✅ |
| LLM Character AI | ⚠️ Limited | ❌ | ✅ |
| Self-Hosted | ❌ | ✅ | ✅ |
| Open Source | ❌ | ✅ | ✅ |
| Price | $24–240/month | Free | Free |
| WhatsApp Integration | ❌ | ❌ | ✅ |
| Windows EXE | N/A | Manual | ✅ |
| API + WebSocket | ✅ | ❌ | ✅ |
| GPU Required | ✅ | ✅ | ❌ CPU only |

🚀 Quick Start

Prerequisites: Python 3.10+, 4 GB RAM (8 GB recommended), webcam

Install & Run

# Clone
git clone https://github.com/deathlegionteamlk/DeepFaceReal-Physics.git
cd DeepFaceReal-Physics

# Install
pip install -r requirements.txt

# Start the UI (port 8080)
streamlit run app.py --server.port 8080

# In a second terminal: start the API (port 8081)
python api.py

Docker

docker build -t deepfacereal-physics .
docker run -p 8080:8080 -p 8081:8081 deepfacereal-physics

🏗️ Architecture

                    ┌───────────────────────────────┐
                    │         Input Sources         │
                    │  ┌──────┐ ┌────────┐ ┌─────┐  │
                    │  │ USB  │ │   IP   │ │Audio│  │
                    │  │ Cam  │ │ Webcam │ │File │  │
                    │  └──┬───┘ └───┬────┘ └──┬──┘  │
                    └─────┼─────────┼─────────┼─────┘
                          │         │         │
                    ┌─────▼─────────▼─────────▼─────┐
                    │   Audio Feature Extraction    │
                    │ (MFCC · Pitch · Energy · F0)  │
                    └───────────────┬───────────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │        3D Face Engine         │
                    │  MediaPipe 468 landmarks      │
                    │  Delaunay Triangulation       │
                    │  6DoF Head Pose (solvePnP)    │
                    │  Expression Blendshapes       │
                    └───────────────┬───────────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │      Talking Head Engine      │
                    │  Audio → head pose            │
                    │  Audio → expression           │
                    │  Natural motion patterns      │
                    └───────────────┬───────────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │      Lip Sync (Wav2Lip)       │
                    │  Phoneme detection            │
                    │  Lip shape prediction         │
                    │  Wav2Lip inference            │
                    │  Temporal smoothing           │
                    └───────────────┬───────────────┘
                                    │
          ┌─────────────────────────┼───────────────────────┐
          │                         │                       │
 ┌────────▼────────┐     ┌──────────▼──────────┐   ┌────────▼────────┐
 │  Eye & Gaze     │     │   Gesture Engine    │   │ Physics Engine  │
 │  Saccades       │     │   Hand gestures     │   │ Momentum        │
 │  Blinks         │     │   Shoulder/head     │   │ Spring dynamics │
 │  Gaze tracking  │     │   Posture shifts    │   │ Frame skipping  │
 │  Pupil render   │     │   Intensity config  │   │ Async queues    │
 └────────┬────────┘     └──────────┬──────────┘   └────────┬────────┘
          │                         │                       │
          └─────────────────────────┼───────────────────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │      Composite & Render       │
                    │  Face swap · overlays         │
                    │  Background · enhancement     │
                    └───────────────┬───────────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │            Output             │
                    │  Streamlit :8080              │
                    │  FastAPI   :8081              │
                    │  Virtual Camera               │
                    └───────────────────────────────┘

🔧 Engine Docs

1. 3D Face Engine: core/face_3d_engine.py

468 MediaPipe landmarks → Delaunay triangulation → 3D mesh. 6DoF head pose via solvePnP. Expression blendshape extraction.

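from core.face_3d_engine import get_face_3d_engine  # import path assumed from the module named above

face_3d = get_face_3d_engine()
mesh = face_3d.process_frame(image)  # image: one input video frame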
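To make "6DoF head pose via solvePnP" concrete: OpenCV's solvePnP returns a rotation vector and a translation vector, and the six degrees of freedom are the three angles recovered from that rotation plus the three translation components. The sketch below is standard-library Python and is not taken from core/face_3d_engine.py. The function names and the axis convention (R = Rz(roll) · Ry(yaw) · Rx(pitch)) are assumptions chosen for illustration; the project may use a different convention.

```python
import math

def rodrigues(rvec):
    """Convert an axis-angle rotation vector into a 3x3 rotation matrix."""
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def head_pose_6dof(rvec, tvec):
    """Return pitch/yaw/roll in degrees plus x/y/z translation.

    Assumes R = Rz(roll) * Ry(yaw) * Rx(pitch); this convention is an
    illustrative choice, not something the README specifies.
    """
    r = rodrigues(rvec)
    cos_yaw = math.sqrt(r[0][0] ** 2 + r[1][0] ** 2)
    pitch = math.atan2(r[2][1], r[2][2])
    yaw = math.atan2(-r[2][0], cos_yaw)
    roll = math.atan2(r[1][0], r[0][0])
    return {
        "pitch": math.degrees(pitch),
        "yaw": math.degrees(yaw),
        "roll": math.degrees(roll),
        "x": tvec[0], "y": tvec[1], "z": tvec[2],
    }
```

Near yaw = ±90° this decomposition becomes ill-conditioned (gimbal lock), which is one reason real trackers often smooth or clamp the angles.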

2. Talking Head: core/talking_head.py

13 MFCC coefficients, F0 pitch, RMS energy → head pose prediction. Drives speech-synchronized nodding and tilt.

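from core.talking_head import get_talking_head  # import path assumed from the module named above

talking_head = get_talking_head()
seq = talking_head.process_audio(audio_data, face_img)  # audio_data: speech audio; face_img: source face image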
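As a rough picture of the audio-to-head-pose idea, the sketch below uses only one of the three features listed above, RMS energy, and maps it linearly to a nod angle per video frame. It is a simplified stand-in, not the project's predictor: the function names, the linear mapping, and the 8-degree ceiling are all assumptions made for illustration.

```python
import math

def rms_energy(samples):
    """Root-mean-square energy of one chunk of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def frame_energies(samples, sample_rate, fps):
    """Split audio into one chunk per video frame and measure each chunk."""
    hop = max(1, sample_rate // fps)
    return [rms_energy(samples[i:i + hop]) for i in range(0, len(samples), hop)]

def energies_to_nod(energies, max_pitch_deg=8.0):
    """Map energy to a head pitch angle, normalised to the loudest frame.

    max_pitch_deg is a hypothetical tuning value.
    """
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return [0.0] * len(energies)
    return [max_pitch_deg * e / peak for e in energies]
```

For example, `energies_to_nod(frame_energies(samples, 16000, 20))` yields one pitch angle per frame at 20 FPS; louder frames produce a deeper nod.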

3. Wav2Lip Lip Sync: core/lip_sync.py

Phoneme detection → lip shape parameters → Wav2Lip inference on face region. EMA filter keeps transitions smooth.

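from core.lip_sync import create_lip_sync  # import path assumed from the module named above

lip_sync = create_lip_sync()
frame = lip_sync.sync_frame(face_img, audio_chunk)  # face_img: current face image; audio_chunk: matching audio slice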
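The EMA (exponential moving average) filter mentioned above can be sketched in a few lines. The class name, the `alpha` default, and the idea that lip shape is a flat list of parameters are assumptions for illustration; core/lip_sync.py may structure this differently.

```python
class EmaSmoother:
    """Exponential moving average over a list of lip-shape parameters."""

    def __init__(self, alpha=0.4):
        # alpha near 1.0 follows new values quickly; near 0.0 smooths heavily.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None

    def update(self, params):
        """Blend the new parameters into the running average and return it."""
        if self.state is None:
            self.state = list(params)  # first frame: nothing to blend with
        else:
            self.state = [
                self.alpha * new + (1.0 - self.alpha) * old
                for new, old in zip(params, self.state)
            ]
        return list(self.state)
```

Feeding each frame's predicted parameters through `update` trades a little latency for fewer visible jumps between mouth shapes.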

4. Eye & Gaze Engine: core/eye_engine.py

Saccades every 200–300 ms, micro-saccades during fixation, blinks every 2–4 s (100–400 ms duration). Configurable gaze targets.

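from core.eye_engine import get_eye_engine  # import path assumed from the module named above

eye_engine = get_eye_engine()
state = eye_engine.update()  # call once per rendered frame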
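The blink timing above (a blink every 2–4 s lasting 100–400 ms) can be modelled as a small scheduler that draws random intervals. This is a sketch under stated assumptions: the class, its parameters, and the choice to measure the interval from the end of the previous blink are hypothetical and not read from core/eye_engine.py.

```python
import random

class BlinkScheduler:
    """Decide, for a given timestamp, whether the eyes are closed."""

    def __init__(self, interval=(2.0, 4.0), duration=(0.1, 0.4), rng=None):
        self.interval = interval          # seconds between blinks
        self.duration = duration          # seconds a blink lasts
        self.rng = rng or random.Random()
        self.next_blink = self.rng.uniform(*interval)
        self.blink_end = None             # set while a blink is in progress

    def update(self, now):
        """Return True while the eye is closed at time `now` (seconds)."""
        if self.blink_end is not None:
            if now < self.blink_end:
                return True
            # Blink finished: schedule the next one relative to its end.
            self.next_blink = self.blink_end + self.rng.uniform(*self.interval)
            self.blink_end = None
        if now >= self.next_blink:
            self.blink_end = now + self.rng.uniform(*self.duration)
            return True
        return False
```

Passing a seeded `random.Random` makes the blink pattern reproducible, which helps when comparing recordings.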

5. Gesture Engine: core/gesture_engine.py

Hand patterns keyed to audio energy, shoulder/head micro-movements, posture variation. Intensity from 0.0 to 1.0.

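from core.gesture_engine import get_gesture_engine  # import path assumed from the module named above

gesture = get_gesture_engine()
params = gesture.process_gestures(energy)  # energy: audio energy for the current frame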
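One way to read "keyed to audio energy" with a 0.0–1.0 intensity is as a simple product: energy decides how much movement the speech calls for, and intensity scales it. The sketch below assumes energy is already normalised to 0.0–1.0; the function name, the output keys, and the pixel limits are hypothetical.

```python
def gesture_params(energy, intensity=0.5, max_hand_px=40.0, max_shoulder_px=6.0):
    """Scale gesture offsets by audio energy and the user-set intensity.

    Both inputs are clamped to 0.0-1.0 so out-of-range values cannot
    produce exaggerated motion. The pixel limits are illustrative.
    """
    intensity = min(1.0, max(0.0, intensity))
    energy = min(1.0, max(0.0, energy))
    drive = energy * intensity
    return {
        "hand_offset_px": max_hand_px * drive,
        "shoulder_offset_px": max_shoulder_px * drive,
    }
```

With intensity at 0.0 the avatar stays still regardless of how loud the audio is, which matches the idea of an intensity knob.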

6. Real-Time Pipeline: core/pipeline.py

One async queue per stage. Frame skipping for CPU relief. Cached Wav2Lip results for repeated phonemes. Resolution management (detect on a downscaled frame, upscale the output).

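from core.pipeline import get_realtime_pipeline  # import path assumed from the module named above

pipeline = get_realtime_pipeline()
pipeline.start()  # begin processing frames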
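The "one queue per stage" layout can be illustrated with asyncio: each stage reads from its inbox, applies its function, and writes to the next stage's inbox, so a slow stage only applies backpressure through a bounded queue. This is a generic sketch, not the code in core/pipeline.py; `stage`, `run_pipeline`, and the `None` end-of-stream sentinel are assumptions.

```python
import asyncio

async def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result."""
    while True:
        item = await inbox.get()
        if item is None:            # sentinel: propagate shutdown downstream
            await outbox.put(None)
            return
        await outbox.put(fn(item))

async def run_pipeline(frames, stages, queue_size=4):
    """Run frames through the stage functions in order and collect results."""
    queues = [asyncio.Queue(maxsize=queue_size) for _ in range(len(stages) + 1)]
    tasks = [
        asyncio.create_task(stage(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]

    async def feed():
        for frame in frames:
            await queues[0].put(frame)
        await queues[0].put(None)

    feeder = asyncio.create_task(feed())
    results = []
    while True:
        out = await queues[-1].get()
        if out is None:
            break
        results.append(out)
    await asyncio.gather(feeder, *tasks)
    return results
```

Usage: `asyncio.run(run_pipeline(frames, [detect, animate, render]))`, where the three stage functions are placeholders for real per-frame work. CPU-heavy stages would normally be pushed to a thread or process pool so they do not block the event loop.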

💻 Professional UI

The Streamlit UI (app.py) runs on port 8080.

| Tab | What's Inside |
|---|---|
| 🎯 Avatar Studio | Source photo upload, real-time preview, recording export |
| 📱 Mobile | QR code for IP Webcam, auto-detect, camera source picker |
| 👤 Characters | Gallery, creation, management with face data |
| 💬 Chat | LLM character conversation with message history |
| 🎬 Engines | Live status for all 6 engines, per-stage timing |
| ⚙️ Settings | Engine toggles, sliders, quality controls, HeyGen Mode preset |

HeyGen Mode

One click turns everything on at max quality:

✅ 3D Face · ✅ Talking Head · ✅ Wav2Lip · ✅ Eye Gaze · ✅ Gestures · ✅ Parallax BG · ✅ Physics · ✅ High Quality Enhancement


🔌 API v2

FastAPI (api.py) on port 8081. Auto-generated docs at /docs.

v2 Endpoints

| Method | Path | What It Does |
|---|---|---|
| POST | /generate/talking-head | Generate talking head video from audio + face image |
| POST | /animate/face | Animate face with expression coefficients + head pose |
| WS | /ws/realtime | Real-time streaming with head pose + eye state |
| POST | /config/render | Configure any engine's render parameters |

Legacy Endpoints

| Method | Path | What It Does |
|---|---|---|
| GET | / | API info |
| GET | /status | System status with per-engine FPS |
| POST | /swap | Face swap on uploaded image |
| POST | /chat | Send message to character LLM |
| GET/POST/DELETE | /characters | Character CRUD |
| POST | /characters/{name}/activate | Activate character |
| POST/GET | /physics/config, /physics/status | Physics control |
| POST/GET | /camera/source, /camera/status | Camera control |
| WS | /ws/chat, /ws/video | Streaming chat + video |

Talking Head: Example

curl -X POST http://localhost:8081/generate/talking-head \
  -H "Content-Type: application/json" \
  -d '{
    "audio_b64": "BASE64_ENCODED_WAV_AUDIO",
    "face_b64": "BASE64_ENCODED_FACE_IMAGE",
    "fps": 20
  }'
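The same request can be built from Python with only the standard library. The endpoint path and the `audio_b64`, `face_b64`, and `fps` fields come from the curl example above; the helper name is hypothetical, and the response format is not documented here, so the sketch stops at constructing the request.

```python
import base64
import json
import urllib.request

def build_talking_head_request(audio_bytes, face_bytes, fps=20,
                               base_url="http://localhost:8081"):
    """Build a POST request for /generate/talking-head from raw file bytes."""
    payload = {
        "audio_b64": base64.b64encode(audio_bytes).decode("ascii"),
        "face_b64": base64.b64encode(face_bytes).decode("ascii"),
        "fps": fps,
    }
    return urllib.request.Request(
        f"{base_url}/generate/talking-head",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it requires the API to be running on port 8081:
#   with open("speech.wav", "rb") as a, open("face.jpg", "rb") as f:
#       req = build_talking_head_request(a.read(), f.read())
#   with urllib.request.urlopen(req) as resp:
#       body = resp.read()
```

The file names in the comment are placeholders. Encoding the files in code avoids pasting large base64 strings into a shell command.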

Render Config: Example

curl -X POST http://localhost:8081/config/render \
  -H "Content-Type: application/json" \
  -d '{"engine": "eye", "config": {"blink_interval_min": 1.5, "blink_interval_max": 4.0}}'

🪟 Windows EXE

# On Windows
pip install pyinstaller
python build_exe.py

# Output: dist/DeepFaceReal.exe + DeepFaceReal_API.exe

The build bundles all core modules, InsightFace models (buffalo_l, inswapper_128), MediaPipe models, Wav2Lip models, OpenCV/NumPy/Pillow/Streamlit/FastAPI, and a launcher batch file.


📱 Mobile Integration

Android (IP Webcam)

  1. Install IP Webcam from Google Play
  2. Tap Start Server
  3. Note the IP (e.g. 192.168.1.100:8080)
  4. In Streamlit → 📱 Mobile tab → enter IP or scan QR code

Desktop Virtual Camera

sudo apt install v4l2loopback-dkms
sudo modprobe v4l2loopback devices=1 video_nr=10

Start the pipeline → virtual camera appears as a device → select "DeepFakeCam" in WhatsApp Desktop, Zoom, or Meet.


🛠️ Project Structure

DeepFaceReal-Physics/
├── app.py                    # Streamlit UI v2 (port 8080)
├── api.py                    # FastAPI v2 (port 8081)
├── build_exe.py              # Windows EXE builder
├── start.sh                  # Launch both services
├── core/
│   ├── face_3d_engine.py     # 3D face reconstruction + pose
│   ├── talking_head.py       # Audio-driven talking head
│   ├── lip_sync.py           # Wav2Lip lip sync
│   ├── eye_engine.py         # Eye & gaze
│   ├── gesture_engine.py     # Conversational gestures
│   ├── pipeline.py           # Real-time async pipeline
│   ├── face_swapper.py       # InsightFace swap
│   ├── physics_engine.py     # MediaPipe Holistic + physics
│   ├── background_engine.py  # Parallax background
│   ├── webcam_pipeline.py    # Camera capture
│   ├── character_manager.py  # Character profiles
│   └── llm_character.py      # OpenRouter LLM
├── models/                   # Downloaded models
├── profiles/                 # Character profiles
└── static/                   # Static assets

📊 Performance

| Engine | Target FPS | CPU Cores | Resolution |
|---|---|---|---|
| 3D Face | 30 FPS | 1 | 640×480 |
| Talking Head | 30 FPS | 1 | Audio only |
| Lip Sync | 20 FPS | 1 | Face region |
| Eye Gaze | 60 FPS | 0.5 | Overlay |
| Gestures | 30 FPS | 0.5 | Overlay |
| Pipeline Total | ≥15–20 FPS | 4 cores | 640×480 |

Each heavy stage processes only every 2nd or 3rd frame. Repeated phonemes hit the Wav2Lip cache. Async queues keep everything non-blocking.
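Frame skipping and result caching are both small wrappers around an expensive function. The sketch below shows one plausible shape for each; `FrameSkipper` and `make_cached` are hypothetical names, and the real pipeline may key its cache on something richer than a phoneme string.

```python
from functools import lru_cache

class FrameSkipper:
    """Run an expensive function on every Nth frame and reuse the last result."""

    def __init__(self, process, every=2):
        self.process = process
        self.every = every
        self.count = 0
        self.last = None

    def __call__(self, frame):
        if self.count % self.every == 0:
            self.last = self.process(frame)   # heavy work happens here
        self.count += 1
        return self.last                      # skipped frames reuse this

def make_cached(fn, maxsize=64):
    """Wrap fn so repeated hashable inputs (e.g. phoneme labels) are served from memory."""
    return lru_cache(maxsize=maxsize)(fn)
```

With `every=3`, a stage fed 30 frames per second performs 10 heavy inferences per second while still returning a result for every frame.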


🤝 Contributing

  1. Fork the repo
  2. Create a feature branch (git checkout -b feature/your-feature)
  3. Commit (git commit -m 'Add your feature')
  4. Push (git push origin feature/your-feature)
  5. Open a Pull Request

📄 License

MIT License. See LICENSE for details.


🙏 Credits

Built on:

InsightFace • MediaPipe • Wav2Lip • ONNX Runtime • OpenRouter • Streamlit • FastAPI • SciPy • scikit-image

Inspired by SadTalker, Ditto, and HeyGen.

⭐ Star on GitHub


DeepFaceReal-Physics v2.0.0 · MIT License · DeathLegionTeamLK

About

Real-time deepfake detection with full body tracking, physics simulation, parallax background, WhatsApp integration, and LLM character responses. Built by Death Legion Team.
