Features • Comparison • Quick Start • Architecture • Engine Docs • API v2 • Windows EXE • Credits
HeyGen costs $24–240/month and runs in the cloud. SadTalker needs a GPU and still only hits 10 FPS. DeepFaceReal-Physics runs on your CPU, is free, and ships the full stack: 3D face reconstruction, audio-driven head motion, Wav2Lip lip sync, natural eye gaze, conversational gestures, and body physics.
No subscription. No GPU required. No waiting on cloud queues.
| Engine | What It Does |
|---|---|
| 3D Face | 468 MediaPipe landmarks, Delaunay triangulation, 6DoF head pose (pitch/yaw/roll/xyz), expression blendshapes |
| Talking Head | MFCC/pitch/energy extraction, audio-to-head-pose mapping, natural nodding/tilting patterns |
| Wav2Lip | Phoneme detection, lip shape prediction, temporal smoothing, real-time audio buffering |
| Eye & Gaze | Saccades every 200–300 ms, natural blinks every 2–4 s, gaze target tracking, pupil rendering |
| Gestures | Speech-rhythm hand movement, shoulder/head micro-shifts, posture variation, 0.0–1.0 intensity knob |
| Pipeline | Async multi-stage queue, frame skipping, cached inference, all CPU-optimized |
| Body Physics | MediaPipe Holistic (543 landmarks), momentum/inertia, spring dynamics |
| Parallax BG | 3-layer depth parallax driven by head position, depth blur |
| Mobile Camera | IP Webcam integration for Android phone as webcam |
| Characters | OpenRouter LLM with personality system prompts |
| UI | Streamlit v2 dashboard, HeyGen Mode preset, recording export |
| API | FastAPI v2 with REST + WebSocket endpoints |
| Windows EXE | PyInstaller standalone build |
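
The "momentum/inertia, spring dynamics" entry for Body Physics can be illustrated with a damped spring that pulls a tracked value toward its target. This is a minimal sketch, not the project's `physics_engine.py`: the `Spring1D` class, its constants, and the 30 FPS time step are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Spring1D:
    """Damped spring pulling one coordinate toward a target (hypothetical helper)."""
    stiffness: float = 120.0  # spring constant k (assumed value)
    damping: float = 20.0     # damping coefficient c (assumed value)
    position: float = 0.0
    velocity: float = 0.0

    def step(self, target: float, dt: float) -> float:
        # Semi-implicit Euler: update velocity first, then position.
        accel = self.stiffness * (target - self.position) - self.damping * self.velocity
        self.velocity += accel * dt
        self.position += self.velocity * dt
        return self.position


# Follow a landmark that jumps from 0.0 to 1.0, simulated for 3 s at 30 FPS.
spring = Spring1D()
trace = [spring.step(target=1.0, dt=1 / 30) for _ in range(90)]
```

The value eases toward the target instead of snapping to it, which is the kind of inertia the feature table describes. Higher `stiffness` follows the target faster; higher `damping` suppresses wobble.
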
| Capability | HeyGen ($24–240/mo) | SadTalker | DeepFaceReal-Physics |
|---|---|---|---|
| 3D Face Reconstruction | β | β | β |
| Audio-Driven Head Motion | β | β | β |
| Wav2Lip Lip Sync | β | β | β |
| Natural Eye Gaze | β | β | β |
| Conversational Gestures | β | β | β |
| Real-Time β₯15 FPS | β Cloud | β 15β20 FPS CPU | |
| Face Swap | β | β | β |
| LLM Character AI | β | β | |
| Self-Hosted | β | β | β |
| Open Source | β | β | β |
| Price | $24–240/month | Free | Free |
| WhatsApp Integration | β | β | β |
| Windows EXE | N/A | Manual | β |
| API + WebSocket | β | β | β |
| GPU Required | β | β | β CPU only |
Prerequisites: Python 3.10+, 4GB RAM (8GB recommended), webcam
# Clone
git clone https://github.com/deathlegionteamlk/DeepFaceReal-Physics.git
cd DeepFaceReal-Physics
# Install
pip install -r requirements.txt
# Start the UI (port 8080)
streamlit run app.py --server.port 8080
# In a second terminal, start the API (port 8081)
python api.py
# Alternatively, build and run with Docker
docker build -t deepfacereal-physics .
docker run -p 8080:8080 -p 8081:8081 deepfacereal-physics

Data flows through the pipeline in this order:
- Input Sources: USB camera, IP Webcam, audio file
- Audio Feature Extraction: MFCC · Pitch · Energy · F0
- 3D Face Engine: MediaPipe 468 landmarks, Delaunay triangulation, 6DoF head pose (solvePnP), expression blendshapes
- Talking Head Engine: audio → head pose, audio → expression, natural motion patterns
- Lip Sync (Wav2Lip): phoneme detection, lip shape prediction, Wav2Lip inference, temporal smoothing
- Three engines in parallel:
  - Eye & Gaze: saccades, blinks, gaze tracking, pupil render
  - Gesture Engine: hand gestures, shoulder/head, posture shifts, intensity config
  - Physics Engine: momentum, spring dynamics, frame skipping, async queues
- Composite & Render: face swap · overlays · background · enhancement
- Output: Streamlit :8080, FastAPI :8081, virtual camera
468 MediaPipe landmarks → Delaunay triangulation → 3D mesh. 6DoF head pose via solvePnP. Expression blendshape extraction.
face_3d = get_face_3d_engine()
mesh = face_3d.process_frame(image)

13 MFCC coefficients, F0 pitch, RMS energy → head pose prediction. Drives speech-synchronized nodding and tilt.
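
To make the energy-to-pose idea concrete, here is a small sketch that covers only the RMS energy input, leaving out MFCC and F0. The function names, the `tanh` mapping, and the 8-degree ceiling are assumptions for illustration and are not taken from `talking_head.py`.

```python
import math


def rms_energy(samples):
    """Root-mean-square energy of one audio frame (floats in [-1, 1])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def energy_to_nod_pitch(energy, max_pitch_deg=8.0, gain=4.0):
    """Map RMS energy to a nod angle in degrees (illustrative mapping only)."""
    # tanh saturates, so loud frames cannot push the head past max_pitch_deg.
    return max_pitch_deg * math.tanh(gain * energy)


# One 20 ms frame of a 220 Hz tone sampled at 16 kHz, amplitude 0.5.
frame = [0.5 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(320)]
energy = rms_energy(frame)
pitch = energy_to_nod_pitch(energy)
```

A saturating mapping keeps the head motion bounded on loud speech. A real predictor would likely combine all three feature groups and smooth the result over time.
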
talking_head = get_talking_head()
seq = talking_head.process_audio(audio_data, face_img)

Phoneme detection → lip shape parameters → Wav2Lip inference on face region. EMA filter keeps transitions smooth.
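
The EMA (exponential moving average) filter mentioned above can be sketched in a few lines. The function name `ema_smooth`, the `alpha` value, and the sample mouth-opening sequence are assumptions; the project's actual smoothing parameters are not stated here.

```python
def ema_smooth(values, alpha=0.4):
    """Exponential moving average: y[t] = alpha * x[t] + (1 - alpha) * y[t-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    prev = None
    for x in values:
        # The first sample passes through unchanged; later ones are blended.
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


# A jittery mouth-opening parameter, one value per frame.
mouth_open = [0.0, 1.0, 0.0, 1.0, 1.0, 1.0]
smooth = ema_smooth(mouth_open, alpha=0.5)
# smooth == [0.0, 0.5, 0.25, 0.625, 0.8125, 0.90625]
```

A lower `alpha` gives smoother but laggier lips; a higher `alpha` tracks the audio more tightly at the cost of visible jitter.
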
lip_sync = create_lip_sync()
frame = lip_sync.sync_frame(face_img, audio_chunk)

Saccades every 200–300 ms, micro-saccades during fixation, blinks every 2–4 s (100–400 ms duration). Configurable gaze targets.
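
The timing ranges above can be turned into a randomized event schedule. The sketch below is an assumption about how such a scheduler might look, not the logic in `eye_engine.py`: `schedule_events` is a hypothetical helper, intervals are drawn uniformly, and blink intervals are measured from one blink start to the next.

```python
import random


def schedule_events(duration_s, rng):
    """Return (saccade_times, blinks); each blink is (start, length) in seconds."""
    saccades, blinks = [], []
    t = rng.uniform(0.2, 0.3)
    while t < duration_s:
        saccades.append(t)
        t += rng.uniform(0.2, 0.3)  # next saccade 200-300 ms later
    t = rng.uniform(2.0, 4.0)
    while t < duration_s:
        blinks.append((t, rng.uniform(0.1, 0.4)))  # blink lasts 100-400 ms
        t += rng.uniform(2.0, 4.0)  # next blink starts 2-4 s later
    return saccades, blinks


# Seeded generator so the 10-second schedule is reproducible.
saccades, blinks = schedule_events(10.0, random.Random(42))
```

Drawing each interval at random avoids the metronome-like blinking that makes avatars look artificial.
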
eye_engine = get_eye_engine()
state = eye_engine.update()

Hand patterns keyed to audio energy, shoulder/head micro-movements, posture variation. Intensity from 0.0 to 1.0.
gesture = get_gesture_engine()
params = gesture.process_gestures(energy)

Async queue per stage. Frame skipping for CPU relief. Cached Wav2Lip results for repeated phonemes. Resolution management (downscale detect, upscale output).
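
The combination of per-stage queues and frame skipping might look like the following sketch. It uses `asyncio` queues as a stand-in; the stage names, the `every_n` parameter, and the string results are hypothetical, and the real `pipeline.py` may use threads or a different design.

```python
import asyncio


async def producer(out_q, n_frames):
    for i in range(n_frames):
        await out_q.put(i)
    await out_q.put(None)  # sentinel: no more frames


async def heavy_stage(in_q, out_q, every_n=2):
    last = None
    while (frame := await in_q.get()) is not None:
        if frame % every_n == 0:
            await asyncio.sleep(0)  # stand-in for expensive inference
            last = f"result-{frame}"
        await out_q.put((frame, last))  # skipped frames reuse the last result
    await out_q.put(None)


async def run(n_frames=6):
    # Bounded queues apply backpressure between stages.
    q1, q2 = asyncio.Queue(maxsize=4), asyncio.Queue(maxsize=4)
    out = []

    async def sink():
        while (item := await q2.get()) is not None:
            out.append(item)

    await asyncio.gather(producer(q1, n_frames), heavy_stage(q1, q2), sink())
    return out


results = asyncio.run(run())
```

Every frame still reaches the output, but only every second frame pays the inference cost; the frames in between carry the most recent result forward.
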
pipeline = get_realtime_pipeline()
pipeline.start()

The Streamlit UI (app.py) runs on port 8080.
| Tab | What's Inside |
|---|---|
| Avatar Studio | Source photo upload, real-time preview, recording export |
| Mobile | QR code for IP Webcam, auto-detect, camera source picker |
| Characters | Gallery, creation, management with face data |
| Chat | LLM character conversation with message history |
| Engines | Live status for all 6 engines, per-stage timing |
| Settings | Engine toggles, sliders, quality controls, HeyGen Mode preset |
The HeyGen Mode preset takes one click and turns everything on at max quality:
✅ 3D Face · ✅ Talking Head · ✅ Wav2Lip · ✅ Eye Gaze · ✅ Gestures · ✅ Parallax BG · ✅ Physics · ✅ High Quality Enhancement
FastAPI (api.py) on port 8081. Auto-generated docs at /docs.
| Method | Path | What It Does |
|---|---|---|
| POST | /generate/talking-head | Generate talking head video from audio + face image |
| POST | /animate/face | Animate face with expression coefficients + head pose |
| WS | /ws/realtime | Real-time streaming with head pose + eye state |
| POST | /config/render | Configure any engine's render parameters |
| Method | Path | What It Does |
|---|---|---|
| GET | / | API info |
| GET | /status | System status with per-engine FPS |
| POST | /swap | Face swap on uploaded image |
| POST | /chat | Send message to character LLM |
| GET/POST/DELETE | /characters | Character CRUD |
| POST | /characters/{name}/activate | Activate character |
| POST/GET | /physics/config, /physics/status | Physics control |
| POST/GET | /camera/source, /camera/status | Camera control |
| WS | /ws/chat, /ws/video | Streaming chat + video |
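
For Python clients, a request to `POST /generate/talking-head` can be assembled with the standard library. This sketch reuses the field names from this README's curl example (`audio_b64`, `face_b64`, `fps`); the helper names are hypothetical, the input bytes are placeholders, and the response format is not documented here, so the request is built but not sent.

```python
import base64
import json
import urllib.request

API_URL = "http://localhost:8081/generate/talking-head"


def build_payload(audio_bytes, face_bytes, fps=20):
    """Encode raw WAV and image bytes into the JSON body the curl example uses."""
    return {
        "audio_b64": base64.b64encode(audio_bytes).decode("ascii"),
        "face_b64": base64.b64encode(face_bytes).decode("ascii"),
        "fps": fps,
    }


def build_request(payload, url=API_URL):
    """Build (but do not send) the POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder bytes; in practice read them from a .wav file and an image file.
payload = build_payload(b"RIFF-fake-wav", b"fake-image-bytes")
request = build_request(payload)
# To send: urllib.request.urlopen(request), with the API running on port 8081.
```

Keeping payload construction separate from sending makes the encoding step easy to check without a running server.
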
curl -X POST http://localhost:8081/generate/talking-head \
-H "Content-Type: application/json" \
-d '{
"audio_b64": "BASE64_ENCODED_WAV_AUDIO",
"face_b64": "BASE64_ENCODED_FACE_IMAGE",
"fps": 20
}'
curl -X POST http://localhost:8081/config/render \
-H "Content-Type: application/json" \
-d '{"engine": "eye", "config": {"blink_interval_min": 1.5, "blink_interval_max": 4.0}}'
# On Windows
pip install pyinstaller
python build_exe.py
# Output: dist/DeepFaceReal.exe + DeepFaceReal_API.exe

The build bundles all core modules, InsightFace models (buffalo_l, inswapper_128), MediaPipe models, Wav2Lip models, OpenCV/NumPy/Pillow/Streamlit/FastAPI, and a launcher batch file.
- Install IP Webcam from Google Play
- Tap Start Server
- Note the IP (e.g. 192.168.1.100:8080)
- In Streamlit → Mobile tab → enter IP or scan QR code
sudo apt install v4l2loopback-dkms
sudo modprobe v4l2loopback devices=1 video_nr=10

Start the pipeline → virtual camera appears as a device → select "DeepFakeCam" in WhatsApp Desktop, Zoom, or Meet.
DeepFaceReal-Physics/
├── app.py                    # Streamlit UI v2 (port 8080)
├── api.py                    # FastAPI v2 (port 8081)
├── build_exe.py              # Windows EXE builder
├── start.sh                  # Launch both services
├── core/
│   ├── face_3d_engine.py     # 3D face reconstruction + pose
│   ├── talking_head.py       # Audio-driven talking head
│   ├── lip_sync.py           # Wav2Lip lip sync
│   ├── eye_engine.py         # Eye & gaze
│   ├── gesture_engine.py     # Conversational gestures
│   ├── pipeline.py           # Real-time async pipeline
│   ├── face_swapper.py       # InsightFace swap
│   ├── physics_engine.py     # MediaPipe Holistic + physics
│   ├── background_engine.py  # Parallax background
│   ├── webcam_pipeline.py    # Camera capture
│   ├── character_manager.py  # Character profiles
│   └── llm_character.py      # OpenRouter LLM
├── models/                   # Downloaded models
├── profiles/                 # Character profiles
└── static/                   # Static assets
| Engine | Target FPS | CPU Cores | Resolution |
|---|---|---|---|
| 3D Face | 30 FPS | 1 | 640×480 |
| Talking Head | 30 FPS | 1 | Audio only |
| Lip Sync | 20 FPS | 1 | Face region |
| Eye Gaze | 60 FPS | 0.5 | Overlay |
| Gestures | 30 FPS | 0.5 | Overlay |
| Pipeline Total | ≥15–20 FPS | 4 cores | 640×480 |
Every heavy stage processes only every 2nd or 3rd frame. Repeated phonemes hit the Wav2Lip cache. Async queues keep everything non-blocking.
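
The effect of caching results for repeated phonemes can be sketched with `functools.lru_cache`. This is an illustration only: `lip_shape_for` is a hypothetical stand-in for the expensive inference call, and a real cache key would probably include more than the phoneme (for example the face crop or resolution).

```python
from functools import lru_cache

calls = {"inference": 0}


@lru_cache(maxsize=256)
def lip_shape_for(phoneme):
    """Stand-in for expensive inference; counts how often it really runs."""
    calls["inference"] += 1
    return f"shape:{phoneme}"


# Nine phonemes in the stream, but only four distinct ones.
phonemes = ["HH", "AH", "L", "OW", "HH", "AH", "L", "OW", "AH"]
shapes = [lip_shape_for(p) for p in phonemes]
info = lip_shape_for.cache_info()
# info.hits == 5, info.misses == 4: inference ran four times for nine frames.
```

Speech reuses a small set of phonemes, so even a modest cache can remove a large share of inference calls.
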
- Fork the repo
- Create a feature branch (git checkout -b feature/your-feature)
- Commit (git commit -m 'Add your feature')
- Push (git push origin feature/your-feature)
- Open a Pull Request
MIT License. See LICENSE for details.
Built on:
InsightFace • MediaPipe • Wav2Lip • ONNX Runtime • OpenRouter • Streamlit • FastAPI • SciPy • scikit-image
Inspired by SadTalker, Ditto, and HeyGen.
DeepFaceReal-Physics v2.0.0 · MIT License · DeathLegionTeamLK