This is the official repository of the WACV 2025 paper "EmoVOCA: Speech-Driven Emotional 3D Talking Heads" by Federico Nocentini, Claudio Ferrari, Stefano Berretti.
🔥🔥 [2025/01/25] Our code is now publicly available! Feel free to explore, use, and contribute!
The domain of 3D talking head generation has witnessed significant progress in recent years. A notable challenge in this field is blending speech-related motions with expression dynamics, a difficulty primarily caused by the lack of comprehensive 3D datasets that combine diversity in spoken sentences with a variety of facial expressions. While previous works attempted to exploit 2D video data and parametric 3D models as a workaround, these still show limitations when jointly modeling the two motions. In this work, we address this problem from a different perspective and propose an innovative data-driven technique for creating a synthetic dataset, called EmoVOCA, obtained by combining a collection of inexpressive 3D talking heads and a set of 3D expressive sequences. To demonstrate the advantages of this approach, and the quality of the dataset, we then designed and trained an emotional 3D talking head generator that accepts a 3D face, an audio file, an emotion label, and an intensity value as inputs, and learns to animate the audio-synchronized lip movements with expressive traits of the face. Comprehensive experiments, both quantitative and qualitative, using our data and generator show a superior ability to synthesize convincing animations compared with the best performing methods in the literature.
We introduce EmoVOCA, a novel approach for generating a synthetic 3D Emotional Talking Heads dataset that leverages speech tracks, intensity labels, emotion labels, and actor specifications. The proposed dataset can be used to overcome the lack of 3D expressive speech data and to train more accurate emotional 3D talking head generators compared to methods relying on 2D data as a proxy.
Overview of our framework. Two distinct encoders separately process the talking and expressive 3D head displacements, while a common decoder is trained to reconstruct them. At inference, talking and emotional heads are combined by concatenating their encoded latent vectors, and the decoder outputs a combination of their displacements.
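For a code-level intuition of this design, here is a minimal, self-contained PyTorch sketch of the two-encoder / shared-decoder idea. All class names, layer sizes, and the 5023-vertex assumption (vocaset-style meshes) are illustrative assumptions and do not reproduce the repository's actual implementation.

```python
import torch
import torch.nn as nn

class DualEncoderSharedDecoder(nn.Module):
    """Illustrative sketch only: names and layer sizes are assumptions."""

    def __init__(self, n_vertices=5023, latent_dim=256):
        super().__init__()
        in_dim = n_vertices * 3  # per-frame vertex displacements, flattened
        self.talking_encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, latent_dim))
        self.expressive_encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, latent_dim))
        # The shared decoder consumes the concatenation of the two latent codes.
        self.decoder = nn.Sequential(
            nn.Linear(2 * latent_dim, 512), nn.ReLU(), nn.Linear(512, in_dim))

    def forward(self, talking_disp, expressive_disp):
        # Only the inference-time combination is sketched here: encode each
        # sequence, concatenate the latent vectors, decode mixed displacements.
        z_talk = self.talking_encoder(talking_disp)
        z_expr = self.expressive_encoder(expressive_disp)
        return self.decoder(torch.cat([z_talk, z_expr], dim=-1))
```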
```bibtex
@inproceedings{nocentini2024emovocaspeechdrivenemotional3d,
  title     = {EmoVOCA: Speech-Driven Emotional 3D Talking Heads},
  author    = {Federico Nocentini and Claudio Ferrari and Stefano Berretti},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2025},
}
```
This guide provides step-by-step instructions on how to set up the EmoVOCA environment and install all necessary dependencies. The codebase has been tested on Ubuntu 20.04.2 LTS with Python 3.8.
It is recommended to use a Conda environment for this setup.
- **Create a Conda Environment**

  ```bash
  conda create -n emovoca python=3.8.18
  ```

- **Activate the Environment**

  ```bash
  conda activate emovoca
  ```

- **Clone the MPI-IS Repository**

  ```bash
  git clone https://github.com/MPI-IS/mesh.git
  cd mesh
  ```

- **Modify line 7 of the Makefile to avoid an installation error**

  ```makefile
  @pip install --no-deps --config-settings="--boost-location=$$BOOST_INCLUDE_DIRS" --verbose --no-cache-dir .
  ```

- **Run the Makefile**

  ```bash
  make all
  ```
Ensure you have the correct version of PyTorch and torchvision. If you need a different CUDA version, please refer to the official PyTorch website.
- **Install PyTorch, torchvision, and torchaudio**

  ```bash
  conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
  ```

- **Install Requirements**

  ```bash
  pip install -r requirements.txt
  ```
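After completing the steps above, a quick sanity check such as the following (optional, and not part of the official instructions) can confirm that PyTorch and the MPI-IS mesh package are importable:

```python
# Optional sanity check after installation (not part of the official steps).
import torch
from psbody.mesh import Mesh  # provided by the MPI-IS mesh package built above

print("torch version:", torch.__version__)        # expected: 2.1.0
print("CUDA available:", torch.cuda.is_available())
print("psbody.mesh imported:", Mesh.__name__)
```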
For training and testing the EmoVOCA DE-SD framework, we used two open-source 3D facial datasets: vocaset and the Florence 4D Facial Expression Dataset. Please note that you must obtain authorization to use both datasets.
To generate meshes with EmoVOCA, follow these steps:
- Download the vocaset dataset and place it in the `Dataset` folder located in the main directory.
- The meshes used for conditioning vocaset have already been added to the `EmoVOCA_generator/New_Conditions` folder. For additional data, download the Florence 4D dataset.
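Once the data are in place, a face mesh can be inspected with the MPI-IS mesh library installed earlier. The file path below is a hypothetical placeholder; point it to an actual mesh from your download.

```python
# Minimal sketch of loading and inspecting a 3D face mesh with psbody.mesh.
from psbody.mesh import Mesh

# Hypothetical path: replace with a real template/scan from your Dataset folder.
template = Mesh(filename="Dataset/vocaset/example_template.ply")
print("vertices:", template.v.shape)  # (N, 3) vertex coordinates
print("faces:", template.f.shape)     # (M, 3) triangle indices
```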
Pre-generated EmoVOCAv2 sequences are available here. To use them:
- Download the folder and place it inside the `Dataset` folder in the main directory.
- Extract all the files to ensure proper access.
We are releasing three models:
- `emovoca_generator.tar`: The DE-SD framework used to generate EmoVOCA.
- `es2l.tar`: The ES2L framework trained on EmoVOCA.
- `es2d.tar`: The ES2D framework trained on EmoVOCA.

All models are available for download here. After downloading, place the `saves` folder inside each model's directory to ensure proper setup.
Inside the model folders `ES2L` and `ES2D`, you will find both the model definitions and the training code for each component.
In the `EmoVOCA_generator` folder, you will find the code required to generate any version of EmoVOCA.
Within the main directory, there is a file named `demo.py`, which can be used to render outputs based on an emotion label, an intensity value, an audio file, and a 3D face template. Additionally, example files for generation are provided in the `example` folder located in the main directory.
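If you prefer to script the inputs yourself before calling `demo.py`, the sketch below shows one way the four inputs could be prepared in Python. Every path, label, and the commented `generate()` call are assumptions for illustration, not the script's actual interface.

```python
# Illustrative preparation of the four inputs mentioned above:
# a 3D face template, an audio file, an emotion label, and an intensity value.
import librosa
from psbody.mesh import Mesh

template = Mesh(filename="example/face_template.ply")     # hypothetical path
audio, sr = librosa.load("example/speech.wav", sr=16000)   # hypothetical path
emotion = "Happy"   # emotion label (the exact label set is an assumption)
intensity = 2       # expression intensity (scale shown here is illustrative)

# A real run would hand these to the trained generator, e.g. something like:
# animation = model.generate(template, audio, emotion, intensity)
```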
All material is made available under Creative Commons BY-NC 4.0. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicate any changes that you've made.