2025/02/27: Training and inference source code released.
- System requirements: Ubuntu 22.04, CUDA 12.1
- Tested GPUs: H100
Create a conda environment:

```shell
conda create -n humangif python=3.10
conda activate humangif
```

Install PyTorch with conda, then the remaining packages with pip:

```shell
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
```
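A quick way to confirm the environment matches the versions above is to query PyTorch directly; this is just a sanity-check sketch, not part of the HumanGif codebase:

```python
# Sanity check: verify the PyTorch / CUDA setup installed above.
import torch

print(torch.__version__)              # expect 2.2.2
print(torch.version.cuda)             # expect 12.1
print(torch.cuda.is_available())      # expect True on a supported GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. an H100 on the tested setup
```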
You can easily get all pretrained models required for inference from our HuggingFace repo.
Clone the pretrained models into the ${PROJECT_ROOT}/pretrained_models directory with the commands below:

```shell
git lfs install
git clone https://huggingface.co/Sony/humangif
```
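If you prefer not to use git-lfs, the same snapshot can usually be fetched with the huggingface_hub client instead; a minimal sketch, assuming the repo id from the clone URL above and the target directory used in this README:

```python
# Sketch: download the HumanGif checkpoints without git-lfs.
# Requires `pip install huggingface_hub`; repo id taken from the URL above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Sony/humangif",
    local_dir="pretrained_models/humangif",  # matches the layout shown below
)
```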
Or you can download them separately from their source repos:
- HumanGif ckpts: Consist of the denoising UNet, guidance encoders, NeRF renderer, reference UNet, view module, and motion module.
- StableDiffusion V1.5: Initialized and fine-tuned from Stable-Diffusion-v1-2. (Thanks to runwayml)
- sd-vae-ft-mse: Weights are intended to be used with the diffusers library. (Thanks to stabilityai)
- image_encoder: Fine-tuned from CompVis/stable-diffusion-v1-4-original to accept CLIP image embeddings rather than text embeddings. (Thanks to lambdalabs)
Finally, these pretrained models should be organized as follows:
```
./pretrained_models/
|-- humangif
|   |-- RenderPeople
|   |   |-- stage1_w_normal_w_nerf_guid_w_img_loss/saved_models/
|   |   |   |-- guidance_encoder_normal-140000.pth
|   |   |   |-- guidance_encoder_nerf-140000.pth
|   |   |   |-- NeRF_renderer-140000.pth
|   |   |   |-- reference_unet-140000.pth
|   |   |   `-- denoising_unet-140000.pth
|   |   |-- stage2_w_normal_w_nerf_guid_w_img_loss_w_view_attention/saved_models/
|   |   |   `-- view_module.pth
|   |   `-- stage3_w_normal_w_nerf_guid_w_img_loss_w_view_attention_w_motion_attention/saved_models/
|   |       `-- motion_module.pth
|   `-- DNA_Rendering
|       |-- stage1_w_normal_w_nerf_guid_w_img_loss/saved_models/
|       |   |-- guidance_encoder_normal-150000.pth
|       |   |-- guidance_encoder_nerf-150000.pth
|       |   |-- NeRF_renderer-150000.pth
|       |   |-- reference_unet-150000.pth
|       |   `-- denoising_unet-150000.pth
|       |-- stage2_w_normal_w_nerf_guid_w_img_loss_w_view_attention/saved_models/
|       |   `-- view_module.pth
|       `-- stage3_w_normal_w_nerf_guid_w_img_loss_w_view_attention_w_motion_attention/saved_models/
|           `-- motion_module.pth
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml
```
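Before launching inference, it can save a failed run to verify that every expected checkpoint is in place. Below is a minimal sketch for the RenderPeople checkpoints, with paths taken from the layout above:

```python
# Sketch: verify the RenderPeople checkpoints from the tree above are present.
from pathlib import Path

ROOT = Path("pretrained_models/humangif/RenderPeople")
STAGE1 = "stage1_w_normal_w_nerf_guid_w_img_loss/saved_models"
EXPECTED = [
    f"{STAGE1}/guidance_encoder_normal-140000.pth",
    f"{STAGE1}/guidance_encoder_nerf-140000.pth",
    f"{STAGE1}/NeRF_renderer-140000.pth",
    f"{STAGE1}/reference_unet-140000.pth",
    f"{STAGE1}/denoising_unet-140000.pth",
    "stage2_w_normal_w_nerf_guid_w_img_loss_w_view_attention/saved_models/view_module.pth",
    "stage3_w_normal_w_nerf_guid_w_img_loss_w_view_attention_w_motion_attention/saved_models/motion_module.pth",
]

missing = [p for p in EXPECTED if not (ROOT / p).is_file()]
print("all checkpoints found" if not missing else f"missing: {missing}")
```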
Register and download the SMPL (version 1.0.0) and SMPL-X (version 1.0) models, then place the downloaded files as shown below:
```
./
|-- ...
|-- assets/
|   |-- SMPL_NEUTRAL.pkl
|   |-- SMPL_FEMALE.pkl
|   `-- SMPL_MALE.pkl
`-- models/smplx/
    |-- SMPLX_NEUTRAL.npz
    |-- SMPLX_FEMALE.npz
    `-- SMPLX_MALE.npz
```
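To confirm the body-model files are readable, the SMPL-X .npz archives can be opened directly with numpy; a small sketch assuming the layout above (the SMPL .pkl files are Python 2 era pickles and need the original SMPL loading code):

```python
# Sketch: confirm the SMPL-X archives load (paths assume the tree above).
import numpy as np

smplx = np.load("models/smplx/SMPLX_NEUTRAL.npz", allow_pickle=True)
print(sorted(smplx.files))        # parameter arrays, e.g. 'shapedirs', 'v_template', ...
print(smplx["v_template"].shape)  # (10475, 3) for SMPL-X
```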
Please download the rendered multi-view images of the RenderPeople dataset from SHERF.
Unzip the downloaded dataset under the data/ directory and run:

```shell
python data_processing_script/prepare_renderpeople_folder.py
python data_processing_script/prepare_renderpeople_smpl.py --root_dir data/RenderPeople/train -s 0 -e 450
python data_processing_script/prepare_renderpeople_smpl.py --root_dir data/RenderPeople/test -s 450 -e 482
```

Use the data processing scripts from Champ to render normal images:

```shell
python pkgs/pipelines/smpl_pipe_renderpeople.py -i ${HumanGif_folder}/data/RenderPeople/train/ --skip_fit -s 0 -e 450
python pkgs/pipelines/smpl_pipe_renderpeople.py -i ${HumanGif_folder}/data/RenderPeople/test/ --skip_fit -s 450 -e 482
```
The directory structure after data preprocessing should look like this:

```
/RenderPeople/train(test)/
|-- subject01/             # one subject
|   |-- camera0000/
|   |   |-- images/        # image frame sequence
|   |   |-- msk/           # mask frame sequence
|   |   `-- normal/        # normal map frame sequence
|   `-- ...                # more cameras
|-- subject02/
|   |-- ...
|   `-- ...
`-- subjectN/
    |-- ...
    `-- ...
```
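Because training expects aligned frame sequences per camera, a quick consistency check can catch preprocessing failures early. This sketch assumes the layout above; the same idea applies to the DNA-Rendering layout below, minus the msk/ folder:

```python
# Sketch: check that image / mask / normal frame counts agree per camera.
# Paths follow the RenderPeople layout shown above.
from pathlib import Path

root = Path("data/RenderPeople/train")
for subject in sorted(root.iterdir()):
    for camera in sorted(subject.glob("camera*")):
        counts = {d: len(list((camera / d).glob("*"))) for d in ("images", "msk", "normal")}
        if len(set(counts.values())) != 1:
            print(f"mismatch in {camera}: {counts}")
```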
Please download the DNA-Rendering dataset from the official Download page.
Put the downloaded dataset under the data/ directory and run:

```shell
python data_processing_script/prepare_dna_rendering_part_1_smpl.py -s 0 -e 38
python data_processing_script/prepare_dna_rendering_part_2_smpl.py -s 0 -e 389
python data_processing_script/prepare_dna_rendering_folder.py
```

Use the data processing scripts from Champ to render normal images:

```shell
python pkgs/pipelines/smpl_pipe_dna_rendering.py -i ${HumanGif_folder}/data/DNA_Rendering/Part_1/data_render/ --skip_fit -s 0 -e 38
python pkgs/pipelines/smpl_pipe_dna_rendering.py -i ${HumanGif_folder}/data/DNA_Rendering/Part_2/data_render/ --skip_fit -s 0 -e 389
```
The directory structure after data preprocessing should look like this:

```
/DNA_Rendering/train(test)/
|-- subject01/             # one subject
|   |-- camera0000/
|   |   |-- images/        # image frame sequence
|   |   `-- normal/        # normal map frame sequence
|   `-- ...                # more cameras
|-- subject02/
|   |-- ...
|   `-- ...
`-- subjectN/
    |-- ...
    `-- ...
```
Select another small batch of data as the validation set, and modify the validation.ref_images and validation.guidance_folders roots in the training config yaml; a quick check of the result is sketched below.
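This sketch loads the config with plain PyYAML and prints the two roots; the key names follow the instructions above, so adjust them if your config nests the values differently:

```python
# Sketch: print the validation roots from a training config.
# Key names follow the instructions above; the exact nesting may differ.
import yaml

with open("configs/train/stage1_RenderPeople_w_normal_w_nerf_w_img_loss.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["validation"]["ref_images"])
print(cfg["validation"]["guidance_folders"])
```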
Run inference on RenderPeople:

```shell
# Run the inference script for novel view
accelerate launch train_s3_RenderPeople_w_nerf_w_img_loss.py --config configs/inference/RenderPeople/stage3_RenderPeople_w_normal_w_nerf_w_img_loss_w_view_module_w_motion_module_nv.yaml
# Run the inference script for novel pose
accelerate launch train_s3_RenderPeople_w_nerf_w_img_loss.py --config configs/inference/RenderPeople/stage3_RenderPeople_w_normal_w_nerf_w_img_loss_w_view_module_w_motion_module_np.yaml
```

Run inference on DNA-Rendering:

```shell
# Run the inference script for novel view
accelerate launch train_s3_DNA_Rendering_w_nerf_w_img_loss.py --config configs/inference/DNA_Rendering/stage3_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention_nv.yaml
# Run the inference script for novel pose
python eval_long_video_DNA_Rendering.py --config configs/inference/DNA_Rendering/stage3_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention_np.yaml
```
Train on RenderPeople:

```shell
# Run the training script of stage 1
accelerate launch train_s1_RenderPeople_w_nerf_w_img_loss.py --config configs/train/stage1_RenderPeople_w_normal_w_nerf_w_img_loss.yaml
# Modify the `stage1_ckpt_dir` value in the yaml and run the training script of stage 2
accelerate launch train_s2_RenderPeople_w_nerf_w_img_loss.py --config configs/train/stage2_RenderPeople_w_normal_w_nerf_w_img_loss_w_view_attention.yaml
# Modify the `stage1_ckpt_dir` and `view_module_path` values in the yaml and run the training script of stage 3
accelerate launch train_s3_RenderPeople_w_nerf_w_img_loss.py --config configs/train/stage3_RenderPeople_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention.yaml
```
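When scripting multi-stage runs, the checkpoint paths can also be patched programmatically instead of edited by hand. A sketch assuming PyYAML and a top-level `stage1_ckpt_dir` key; the run directory shown is hypothetical, and the key may live under a sub-section in the actual config:

```python
# Sketch: point the stage-2 config at a finished stage-1 run before launching.
# The output directory below is hypothetical; adapt to where stage 1 saved its models.
import yaml

path = "configs/train/stage2_RenderPeople_w_normal_w_nerf_w_img_loss_w_view_attention.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

cfg["stage1_ckpt_dir"] = "exp_output/stage1_w_normal_w_nerf_w_img_loss/saved_models"  # hypothetical

with open(path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```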
Train on DNA-Rendering:

```shell
# Run the training script of stage 1
accelerate launch train_s1_DNA_Rendering_w_nerf_w_img_loss.py --config configs/train/stage1_DNA_Rendering_w_normal_w_nerf_w_img_loss.yaml
# Modify the `stage1_ckpt_dir` value in the yaml and run the training script of stage 2
accelerate launch train_s2_DNA_Rendering_w_nerf_w_img_loss.py --config configs/train/stage2_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention.yaml
# Modify the `stage1_ckpt_dir` and `view_module_path` values in the yaml and run the training script of stage 3
accelerate launch train_s3_DNA_Rendering_w_nerf_w_img_loss.py --config configs/train/stage3_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention.yaml
```
Run test-set inference on RenderPeople:

```shell
# Run the inference script for novel view
accelerate launch train_s3_RenderPeople_w_nerf_w_img_loss.py --config configs/test/RenderPeople/stage3_RenderPeople_w_view_module_w_motion_module_nv.yaml
# Run the inference script for novel pose
accelerate launch train_s3_RenderPeople_w_nerf_w_img_loss.py --config configs/test/RenderPeople/stage3_RenderPeople_w_view_module_w_motion_module_np.yaml
```

Run test-set inference on DNA-Rendering:

```shell
# Run the inference script for novel view
accelerate launch train_s3_DNA_Rendering_w_nerf_w_img_loss.py --config configs/test/DNA_Rendering/stage3_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention_nv.yaml
# Run the inference script for novel pose
python eval_long_video_DNA_Rendering.py --config configs/test/DNA_Rendering/stage3_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention_np.yaml
```
Follow the evaluation scripts from DisCo to calculate the metrics:
```shell
# Evaluate novel-view results
bash gen_eval_nv.sh $folder
# Evaluate novel-pose results
bash gen_eval_np.sh $folder
```
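DisCo-style evaluation reports image metrics such as PSNR and SSIM (the full suite also covers LPIPS and FID). As a minimal illustration of what such scripts compute, assuming generated and ground-truth frames stored in two hypothetical directories:

```python
# Sketch: average PSNR / SSIM over paired result and ground-truth frames.
# Directory names are hypothetical; the full DisCo suite adds LPIPS and FID.
from pathlib import Path

import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

pred_dir, gt_dir = Path("results/pred"), Path("results/gt")  # hypothetical
psnrs, ssims = [], []
for pred_path in sorted(pred_dir.glob("*.png")):
    pred = np.asarray(Image.open(pred_path).convert("RGB"))
    gt = np.asarray(Image.open(gt_dir / pred_path.name).convert("RGB"))
    psnrs.append(peak_signal_noise_ratio(gt, pred, data_range=255))
    ssims.append(structural_similarity(gt, pred, channel_axis=-1, data_range=255))

print(f"PSNR: {np.mean(psnrs):.2f} dB  SSIM: {np.mean(ssims):.4f}")
```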
If you find our work useful for your research, please consider citing the paper:
```bibtex
@article{hu2025humangif,
title={HumanGif: Single-View Human Diffusion with Generative Prior},
author={Hu, Shoukang and Narihira, Takuya and Fukuda, Kazumi and Sawata, Ryosuke and Shibuya, Takashi and Mitsufuji, Yuki},
journal={arXiv preprint arXiv:2502.12080},
year={2025}
}
```