2025/02/27: Training and inference source code released.
- System requirements: Ubuntu 22.04, CUDA 12.1
- Tested GPUs: H100
Create a conda environment:

```shell
conda create -n humangif python=3.10
conda activate humangif
```

Install PyTorch with conda, then the remaining packages with pip:

```shell
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
```
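A quick way to confirm the environment matches the versions above is to query PyTorch directly; this is just a sanity-check sketch, not part of the HumanGif codebase:

```python
# Sanity check: verify the PyTorch / CUDA setup installed above.
import torch

print(torch.__version__)              # expect 2.2.2
print(torch.version.cuda)             # expect 12.1
print(torch.cuda.is_available())      # expect True on a supported GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. an H100 on the tested setup
```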
You can easily get all pretrained models required for inference from our HuggingFace repo.
Clone the pretrained models into the ${PROJECT_ROOT}/pretrained_models directory with the commands below:

```shell
git lfs install
git clone https://huggingface.co/Sony/humangif
```
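If you prefer not to use git-lfs, the same snapshot can usually be fetched with the huggingface_hub client instead; a minimal sketch, assuming the repo id from the clone URL above and the target directory used in this README:

```python
# Sketch: download the HumanGif checkpoints without git-lfs.
# Requires `pip install huggingface_hub`; repo id taken from the URL above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Sony/humangif",
    local_dir="pretrained_models/humangif",  # matches the layout shown below
)
```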
Or you can download them separately from their source repos:
- HumanGif ckpts: Consist of the denoising UNet, guidance encoders, NeRF renderer, reference UNet, view module, and motion module.
- StableDiffusion V1.5: Initialized and fine-tuned from Stable-Diffusion-v1-2. (Thanks to runwayml)
- sd-vae-ft-mse: Weights are intended to be used with the diffusers library. (Thanks to stabilityai)
- image_encoder: Fine-tuned from CompVis/stable-diffusion-v1-4-original to accept CLIP image embeddings rather than text embeddings. (Thanks to lambdalabs)
Finally, these pretrained models should be organized as follows:
```
./pretrained_models/
|-- humangif
|   |-- RenderPeople
|   |   |-- stage1_w_normal_w_nerf_guid_w_img_loss/saved_models/
|   |   |   |-- guidance_encoder_normal-140000.pth
|   |   |   |-- guidance_encoder_nerf-140000.pth
|   |   |   |-- NeRF_renderer-140000.pth
|   |   |   |-- reference_unet-140000.pth
|   |   |   `-- denoising_unet-140000.pth
|   |   |-- stage2_w_normal_w_nerf_guid_w_img_loss_w_view_attention/saved_models/
|   |   |   `-- view_module.pth
|   |   `-- stage3_w_normal_w_nerf_guid_w_img_loss_w_view_attention_w_motion_attention/saved_models/
|   |       `-- motion_module.pth
|   `-- DNA_Rendering
|       |-- stage1_w_normal_w_nerf_guid_w_img_loss/saved_models/
|       |   |-- guidance_encoder_normal-150000.pth
|       |   |-- guidance_encoder_nerf-150000.pth
|       |   |-- NeRF_renderer-150000.pth
|       |   |-- reference_unet-150000.pth
|       |   `-- denoising_unet-150000.pth
|       |-- stage2_w_normal_w_nerf_guid_w_img_loss_w_view_attention/saved_models/
|       |   `-- view_module.pth
|       `-- stage3_w_normal_w_nerf_guid_w_img_loss_w_view_attention_w_motion_attention/saved_models/
|           `-- motion_module.pth
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml
```
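Before launching inference, it can save a failed run to verify that every expected checkpoint is in place. Below is a minimal sketch for the RenderPeople checkpoints, with paths taken from the layout above:

```python
# Sketch: verify the RenderPeople checkpoints from the tree above are present.
from pathlib import Path

ROOT = Path("pretrained_models/humangif/RenderPeople")
STAGE1 = "stage1_w_normal_w_nerf_guid_w_img_loss/saved_models"
EXPECTED = [
    f"{STAGE1}/guidance_encoder_normal-140000.pth",
    f"{STAGE1}/guidance_encoder_nerf-140000.pth",
    f"{STAGE1}/NeRF_renderer-140000.pth",
    f"{STAGE1}/reference_unet-140000.pth",
    f"{STAGE1}/denoising_unet-140000.pth",
    "stage2_w_normal_w_nerf_guid_w_img_loss_w_view_attention/saved_models/view_module.pth",
    "stage3_w_normal_w_nerf_guid_w_img_loss_w_view_attention_w_motion_attention/saved_models/motion_module.pth",
]

missing = [p for p in EXPECTED if not (ROOT / p).is_file()]
print("all checkpoints found" if not missing else f"missing: {missing}")
```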
Register and download the SMPL (version 1.0.0) and SMPL-X (version 1.0) models, then place the downloaded files as shown below:
```
./
|-- ...
|-- assets/
|   |-- SMPL_NEUTRAL.pkl
|   |-- SMPL_FEMALE.pkl
|   `-- SMPL_MALE.pkl
`-- models/smplx/
    |-- SMPLX_NEUTRAL.npz
    |-- SMPLX_FEMALE.npz
    `-- SMPLX_MALE.npz
```
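To confirm the body-model files are readable, the SMPL-X .npz archives can be opened directly with numpy; a small sketch assuming the layout above (the SMPL .pkl files are Python 2 era pickles and need the original SMPL loading code):

```python
# Sketch: confirm the SMPL-X archives load (paths assume the tree above).
import numpy as np

smplx = np.load("models/smplx/SMPLX_NEUTRAL.npz", allow_pickle=True)
print(sorted(smplx.files))        # parameter arrays, e.g. 'shapedirs', 'v_template', ...
print(smplx["v_template"].shape)  # (10475, 3) for SMPL-X
```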
Please download the rendered multi-view images of the RenderPeople dataset from SHERF.
Unzip the downloaded dataset under the data/ directory and run:

```shell
python data_processing_script/prepare_renderpeople_folder.py
python data_processing_script/prepare_renderpeople_smpl.py --root_dir data/RenderPeople/train -s 0 -e 450
python data_processing_script/prepare_renderpeople_smpl.py --root_dir data/RenderPeople/test -s 450 -e 482
```

Use the data processing scripts from Champ to render normal images:

```shell
python pkgs/pipelines/smpl_pipe_renderpeople.py -i ${HumanGif_folder}/data/RenderPeople/train/ --skip_fit -s 0 -e 450
python pkgs/pipelines/smpl_pipe_renderpeople.py -i ${HumanGif_folder}/data/RenderPeople/test/ --skip_fit -s 450 -e 482
```
The directory structure after data preprocessing should look like this:

```
/RenderPeople/train(test)/
|-- subject01/             # one subject
|   |-- camera0000/
|   |   |-- images/        # image frame sequence
|   |   |-- msk/           # mask frame sequence
|   |   `-- normal/        # normal map frame sequence
|   `-- ...                # more cameras
|-- subject02/
|   |-- ...
|   `-- ...
`-- subjectN/
    |-- ...
    `-- ...
```
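Because training expects aligned frame sequences per camera, a quick consistency check can catch preprocessing failures early. This sketch assumes the layout above; the same idea applies to the DNA-Rendering layout below, minus the msk/ folder:

```python
# Sketch: check that image / mask / normal frame counts agree per camera.
# Paths follow the RenderPeople layout shown above.
from pathlib import Path

root = Path("data/RenderPeople/train")
for subject in sorted(root.iterdir()):
    for camera in sorted(subject.glob("camera*")):
        counts = {d: len(list((camera / d).glob("*"))) for d in ("images", "msk", "normal")}
        if len(set(counts.values())) != 1:
            print(f"mismatch in {camera}: {counts}")
```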
Please download the DNA-Rendering dataset from the official Download page.
Put the downloaded dataset under the data/ directory and run:

```shell
python data_processing_script/prepare_dna_rendering_part_1_smpl.py -s 0 -e 38
python data_processing_script/prepare_dna_rendering_part_2_smpl.py -s 0 -e 389
python data_processing_script/prepare_dna_rendering_folder.py
```

Use the data processing scripts from Champ to render normal images:

```shell
python pkgs/pipelines/smpl_pipe_dna_rendering.py -i ${HumanGif_folder}/data/DNA_Rendering/Part_1/data_render/ --skip_fit -s 0 -e 38
python pkgs/pipelines/smpl_pipe_dna_rendering.py -i ${HumanGif_folder}/data/DNA_Rendering/Part_2/data_render/ --skip_fit -s 0 -e 389
```
The directory structure after data preprocessing should look like this:

```
/DNA_Rendering/train(test)/
|-- subject01/             # one subject
|   |-- camera0000/
|   |   |-- images/        # image frame sequence
|   |   `-- normal/        # normal map frame sequence
|   `-- ...                # more cameras
|-- subject02/
|   |-- ...
|   `-- ...
`-- subjectN/
    |-- ...
    `-- ...
```
Select another small batch of data as the validation set, and modify the validation.ref_images and validation.guidance_folders roots in the training config yaml; a quick check of the result is sketched below.
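This sketch loads the config with plain PyYAML and prints the two roots; the key names follow the instructions above, so adjust them if your config nests the values differently:

```python
# Sketch: print the validation roots from a training config.
# Key names follow the instructions above; the exact nesting may differ.
import yaml

with open("configs/train/stage1_RenderPeople_w_normal_w_nerf_w_img_loss.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["validation"]["ref_images"])
print(cfg["validation"]["guidance_folders"])
```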
Run inference on RenderPeople:

```shell
# Run the inference script for novel view
accelerate launch train_s3_RenderPeople_w_nerf_w_img_loss.py --config configs/inference/RenderPeople/stage3_RenderPeople_w_normal_w_nerf_w_img_loss_w_view_module_w_motion_module_nv.yaml
# Run the inference script for novel pose
accelerate launch train_s3_RenderPeople_w_nerf_w_img_loss.py --config configs/inference/RenderPeople/stage3_RenderPeople_w_normal_w_nerf_w_img_loss_w_view_module_w_motion_module_np.yaml
```

Run inference on DNA-Rendering:

```shell
# Run the inference script for novel view
accelerate launch train_s3_DNA_Rendering_w_nerf_w_img_loss.py --config configs/inference/DNA_Rendering/stage3_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention_nv.yaml
# Run the inference script for novel pose
python eval_long_video_DNA_Rendering.py --config configs/inference/DNA_Rendering/stage3_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention_np.yaml
```
Train on RenderPeople:

```shell
# Run the training script of stage 1
accelerate launch train_s1_RenderPeople_w_nerf_w_img_loss.py --config configs/train/stage1_RenderPeople_w_normal_w_nerf_w_img_loss.yaml
# Modify the `stage1_ckpt_dir` value in the yaml and run the training script of stage 2
accelerate launch train_s2_RenderPeople_w_nerf_w_img_loss.py --config configs/train/stage2_RenderPeople_w_normal_w_nerf_w_img_loss_w_view_attention.yaml
# Modify the `stage1_ckpt_dir` and `view_module_path` values in the yaml and run the training script of stage 3
accelerate launch train_s3_RenderPeople_w_nerf_w_img_loss.py --config configs/train/stage3_RenderPeople_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention.yaml
```
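When scripting multi-stage runs, the checkpoint paths can also be patched programmatically instead of edited by hand. A sketch assuming PyYAML and a top-level `stage1_ckpt_dir` key; the run directory shown is hypothetical, and the key may live under a sub-section in the actual config:

```python
# Sketch: point the stage-2 config at a finished stage-1 run before launching.
# The output directory below is hypothetical; adapt to where stage 1 saved its models.
import yaml

path = "configs/train/stage2_RenderPeople_w_normal_w_nerf_w_img_loss_w_view_attention.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

cfg["stage1_ckpt_dir"] = "exp_output/stage1_w_normal_w_nerf_w_img_loss/saved_models"  # hypothetical

with open(path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```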
Train on DNA-Rendering:

```shell
# Run the training script of stage 1
accelerate launch train_s1_DNA_Rendering_w_nerf_w_img_loss.py --config configs/train/stage1_DNA_Rendering_w_normal_w_nerf_w_img_loss.yaml
# Modify the `stage1_ckpt_dir` value in the yaml and run the training script of stage 2
accelerate launch train_s2_DNA_Rendering_w_nerf_w_img_loss.py --config configs/train/stage2_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention.yaml
# Modify the `stage1_ckpt_dir` and `view_module_path` values in the yaml and run the training script of stage 3
accelerate launch train_s3_DNA_Rendering_w_nerf_w_img_loss.py --config configs/train/stage3_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention.yaml
```
Run test-set inference on RenderPeople:

```shell
# Run the inference script for novel view
accelerate launch train_s3_RenderPeople_w_nerf_w_img_loss.py --config configs/test/RenderPeople/stage3_RenderPeople_w_view_module_w_motion_module_nv.yaml
# Run the inference script for novel pose
accelerate launch train_s3_RenderPeople_w_nerf_w_img_loss.py --config configs/test/RenderPeople/stage3_RenderPeople_w_view_module_w_motion_module_np.yaml
```

Run test-set inference on DNA-Rendering:

```shell
# Run the inference script for novel view
accelerate launch train_s3_DNA_Rendering_w_nerf_w_img_loss.py --config configs/test/DNA_Rendering/stage3_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention_nv.yaml
# Run the inference script for novel pose
python eval_long_video_DNA_Rendering.py --config configs/test/DNA_Rendering/stage3_DNA_Rendering_w_normal_w_nerf_w_img_loss_w_view_attention_w_motion_attention_np.yaml
```
Follow the evaluation scripts from DisCo to calculate the metrics:
```shell
# Evaluate novel-view results
bash gen_eval_nv.sh $folder
# Evaluate novel-pose results
bash gen_eval_np.sh $folder
```
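DisCo-style evaluation reports image metrics such as PSNR and SSIM (the full suite also covers LPIPS and FID). As a minimal illustration of what such scripts compute, assuming generated and ground-truth frames stored in two hypothetical directories:

```python
# Sketch: average PSNR / SSIM over paired result and ground-truth frames.
# Directory names are hypothetical; the full DisCo suite adds LPIPS and FID.
from pathlib import Path

import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

pred_dir, gt_dir = Path("results/pred"), Path("results/gt")  # hypothetical
psnrs, ssims = [], []
for pred_path in sorted(pred_dir.glob("*.png")):
    pred = np.asarray(Image.open(pred_path).convert("RGB"))
    gt = np.asarray(Image.open(gt_dir / pred_path.name).convert("RGB"))
    psnrs.append(peak_signal_noise_ratio(gt, pred, data_range=255))
    ssims.append(structural_similarity(gt, pred, channel_axis=-1, data_range=255))

print(f"PSNR: {np.mean(psnrs):.2f} dB  SSIM: {np.mean(ssims):.4f}")
```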
If you find our work useful for your research, please consider citing the paper:
```bibtex
@article{hu2025humangif,
title={HumanGif: Single-View Human Diffusion with Generative Prior},
author={Hu, Shoukang and Narihira, Takuya and Fukuda, Kazumi and Sawata, Ryosuke and Shibuya, Takashi and Mitsufuji, Yuki},
journal={arXiv preprint arXiv:2502.12080},
year={2025}
}
```