Installation | Train | Evaluation | FLOPs
This repository builds upon MMHuman3D, an open-source PyTorch-based codebase for the use of 3D human parametric models in computer vision and computer graphics. MMHuman3D is part of the OpenMMLab project. The main branch works with PyTorch 1.7+.
We have added the following major features on top of MMHuman3D; they will be contributed to MMHuman3D at a later date.
- Benchmarks on 31 datasets
- Benchmarks on 11 dataset combinations
- Benchmarks on 9 backbones and different initialisations
- Benchmarks on 9 augmentation techniques
- Trained models on optimal configurations, provided for inference
- Evaluation on 5 test sets
- FLOPs calculation
Additional:
- Train annotation files for 31 datasets will be provided in the future
- Future works can easily obtain HMR baseline benchmarks on their selected dataset mixes and partitions using our provided pipeline and annotation files.
Supported datasets:
- AGORA (CVPR'2021)
- AI Challenger (ICME'2019)
- COCO (ECCV'2014)
- COCO-WholeBody (ECCV'2020)
- EFT-COCO-Part (3DV'2021)
- EFT-COCO (3DV'2021)
- EFT-LSPET (3DV'2021)
- EFT-OCHuman (3DV'2021)
- EFT-PoseTrack (3DV'2021)
- EFT-MPII (3DV'2021)
- Human3.6M (TPAMI'2014)
- InstaVariety (CVPR'2019)
- LIP (CVPR'2017)
- LSP (BMVC'2010)
- LSP-Extended (CVPR'2011)
- MPI-INF-3DHP (3DV'2017)
- MPII (CVPR'2014)
- MTP (CVPR'2021)
- MuCo-3DHP (3DV'2018)
- MuPoTs-3D (3DV'2018)
- OCHuman (CVPR'2019)
- 3DOH50K (CVPR'2020)
- Penn Action (ICCV'2012)
- 3D-People (ICCV'2019)
- PoseTrack18 (CVPR'2018)
- PROX (ICCV'2019)
- 3DPW (ECCV'2018)
- SURREAL (CVPR'2017)
- UP-3D (CVPR'2017)
- VLOG (CVPR'2019)
- CrowdPose (CVPR'2019)
Please refer to datasets.md for training configs and results.
Benchmarks on different dataset combinations:
- Mix 1: H36M, MI, COCO
- Mix 2: H36M, MI, EFT-COCO
- Mix 3: H36M, MI, EFT-COCO, MPII
- Mix 4: H36M, MuCo, EFT-COCO
- Mix 5: H36M, MI, COCO, LSP, LSPET, MPII
- Mix 6: EFT-[COCO, MPII, LSPET], SPIN-MI, H36M
- Mix 7: EFT-[COCO, MPII, LSPET], MuCo, H36M, PROX
- Mix 8: EFT-[COCO, PT, LSPET], MI, H36M
- Mix 9: EFT-[COCO, PT, LSPET, OCH], MI, H36M
- Mix 10: PROX, MuCo, EFT-[COCO, PT, LSPET, OCH], UP-3D, MTP, Crowdpose
- Mix 11: EFT-[COCO, MPII, LSPET], MuCo, H36M
Please refer to mixed-datasets.md for training configs and results.
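For orientation, a dataset mix is expressed in an MMHuman3D-style training config through a MixedDataset that lists the per-dataset configs and a sampling partition. The snippet below is only a minimal sketch of Mix 1 (H36M, MI, COCO): the dataset types, annotation file names and partition weights are illustrative assumptions, not the values from our released configs (see mixed-datasets.md for those).

```python
# Minimal sketch (not a released config): dataset mixing in MMHuman3D style.
# Dataset types, annotation paths and partition weights are illustrative assumptions.
train_pipeline = []  # data-loading/augmentation pipeline, defined elsewhere in a real config

data = dict(
    samples_per_gpu=64,
    workers_per_gpu=1,
    train=dict(
        type='MixedDataset',
        configs=[
            dict(type='HumanImageDataset', dataset_name='h36m',
                 data_prefix='data', ann_file='h36m_train.npz', pipeline=train_pipeline),
            dict(type='HumanImageDataset', dataset_name='mpi_inf_3dhp',
                 data_prefix='data', ann_file='mpi_inf_3dhp_train.npz', pipeline=train_pipeline),
            dict(type='HumanImageDataset', dataset_name='coco',
                 data_prefix='data', ann_file='coco_2014_train.npz', pipeline=train_pipeline),
        ],
        # Sampling ratio over the three datasets; adjust the partition to change the mix.
        partition=[0.5, 0.2, 0.3],
    ),
)
```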
Supported backbones:
- ResNet-50, -101, -152 (CVPR'2016)
- ResNeXt (CVPR'2017)
- HRNet (CVPR'2019)
- EfficientNet
- ViT
- Swin
- Twins
Please refer to backbone.md for training configs and results.
We find that transferring knowledge from a pose estimation model gives more competitive performance than the default ImageNet initialisation.
Initialised backbones:
- ResNet-50 ImageNet (default)
- ResNet-50 MPII
- ResNet-50 COCO
- HRNet-W32 ImageNet
- HRNet-W32 MPII
- HRNet-W32 COCO
- Twins-SVT ImageNet
- Twins-SVT MPII
- Twins-SVT COCO
Please refer to backbone.md for training configs and results.
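For reference, swapping the backbone initialisation is a small change in the model config. The sketch below assumes MMHuman3D's `init_cfg=dict(type='Pretrained', ...)` mechanism; the checkpoint path is a placeholder for a pose-estimation-pretrained weight file, not a file shipped with this repo.

```python
# Sketch only: initialise the backbone from a 2D pose estimation checkpoint
# (e.g. ResNet-50 trained on MPII/COCO keypoints) instead of the ImageNet default.
# The checkpoint path is a placeholder.
model = dict(
    type='ImageBodyModelEstimator',
    backbone=dict(
        type='ResNet',
        depth=50,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='data/pretrained/resnet50_mpii_pose.pth')),
    # head, body_model and losses stay as in the baseline HMR config
)
```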
New augmentations:
- Coarse dropout
- Grid dropout
- Photometric distortion
- Random crop
- Hard erasing
- Soft erasing
- Self-mixing
- Synthetic occlusion
- Synthetic occlusion over keypoints
Please refer to augmentation.md for training configs and results.
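As a rough illustration, each augmentation is switched on by inserting a transform into the training pipeline of the config. The transform name and arguments for the occlusion step below are placeholders; the actual names and parameters used in our experiments are documented in augmentation.md.

```python
# Sketch only: adding an occlusion-style augmentation to the data pipeline.
# 'SyntheticOcclusion' and its arguments are illustrative placeholders.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='GetRandomScaleRotation', rot_factor=30, scale_factor=0.25),
    dict(type='MeshAffine', img_res=224),
    dict(type='SyntheticOcclusion', occluders_file='data/occluders/pascal_occluders.npy'),
    # ... normalisation, tensor conversion and key collection as in the baseline config
]
```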
We find that training with L1 loss gives more competitive performance. Please refer to mixed-datasets-l1.md for training configs and results.
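Concretely, the L1 variant only swaps the loss types in the model config; a hedged sketch of the relevant fragment is below. The loss weights are illustrative placeholders, and the tuned values live in the configs referenced by mixed-datasets-l1.md.

```python
# Sketch only: using L1 losses for the keypoint/vertex terms.
# Loss weights are illustrative placeholders.
model = dict(
    loss_keypoints3d=dict(type='L1Loss', loss_weight=100),
    loss_keypoints2d=dict(type='L1Loss', loss_weight=10),
    loss_vertex=dict(type='L1Loss', loss_weight=2),
)
```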
We provide trained models from the optimal configurations for download and inference. Please refer to combine.md for training configs and results.
| Dataset | Backbone | 3DPW PA-MPJPE (mm) | Download |
|---|---|---|---|
| H36M, MI, COCO, LSP, LSPET, MPII | ResNet-50 | 51.66 | model |
| H36M, MI, COCO, LSP, LSPET, MPII | HRNet-W32 | 49.18 | model |
| H36M, MI, COCO, LSP, LSPET, MPII | Twins-SVT | 48.77 | model |
| H36M, MI, COCO, LSP, LSPET, MPII | Twins-SVT | 47.70 | model |
| EFT-[COCO, LSPET, MPII], H36M, SPIN-MI | HRNet-W32 | 47.68 | model |
| EFT-[COCO, LSPET, MPII], H36M, SPIN-MI | Twins-SVT | 47.31 | model |
| H36M, MI, EFT-COCO | HRNet-W32 | 48.08 | model |
| H36M, MI, EFT-COCO | Twins-SVT | 48.27 | model |
| H36M, MuCo, EFT-COCO | Twins-SVT | 47.92 | model |
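To sanity-check a downloaded checkpoint before running the evaluation commands below, it can be loaded through mmcv together with MMHuman3D's model builder. This is only a sketch: the paths are placeholders and the exact builder import may differ across MMHuman3D versions.

```python
# Sketch only: build a model from its config and load a downloaded checkpoint.
# Paths are placeholders; use the config that matches the row in the table above.
# The build_architecture import path is an assumption and may differ by version.
from mmcv import Config
from mmcv.runner import load_checkpoint

from mmhuman3d.models import build_architecture

cfg = Config.fromfile('configs/hmr/resnet50_hmr_pw3d.py')
model = build_architecture(cfg.model)
load_checkpoint(model, 'data/checkpoints/downloaded_model.pth', map_location='cpu')
model.eval()
```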
We benchmarked our major findings on several algorithms and hope to add more in the future. Please refer to algorithms.md for training configs and logs.
Supported algorithms:
- SPIN
- GraphCMR
- PARE
- Mesh Graphormer
General set-up instructions follow those of MMHuman3D. Please refer to install.md for installation.
To train a model, run:
python tools/train.py ${CONFIG_FILE} ${WORK_DIR} --no-validate
Example: using 1 GPU to train HMR.
python tools/train.py ${CONFIG_FILE} ${WORK_DIR} --gpus 1 --no-validate
If you run MMHuman3D on a cluster managed with Slurm, you can use the script slurm_train.sh:
./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} --no-validate
Common optional arguments include:
- --resume-from ${CHECKPOINT_FILE}: Resume training from a previous checkpoint file.
- --no-validate: Do not evaluate the checkpoint during training.
Example: using 8 GPUs to train HMR on a Slurm cluster.
./tools/slurm_train.sh my_partition my_job configs/hmr/resnet50_hmr_pw3d.py work_dirs/hmr 8 --no-validate
You can check slurm_train.sh for full arguments and environment variables.
There are five benchmarks for evaluation:
- 3DPW-test (P2)
- H36M-test (P2)
- EFT-COCO-val
- EFT-LSPET-test
- EFT-OCHuman-test
To evaluate a trained model, run:
python tools/test.py ${CONFIG} --work-dir=${WORK_DIR} ${CHECKPOINT} --metrics=${METRICS}
Example:
python tools/test.py configs/hmr/resnet50_hmr_pw3d.py --work-dir=work_dirs/hmr work_dirs/hmr/latest.pth --metrics pa-mpjpe mpjpe
If you run MMHuman3D on a cluster managed with Slurm, you can use the script slurm_test.sh:
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG} ${WORK_DIR} ${CHECKPOINT} --metrics ${METRICS}
Example:
./tools/slurm_test.sh my_partition test_hmr configs/hmr/resnet50_hmr_pw3d.py work_dirs/hmr work_dirs/hmr/latest.pth 8 --metrics pa-mpjpe mpjpe
tools/get_flops.py is a script adapted from flops-counter.pytorch and MMDetection to compute the FLOPs and params of a given model.
python tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
You will get results like this:
==============================
Input shape: (3, 1280, 800)
Flops: 239.32 GFLOPs
Params: 37.74 M
==============================
Note: This tool is still experimental and we do not guarantee that the number is absolutely correct. You may use the result for simple comparisons, but double-check it before adopting it in technical reports or papers.
- FLOPs are related to the input shape while parameters are not. The default input shape is (1, 3, 224, 224).
- Some operators, such as GN and custom operators, are not counted in FLOPs. Refer to mmcv.cnn.get_model_complexity_info() for details.
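For custom analyses, the underlying counter can also be called directly from Python. The snippet below is a minimal sketch that uses a torchvision ResNet-50 as a stand-in for the mesh-estimation model.

```python
# Minimal sketch: calling mmcv's complexity counter directly.
# A torchvision ResNet-50 stands in for the actual mesh-estimation model.
import torchvision
from mmcv.cnn import get_model_complexity_info

model = torchvision.models.resnet50()
model.eval()
flops, params = get_model_complexity_info(model, (3, 224, 224), print_per_layer_stat=False)
print(f'FLOPs: {flops}, Params: {params}')
```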
If you find our work useful for your research, please consider citing the paper:
@inproceedings{pang2022benchmarking,
title={Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms},
author={Pang, Hui En and Cai, Zhongang and Yang, Lei and Zhang, Tianwei and Liu, Ziwei},
booktitle={NeurIPS},
year={2022}
}
Distributed under the S-Lab License. See LICENSE for more information.
This study is supported by NTU NAP, MOE AcRF Tier 2 (T2EP20221-0033), and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).
Explore More SMPLCap Projects
- [arXiv'25] SMPLest-X: An extended version of SMPLer-X with stronger foundation models.
- [ECCV'24] WHAC: World-grounded human pose and camera estimation from monocular videos.
- [CVPR'24] AiOS: An all-in-one-stage pipeline combining detection and 3D human reconstruction.
- [NeurIPS'23] SMPLer-X: Scaling up EHPS towards a family of generalist foundation models.
- [NeurIPS'23] RoboSMPLX: A framework to enhance the robustness of whole-body pose and shape estimation.
- [ICCV'23] Zolly: 3D human mesh reconstruction from perspective-distorted images.
- [arXiv'23] PointHPS: 3D HPS from point clouds captured in real-world settings.
- [NeurIPS'22] HMR-Benchmarks: A comprehensive benchmark of HPS datasets, backbones, and training strategies.