A powerful and flexible framework for 3D human pose estimation that leverages the strengths of LSTM and transformer-based networks, inspired by state-of-the-art research in the field.
A significant portion of the code related to data processing and visualization is derived from the following outstanding projects:
Big shoutout to the contributors of these projects for their exceptional work!
This project has been developed and tested with the following environment:
- Python: 3.9
- PyTorch: 1.13.0
- CUDA: 11.7
To set up your environment, follow these steps:
conda create -n 3dposenet python=3.9
conda activate 3dposenet
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
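After installation, you can optionally verify that the expected PyTorch build and CUDA runtime are active. This is just a minimal sanity check, not something the project requires:

```python
import torch

# Confirm the installed build matches the tested environment.
print(torch.__version__)          # expected: 1.13.0+cu117
print(torch.cuda.is_available())  # should be True on a machine with CUDA 11.7
```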
Please refer to VideoPose3D for instructions on setting up the Human3.6M dataset, then place the files as follows:
code_root/
└── data/
    ├── data_2d_h36m_gt.npz
    ├── data_2d_h36m_cpn_ft_h36m_dbb.npz
    └── data_3d_h36m.npz
You can train our model on a single GPU with the following command:
python train.py
The training script includes several configurable parameters, allowing you to experiment with different setups. The current configuration is as follows:
batch_size = 512
num_input_frames = 81
num_epoch = 15
lr = 0.0001
model_pos = LSTM_PoseNet(num_joints, num_frames=receptive_field, input_dim=2, output_dim=3)
checkpoint = 'checkpoint'
The model processes 81 frames at a time, dividing each video into overlapping 81-frame windows, but feel free to experiment with this parameter.
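The following is a minimal sketch of how such overlapping windows can be built; the `sliding_windows` helper is hypothetical and not taken from `train.py`:

```python
import numpy as np

def sliding_windows(keypoints_2d, num_input_frames=81):
    """keypoints_2d: (num_frames, num_joints, 2) array of 2D detections."""
    pad = num_input_frames // 2
    # Replicate the first/last frame so border frames still get a full window.
    padded = np.concatenate([
        np.repeat(keypoints_2d[:1], pad, axis=0),
        keypoints_2d,
        np.repeat(keypoints_2d[-1:], pad, axis=0),
    ], axis=0)
    # One window per original frame, each centered on that frame.
    return np.stack([padded[i:i + num_input_frames]
                     for i in range(len(keypoints_2d))])

# Example: 300 frames, 17 Human3.6M joints -> (300, 81, 17, 2)
windows = sliding_windows(np.zeros((300, 17, 2)))
print(windows.shape)
```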
First, download the pretrained weights for YOLOv3 (here) and HRNet (here) and put them in the ./demo/lib/checkpoint
directory. Then place your in-the-wild videos in the ./demo/video
directory.
Set the correct checkpoint path (from your trained model) in vis.py,
then run the command below:
python demo/vis.py --video sample_video.mp4
Our models achieved the following performance on the Human3.6M benchmark, measured by Mean Per Joint Position Error (MPJPE):
- LSTM_PoseNet: 55 mm
- Transformer-based models: 64 mm
The current state-of-the-art (SOTA) performance on this benchmark is around 30 mm, as reported in this paper (here).
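For reference, MPJPE is simply the mean Euclidean distance between predicted and ground-truth 3D joint positions, reported in millimetres. The snippet below is a minimal sketch of the metric, not the evaluation code from this repository:

```python
import numpy as np

def mpjpe(predicted, target):
    """predicted, target: (num_frames, num_joints, 3) arrays in mm."""
    # Per-joint Euclidean error, averaged over all joints and frames.
    return np.mean(np.linalg.norm(predicted - target, axis=-1))

# Example with random data: 100 frames, 17 Human3.6M joints.
pred = np.random.randn(100, 17, 3)
gt = np.random.randn(100, 17, 3)
print(f"MPJPE: {mpjpe(pred, gt):.2f} mm")
```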