Jingpei Lu*, Zekai Liang*, Tristin Xie, Florian Richter, Shan Lin, Sainan Liu, Michael C. Yip
University of California, San Diego
ICRA 2025
CtRNet-X is a novel framework for estimating the camera-to-robot pose with partially visible robot manipulators. Our approach leverages Vision-Language Models for fine-grained robot component detection and integrates them into a keypoint-based pose estimation network, enabling more robust performance under varied operational conditions.
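At a glance, the integration can be sketched as follows. This is a minimal, illustrative example with made-up names (not the repo's actual code): it assumes the VLM produces a per-link visibility score that gates which keypoints are passed on to pose estimation.

```python
# Illustrative sketch (hypothetical names, not the repo's code): a VLM scores
# how visible each robot link is, and only keypoints on confidently visible
# links are kept for camera-to-robot pose estimation.
import torch

def select_visible_keypoints(keypoints_2d, link_visibility, link_of_keypoint,
                             threshold=0.5):
    """Keep keypoints that lie on links the VLM deems visible.

    keypoints_2d:     (N, 2) detected 2D keypoints
    link_visibility:  (L,)   per-link visibility scores from the VLM detector
    link_of_keypoint: (N,)   index of the link each keypoint belongs to
    """
    keep = link_visibility[link_of_keypoint] > threshold
    return keypoints_2d[keep], keep

# Example: keypoints on link 1 are dropped because that link is occluded.
kps = torch.rand(4, 2)
vis = torch.tensor([0.9, 0.2])          # link 0 visible, link 1 occluded
owner = torch.tensor([0, 0, 1, 1])      # which link each keypoint lies on
visible_kps, mask = select_visible_keypoints(kps, vis, owner)
```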
We recommend setting up the environment with Anaconda. The code is developed and tested on Ubuntu 22.04.
- Python (3.8)
- NumPy (1.22.4)
- PyTorch (1.10.0)
- torchvision (0.11.1)
- pytorch3d (0.6.2)
- Kornia (0.6.3)
- Transforms3d (0.3.1)
- pyzed (https://www.stereolabs.com/docs/app-development/python/install)
See environment.yml for more details.
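For example (assuming Anaconda is installed), the environment can be created directly from the provided file:

```bash
conda env create -f environment.yml
conda activate ctrnet-x   # the actual env name is set in environment.yml
```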
- Weights for the fine-tuned CLIP model
- Weights for camera-to-robot pose estimation
Run inference on DROID raw data:
```bash
python inference_DROID_raw_file.py
```
Run inference on the Panda dataset with ground-truth camera info:
```bash
python inference_panda_dataset.py
```
- Optional args: `confidence_threshold` (see the example below)
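For example (assuming an argparse-style flag; the exact flag name and default live in the script):

```bash
python inference_panda_dataset.py --confidence_threshold 0.8
```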
A variation of CtRNet-X can integrate depth maps from an RGB-D camera during inference by comparing the measured depth to the depth rendered by the differentiable renderer. Here we use DROID as an example.
Use depth input to refine the estimation:
```bash
python inference_video_depth.py
```
We use the Huber loss with delta = 0.1; feel free to try your own depth data with different losses!
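As a minimal sketch of that refinement loss (assuming PyTorch tensors; not the repo's exact code):

```python
# Sketch of the depth-refinement loss: compare the depth rendered from the
# current pose estimate against the measured RGB-D depth on pixels where
# both are valid, using a Huber loss with delta = 0.1.
import torch
import torch.nn.functional as F

def depth_loss(rendered_depth, measured_depth, delta=0.1):
    """rendered_depth: (H, W) depth from the differentiable renderer
    measured_depth:  (H, W) depth from the RGB-D camera (0 where invalid)
    """
    # Only compare pixels where the robot is rendered and the sensor
    # returned a valid measurement.
    valid = (rendered_depth > 0) & (measured_depth > 0)
    return F.huber_loss(rendered_depth[valid], measured_depth[valid],
                        delta=delta)
```

The Huber loss keeps the refinement robust to the outliers common in stereo depth, e.g., around the robot's silhouette.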
@article{lu2024ctrnet,
  title={CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera},
  author={Lu, Jingpei and Liang, Zekai and Xie, Tristin and Richter, Florian and Lin, Shan and Liu, Sainan and Yip, Michael C},
  journal={arXiv preprint arXiv:2409.10441},
  year={2024}
}