Authors: Zhaofeng Hu1†, Sifan Zhou2†*, Shibo Zhao3, Zhihang Yuan4
1Stony Brook University, 2Southeast University, 3Carnegie Mellon University, 4Houmo AI
† Equal contribution, *Corresponding author
Paper: [arXiv](https://arxiv.org/abs/2412.02734)
Video: YouTube
MVCTrack is an enhanced framework for **3D single object tracking (3D SOT)** in point clouds, designed to address the limitations of sparse and incomplete LiDAR data. Our approach introduces a **Multimodal-guided Virtual Cues Projection (MVCP)** scheme to enrich sparse point clouds by integrating RGB camera data, significantly improving tracking performance, particularly in scenarios with distant or small objects.
This repository provides the code for MVCTrack, which achieves state-of-the-art performance on the NuScenes dataset.
A novel, lightweight, and plug-and-play scheme (sketched in code after this list) that:
- Utilizes 2D object detection to generate virtual cues from RGB images.
- Projects dense 2D semantic information into 3D space to balance point cloud density.
- Mitigates the sparsity of point clouds and improves their completeness.
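The following is a minimal, illustrative Python sketch of the general idea behind virtual cue projection: pixels sampled inside 2D detection boxes are assigned depth from nearby LiDAR returns and unprojected into the LiDAR frame. It is not the released MVCP implementation; names such as `generate_virtual_cues`, `boxes_2d`, `lidar_uvz`, and `cam_to_lidar` are assumptions made for illustration.

```python
# Illustrative sketch only (not the authors' MVCP code): lift 2D detections
# into 3D "virtual cues" by sampling pixels inside each box and borrowing
# depth from the nearest real LiDAR return.
import numpy as np

def generate_virtual_cues(boxes_2d, lidar_uvz, K, cam_to_lidar, samples_per_box=50):
    """boxes_2d: (N, 4) pixel boxes [x1, y1, x2, y2] from a 2D detector.
    lidar_uvz: (M, 3) LiDAR points projected to the image as (u, v, depth).
    K: (3, 3) camera intrinsics; cam_to_lidar: (4, 4) camera-to-LiDAR transform."""
    virtual_points = []
    for x1, y1, x2, y2 in boxes_2d:
        # Keep only LiDAR returns that fall inside this detection box.
        in_box = (lidar_uvz[:, 0] >= x1) & (lidar_uvz[:, 0] <= x2) & \
                 (lidar_uvz[:, 1] >= y1) & (lidar_uvz[:, 1] <= y2)
        if not in_box.any():
            continue
        # Sample pixels uniformly inside the box and assign each the depth of
        # its nearest real LiDAR point (a simple stand-in for depth completion).
        u = np.random.uniform(x1, x2, samples_per_box)
        v = np.random.uniform(y1, y2, samples_per_box)
        ref = lidar_uvz[in_box]
        nearest = np.argmin((ref[:, 0, None] - u) ** 2 + (ref[:, 1, None] - v) ** 2, axis=0)
        depth = ref[nearest, 2]
        # Unproject the sampled pixels to 3D camera coordinates, then to the LiDAR frame.
        pts_cam = np.linalg.inv(K) @ np.stack([u * depth, v * depth, depth])
        pts_hom = np.vstack([pts_cam, np.ones((1, samples_per_box))])
        virtual_points.append((cam_to_lidar @ pts_hom)[:3].T)
    return np.concatenate(virtual_points, axis=0) if virtual_points else np.empty((0, 3))
```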
An end-to-end 3D SOT tracker (see the sketch after this list) that:
- Seamlessly integrates the MVCP scheme to improve tracking accuracy.
- Effectively balances point density distribution across different distances.
- Achieves competitive performance with minimal computational overhead.
- Is evaluated on the large-scale NuScenes dataset.
- Significantly surpasses existing multi-modal 3D trackers.
- Demonstrates exceptional performance in sparse and occluded scenarios.
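As a rough illustration of how such virtual cues could be consumed by the tracker, the sketch below merges virtual points with the raw sweep and crops a search region around the previous box. Function and parameter names (`densify_and_crop`, `prev_box_center`, `crop_radius`) are assumptions, not the repository's API.

```python
# Assumed workflow, not the released code: merge virtual cues with the raw
# LiDAR sweep before cropping the search region used by the tracker.
import numpy as np

def densify_and_crop(lidar_points, virtual_points, prev_box_center, crop_radius=2.0):
    """lidar_points, virtual_points: (N, 3) / (V, 3) arrays in the LiDAR frame.
    prev_box_center: (3,) center of the last tracked box; crop_radius in meters."""
    # Concatenate real and virtual points so sparse, distant targets gain density.
    merged = np.concatenate([lidar_points, virtual_points], axis=0)
    # Crop a local search region around the previous box, as 3D SOT pipelines typically do.
    center = np.asarray(prev_box_center)
    dist = np.linalg.norm(merged[:, :2] - center[:2], axis=1)
    return merged[dist < crop_radius]
```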
Here are the quick links to the detailed guides:
- Dataset Preparation: Instructions to prepare datasets like KITTI, NuScenes, and Waymo Open Dataset.
- Installation Guide: Step-by-step guide to set up the environment and install dependencies.
- Testing the Model: How to evaluate the pretrained model on your dataset.
- Training the Model: Guide to train the model on your dataset.
- Virtual Cues: Details about the Multimodal-Guided Virtual Cues Projection (MVCP) module.
Our implementation is based on Open3DSOT, BEVTrack, P2P, MMDetection3D, and MVP. Thanks to the authors of these projects for their great open-source work!
If any part of our paper or code helps your research, please consider citing our paper and starring our repository:
@article{hu2024mvctrackboosting3dpoint,
  title={MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues},
  author={Zhaofeng Hu and Sifan Zhou and Shibo Zhao and Zhihang Yuan},
  year={2024},
  eprint={2412.02734},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.02734}
}
This repository is released under the MIT License (see the LICENSE file for details).