Authors: Zhaofeng Hu1†, Sifan Zhou2†*, Shibo Zhao3, Zhihang Yuan4
1Stony Brook University, 2Southeast University, 3Carnegie Mellon University, 4Houmo AI
† Equal contribution, *Corresponding author
Paper: [arXiv](https://arxiv.org/abs/2412.02734)
Video: YouTube
MVCTrack is an enhanced framework for **3D single object tracking (3D SOT)** in point clouds, designed to address the limitations of sparse and incomplete LiDAR data. Our approach introduces a **Multimodal-guided Virtual Cues Projection (MVCP)** scheme to enrich sparse point clouds by integrating RGB camera data, significantly improving tracking performance, particularly in scenarios with distant or small objects.
This repository provides the code for MVCTrack, which achieves state-of-the-art performance on the NuScenes dataset.
A novel, lightweight, and plug-and-play scheme (sketched in code after this list) that:
- Utilizes 2D object detection to generate virtual cues from RGB images.
- Projects dense 2D semantic information into 3D space to balance point cloud density.
- Mitigates the sparsity of point clouds and improves their completeness.
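The following is a minimal, illustrative Python sketch of the general idea behind virtual cue projection: pixels sampled inside 2D detection boxes are assigned depth from nearby LiDAR returns and unprojected into the LiDAR frame. It is not the released MVCP implementation; names such as `generate_virtual_cues`, `boxes_2d`, `lidar_uvz`, and `cam_to_lidar` are assumptions made for illustration.

```python
# Illustrative sketch only (not the authors' MVCP code): lift 2D detections
# into 3D "virtual cues" by sampling pixels inside each box and borrowing
# depth from the nearest real LiDAR return.
import numpy as np

def generate_virtual_cues(boxes_2d, lidar_uvz, K, cam_to_lidar, samples_per_box=50):
    """boxes_2d: (N, 4) pixel boxes [x1, y1, x2, y2] from a 2D detector.
    lidar_uvz: (M, 3) LiDAR points projected to the image as (u, v, depth).
    K: (3, 3) camera intrinsics; cam_to_lidar: (4, 4) camera-to-LiDAR transform."""
    virtual_points = []
    for x1, y1, x2, y2 in boxes_2d:
        # Keep only LiDAR returns that fall inside this detection box.
        in_box = (lidar_uvz[:, 0] >= x1) & (lidar_uvz[:, 0] <= x2) & \
                 (lidar_uvz[:, 1] >= y1) & (lidar_uvz[:, 1] <= y2)
        if not in_box.any():
            continue
        # Sample pixels uniformly inside the box and assign each the depth of
        # its nearest real LiDAR point (a simple stand-in for depth completion).
        u = np.random.uniform(x1, x2, samples_per_box)
        v = np.random.uniform(y1, y2, samples_per_box)
        ref = lidar_uvz[in_box]
        nearest = np.argmin((ref[:, 0, None] - u) ** 2 + (ref[:, 1, None] - v) ** 2, axis=0)
        depth = ref[nearest, 2]
        # Unproject the sampled pixels to 3D camera coordinates, then to the LiDAR frame.
        pts_cam = np.linalg.inv(K) @ np.stack([u * depth, v * depth, depth])
        pts_hom = np.vstack([pts_cam, np.ones((1, samples_per_box))])
        virtual_points.append((cam_to_lidar @ pts_hom)[:3].T)
    return np.concatenate(virtual_points, axis=0) if virtual_points else np.empty((0, 3))
```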
An end-to-end 3D SOT tracker (see the sketch after this list) that:
- Seamlessly integrates the MVCP scheme to improve tracking accuracy.
- Effectively balances point density distribution across different distances.
- Achieves competitive performance with minimal computational overhead.
- Is evaluated on the large-scale NuScenes dataset.
- Significantly surpasses existing multi-modal 3D trackers.
- Demonstrates exceptional performance in sparse and occluded scenarios.
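As a rough illustration of how such virtual cues could be consumed by the tracker, the sketch below merges virtual points with the raw sweep and crops a search region around the previous box. Function and parameter names (`densify_and_crop`, `prev_box_center`, `crop_radius`) are assumptions, not the repository's API.

```python
# Assumed workflow, not the released code: merge virtual cues with the raw
# LiDAR sweep before cropping the search region used by the tracker.
import numpy as np

def densify_and_crop(lidar_points, virtual_points, prev_box_center, crop_radius=2.0):
    """lidar_points, virtual_points: (N, 3) / (V, 3) arrays in the LiDAR frame.
    prev_box_center: (3,) center of the last tracked box; crop_radius in meters."""
    # Concatenate real and virtual points so sparse, distant targets gain density.
    merged = np.concatenate([lidar_points, virtual_points], axis=0)
    # Crop a local search region around the previous box, as 3D SOT pipelines typically do.
    center = np.asarray(prev_box_center)
    dist = np.linalg.norm(merged[:, :2] - center[:2], axis=1)
    return merged[dist < crop_radius]
```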
Here are the quick links to the detailed guides:
- Dataset Preparation: Instructions to prepare datasets like KITTI, NuScenes, and Waymo Open Dataset.
- Installation Guide: Step-by-step guide to set up the environment and install dependencies.
- Testing the Model: How to evaluate the pretrained model on your dataset.
- Training the Model: Guide to train the model on your dataset.
- Virtual Cues: Details about the Multimodal-Guided Virtual Cues Projection (MVCP) module.
Our implementation is based on Open3DSOT, BEVTrack, P2P, MMDetection3D, and MVP. Thanks to the authors of these projects for their great open-source work!
If any part of our paper or code helps your research, please consider citing our paper and starring our repository:
@article{hu2024mvctrackboosting3dpoint,
  title={MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues},
  author={Zhaofeng Hu and Sifan Zhou and Shibo Zhao and Zhihang Yuan},
  year={2024},
  eprint={2412.02734},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.02734}
}
This repository is released under the MIT License (see the LICENSE file for details).