Improving 3D Large Language Model via Robust Instruction Tuning

WeitaiKang/Robin3D

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

We present Robin3D, a state-of-the-art 3D Large Language Model trained on large-scale instruction-following data generated by our novel Robust Instruction Generation (RIG) data engine. To handle the complex RIG-generated data, Robin3D strengthens its spatial understanding with a Relation-Augmented Projector and improves its object referring and grounding ability with ID-Feature Bonding.

News

[2024.09] We release Robin3D [paper][code], a new SOTA 3D LLM for 3D scenes.

🔥 Robin3D vs Previous Methods

[Figure: performance comparison]

🔨 Preparation

  • Prepare the environment:

    conda create -n robin3d python=3.9.17
    conda activate robin3d
    conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=11.8 -c pytorch -c nvidia
    pip install -r requirements.txt
  • Download LLM backbone:

    • We use Vicuna-7B v1.5 in our experiments, which can be downloaded from Hugging Face.
  • Annotations and extracted features:

    Please follow the instructions in Chat-Scene's Preparation.
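After running the environment commands above, it can be useful to confirm that the pinned PyTorch packages were actually installed. The following is a minimal sketch and is not part of this repository: the helper names `installed_version`, `matches`, and `check_environment` are hypothetical, and the check assumes the packages expose their versions through standard Python package metadata.

```python
import sys
from importlib import metadata

# Versions pinned in the conda install command above.
EXPECTED = {"torch": "2.2.1", "torchvision": "0.17.1", "torchaudio": "2.2.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches(found, wanted):
    """True if `found` equals `wanted`, ignoring a local build tag such as '+cu118'."""
    return found is not None and found.split("+")[0] == wanted


def check_environment(expected=EXPECTED):
    """Map each package whose version differs from its pin to (found, wanted)."""
    problems = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if not matches(found, wanted):
            problems[package] = (found, wanted)
    return problems


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for package, (found, wanted) in check_environment().items():
        print(f"{package}: found {found}, expected {wanted}")
```

Run it inside the activated `robin3d` environment; if it prints only the Python version, the three pinned packages match. It does not check CUDA availability or the packages listed in `requirements.txt`.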

🤖 Training and Inference

  • Coming soon.

📄 Citation

Our paper has disappeared from Google Scholar, and we don't know why. We have emailed the Google Scholar team but have not received a response yet.

If you find our work useful in your research, please consider citing:

@misc{kang2025robin3dimproving3dlarge,
      title={Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning}, 
      author={Weitai Kang and Haifeng Huang and Yuzhang Shang and Mubarak Shah and Yan Yan},
      year={2025},
      eprint={2410.00255},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2410.00255}, 
}

Stay tuned for our project. 🔥

If you have any questions or suggestions, feel free to drop us an email ([email protected]) or open an issue.

😊 Acknowledgement

Thanks to the following open-source projects:

LLMs: LLaMA, Vicuna

3D Datasets: ScanNet, ScanRefer, ReferIt3D, Scan2Cap, ScanQA, SQA3D, Multi3dRefer, Grounded-3DLLM, Chat-Scene

Detectors: Mask3D

Representations: Uni3D, DINOv2

3D Models: OpenScene
