
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail

Paper | Dataset | Video

Teaser figure

Chandan Yeshwanth, Dávid Rozenberszki, Angela Dai

ICCV 2025



Dataset

  1. Apply for and download the ScanNet++ dataset from here
  2. Download the captions for the train and val sets of the ExCap3D dataset here

Data preparation

The data preparation and training code is based on Mask3D.

Prepare the semantic training data using our ScanNet++ toolbox. Then subsample the PTH files to reduce the number of points on the mesh surface, add the new segment data, and convert them to the Mask3D format:

./sample_pth.sh
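
For reference, a minimal sketch of the subsampling step is shown below. It assumes each preprocessed PTH file holds a dict with hypothetical per-point keys "coords", "colors", and "labels"; the actual sample_pth.sh script and key names may differ.

# Minimal sketch of PTH subsampling (hypothetical keys; not the actual sample_pth.sh logic).
import numpy as np
import torch

def subsample_pth(in_path, out_path, num_points=200_000):
    data = torch.load(in_path)                        # preprocessed scene dict
    n = data["coords"].shape[0]
    if n > num_points:
        idx = np.random.choice(n, num_points, replace=False)
        for key in ("coords", "colors", "labels"):    # keep per-point arrays aligned
            data[key] = data[key][idx]
    torch.save(data, out_path)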

Training

First, train the instance segmentation model on the ScanNet++ dataset. Configure the paths in the training script, then run:

./scripts/train_spp.sh

Then train the captioning model on the ExCap3D dataset:

./scripts/train_spp_caption_joint.sh
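
Conceptually, the joint training step combines an instance segmentation loss with a captioning loss on the predicted objects. The sketch below illustrates this idea with hypothetical model outputs and batch keys; the actual Mask3D-based training loop in this repository differs.

# Conceptual sketch of a joint segmentation + captioning loss (hypothetical interfaces).
import torch
import torch.nn.functional as F

def joint_loss(outputs, batch, caption_weight=1.0):
    seg_loss = outputs["segmentation_loss"]           # e.g. mask + classification losses
    cap_loss = F.cross_entropy(
        outputs["caption_logits"].flatten(0, 1),      # (num_objects * seq_len, vocab_size)
        batch["caption_tokens"].flatten(),            # ground-truth caption token ids
        ignore_index=0,                               # assume 0 is the padding token id
    )
    return seg_loss + caption_weight * cap_loss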

Evaluation

Evaluate the trained model:

./scripts/eval_spp_caption_joint.sh
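
Caption quality can also be checked with standard captioning metrics. The snippet below is a sketch using CIDEr from the pycocoevalcap package; the metrics and data format used by the actual evaluation script may differ.

# Sketch of scoring predicted captions against references with CIDEr (assumed metric).
from pycocoevalcap.cider.cider import Cider

predictions = {"obj_0": ["a wooden chair with a red cushion"]}   # hypothetical ids/captions
references  = {"obj_0": ["a red-cushioned wooden dining chair"]}
score, per_object = Cider().compute_score(references, predictions)
print(f"CIDEr: {score:.3f}")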

Citation

If you find our code, dataset, or paper useful, please consider citing:

@inproceedings{yeshwanth2025excap3d,
  title={ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail},
  author={Yeshwanth, Chandan and Rozenberszki, David and Dai, Angela},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
