Chandan Yeshwanth, Dávid Rozenberszki, Angela Dai
ICCV 2025
- Apply for and download the ScanNet++ dataset from here
- Download the captions for the train and val sets of the ExCap3D dataset here
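The released caption files are per-scene annotations; the exact schema ships with the dataset download. As a purely hypothetical illustration (all field names here are assumptions, not the released schema), object-level captions at varying detail might be inspected like this:

```python
import json

# Hypothetical example entry standing in for one loaded caption file.
# Field names ("objects", "captions", "summary", "detailed") are
# illustrative assumptions, not the released ExCap3D schema.
example = {
    "scene_id": "scene0000_00",
    "objects": [
        {
            "instance_id": 3,
            "captions": {
                "summary": "a wooden chair",
                "detailed": "a wooden chair with a curved backrest and four legs",
            },
        },
    ],
}

# Round-trip through JSON, standing in for json.load(open(path)).
data = json.loads(json.dumps(example))
for obj in data["objects"]:
    print(obj["instance_id"], obj["captions"]["summary"])
```

Check the downloaded files for the actual keys before writing any loader against them.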
Data preparation and training code is based on Mask3D.
Prepare the semantic training data using our ScanNet++ toolbox. Then sample the PTH files to reduce the number of points on the mesh surface, add the new segment data, and convert to Mask3D format:

```bash
./sample_pth.sh
```
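The sampling step reduces the dense mesh vertices to a smaller point budget. A minimal sketch of uniform random subsampling (the array names, target count, and toy data are assumptions for illustration; the actual logic lives behind `sample_pth.sh`):

```python
import numpy as np

def subsample_points(coords, colors, labels, n_target=100_000, seed=0):
    """Uniformly subsample a point set with per-point colors and labels
    down to at most n_target points. Array names are illustrative."""
    n = coords.shape[0]
    if n <= n_target:
        return coords, colors, labels
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=n_target, replace=False)  # sample without replacement
    return coords[idx], colors[idx], labels[idx]

# Toy usage with random arrays standing in for a loaded .pth scene.
coords = np.random.rand(250_000, 3)                  # xyz positions
colors = np.random.randint(0, 255, (250_000, 3))     # rgb per point
labels = np.random.randint(0, 20, 250_000)           # semantic label per point
c, col, lab = subsample_points(coords, colors, labels)
print(c.shape)  # (100000, 3)
```

Sampling the same index set for coordinates, colors, and labels keeps the per-point attributes aligned.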
First, train the instance segmentation model on the ScanNet++ dataset. Configure all the paths in the training script, then run:

```bash
./scripts/train_spp.sh
```
Then train the captioning model on the ExCap3D dataset:

```bash
./scripts/train_spp_caption_joint.sh
```
Finally, evaluate the trained model:

```bash
./scripts/eval_spp_caption_joint.sh
```
If you find our code, dataset, or paper useful, please consider citing:

```bibtex
@inproceedings{yeshwanth2025excap3d,
  title={ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail},
  author={Yeshwanth, Chandan and Rozenberszki, David and Dai, Angela},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
```
