The official implementation of "Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer"
Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer
Ziyang Chen, Wenting Li, Yongjun Zhang✱, Bingshu Wang, Yabo Wu, Yong Zhao, C. L. Philip Chen
arXiv Report
Contact us: [email protected]; [email protected]✱
@article{chen2025hart,
title={Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer},
author={Chen, Ziyang and Zhang, Yongjun and Li, Wenting and Wang, Bingshu and Wu, Yabo and Zhao, Yong and Chen, CL},
journal={arXiv preprint arXiv:2501.01023},
year={2025}
}
Python = 3.8
CUDA = 11.3
conda create -n hart python=3.8
conda activate hart
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
To evaluate or train HART, you will need to download the required datasets:
- Sceneflow (Includes FlyingThings3D, Driving, Monkaa)
- Middlebury
- ETH3D
- KITTI
- TartanAir
- Falling Things (fat.zip)
- CARLA
- CREStereo Dataset
- InStereo2K
- Sintel Stereo
By default, stereo_datasets.py searches for the datasets in the locations shown below. If the datasets were downloaded elsewhere, you can create symbolic links to them inside the datasets folder.
├── datasets
    ├── FlyingThings3D
        ├── frames_finalpass
        ├── disparity
    ├── Monkaa
        ├── frames_finalpass
        ├── disparity
    ├── Driving
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── KITTI_2015
            ├── testing
            ├── training
        ├── KITTI_2012
            ├── testing
            ├── training
    ├── Middlebury
        ├── MiddEval3
            ├── trainingF
            ├── trainingH
            ├── trainingQ
            ├── official_train.txt
        ├── 2005
        ├── 2006
        ├── 2014
        ├── 2021
    ├── ETH3D
        ├── two_view_training
        ├── two_view_training_gt
        ├── two_view_testing
    ├── TartanAir
    ├── fat
    ├── crestereo
    ├── HR-VS
        ├── carla-highres
    ├── InStereo2K
"official_train.txt" is available here.
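As a rough aid for the dataset setup described above, the following Python sketch links a downloaded dataset into the datasets folder and reports which expected subfolders are still missing. The helper names `link_dataset` and `missing_paths` and the `EXPECTED_LAYOUT` table are hypothetical and not part of this repository; the folder names are copied from the tree above, and only a subset of the tree is checked.

```python
import os
from pathlib import Path

# Subset of the folder tree listed in this README; extend it as needed.
EXPECTED_LAYOUT = {
    "FlyingThings3D": ["frames_finalpass", "disparity"],
    "Monkaa": ["frames_finalpass", "disparity"],
    "Driving": ["frames_finalpass", "disparity"],
    "KITTI": ["KITTI_2015", "KITTI_2012"],
    "Middlebury": ["MiddEval3"],
    "ETH3D": ["two_view_training", "two_view_training_gt", "two_view_testing"],
}


def link_dataset(download_dir, datasets_root, name):
    """Create datasets_root/name as a symlink to download_dir (hypothetical helper)."""
    root = Path(datasets_root)
    root.mkdir(parents=True, exist_ok=True)
    link = root / name
    # Leave an existing folder or link untouched instead of overwriting it.
    if not (link.exists() or link.is_symlink()):
        os.symlink(Path(download_dir).resolve(), link, target_is_directory=True)
    return link


def missing_paths(datasets_root, layout=EXPECTED_LAYOUT):
    """Return the expected 'dataset/subfolder' entries absent under datasets_root."""
    root = Path(datasets_root)
    missing = []
    for dataset, subdirs in layout.items():
        for sub in subdirs:
            if not (root / dataset / sub).is_dir():
                missing.append(f"{dataset}/{sub}")
    return missing
```

For example, `link_dataset("/data/Monkaa", "datasets", "Monkaa")` followed by `missing_paths("datasets")` lists the folders that still need to be downloaded or linked; the `/data/Monkaa` path is only a placeholder.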
bash ./scripts/train.sh
To evaluate a trained model on a validation set (e.g. Middlebury full resolution), run
python evaluate_stereo.py --restore_ckpt models/hart_sceneflow.pth --dataset middlebury_F
The model weights are available here.
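If you prefer to launch the evaluation from Python, the command above can be wrapped as in the sketch below. It only assembles the same arguments shown in this README and checks that the checkpoint file exists before starting; `build_eval_command` and `run_eval` are hypothetical helper names, not functions provided by this repository.

```python
import subprocess
import sys
from pathlib import Path


def build_eval_command(ckpt="models/hart_sceneflow.pth", dataset="middlebury_F"):
    """Assemble the evaluation command from this README as an argument list."""
    return [
        sys.executable, "evaluate_stereo.py",
        "--restore_ckpt", str(ckpt),
        "--dataset", dataset,
    ]


def run_eval(ckpt="models/hart_sceneflow.pth", dataset="middlebury_F"):
    """Run the evaluation script, failing early if the checkpoint is missing."""
    if not Path(ckpt).is_file():
        raise FileNotFoundError(f"checkpoint not found: {ckpt}")
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(build_eval_command(ckpt, dataset), check=True)
```

Run `run_eval()` from the repository root after placing the downloaded weights at `models/hart_sceneflow.pth`.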
- This project borrows code from STTR, DLNR, IGEV, and MoCha-Stereo. We thank the original authors for their excellent work!
- This project is supported by Science and Technology Planning Project of Guizhou Province, Department of Science and Technology of Guizhou Province, China (QianKeHe[2024]Key001).
- This project is supported by Science and Technology Planning Project of Guizhou Province, Department of Science and Technology of Guizhou Province, China (Project No. [2023]159).
