Pre-Trained Vision Transformer for Robot Vision Tasks

Research Project at CMU Robotics

This project explores how neural networks learn information and proposes a novel training method to improve generalization across multiple robot vision tasks: optical flow, disparity, and depth estimation.

Concept

The project removes the task-specific head from a Vision Transformer (ViT) and introduces a new training algorithm that matches the feature maps produced by the remaining feature extractor. This forces the ViT to learn robust pixel-matching representations during pre-training instead of relying on task-specific optimizations.
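To make the idea of a feature-matching objective concrete, here is a minimal sketch of one possible loss for a single source pixel. It is an illustration only and not necessarily the loss used in this project: the function name `matching_nll`, the dot-product similarity, and the softmax formulation are all assumptions. The loss is low when the source feature is most similar to the feature at its true matching location.

```python
import math


def matching_nll(src_feat, tgt_feats, true_index):
    """Negative log-likelihood of the true match under a softmax over similarities.

    Hypothetical helper for illustration; not taken from the repository.
    src_feat:   feature vector of one source pixel (list of floats)
    tgt_feats:  feature vectors of all candidate target pixels
    true_index: index into tgt_feats of the ground-truth match
    """
    # Scaled dot-product similarity between the source and every candidate.
    scale = math.sqrt(len(src_feat))
    scores = [sum(a * b for a, b in zip(src_feat, t)) / scale for t in tgt_feats]

    # Numerically stable log-sum-exp (subtract the maximum before exponentiating).
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))

    # -log softmax(scores)[true_index]
    return log_norm - scores[true_index]


src = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_correct = matching_nll(src, candidates, 0)  # true match is the most similar
loss_wrong = matching_nll(src, candidates, 2)    # true match is the least similar
```

In this toy example `loss_correct` is smaller than `loss_wrong`, so minimizing the loss pushes features of corresponding pixels together without involving any task-specific head.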

Methodology

Model Redesign: The task-specific head at the end of the ViT was removed.
Feature Matching Objective: Training was reformulated around matching internal feature maps between datasets.
Dataset Conversion: A custom method was developed to convert optical flow datasets into compatible datasets for this training process.
Generalization Hypothesis: By emphasizing feature-level consistency over task-specific loss, the network learns more general and transferable representations.
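The dataset conversion step can be pictured as turning a dense optical flow field into explicit pixel correspondences that a feature-matching loss can supervise. The sketch below is an assumption about what such a conversion might look like, not the project's actual method: the function name `flow_to_matches`, the `stride` parameter, and the choice to sample the flow at the top-left pixel of each feature cell are all hypothetical.

```python
def flow_to_matches(flow, stride=1):
    """Convert a dense flow field into correspondences on a feature grid.

    Hypothetical helper for illustration; not taken from the repository.
    flow[y][x] is a (u, v) displacement in pixels from frame 1 to frame 2.
    Returns a list of ((x1, y1), (x2, y2)) pairs in feature-grid coordinates,
    where the grid is the image downsampled by `stride`.
    """
    height, width = len(flow), len(flow[0])
    grid_h, grid_w = height // stride, width // stride

    matches = []
    for gy in range(grid_h):
        for gx in range(grid_w):
            # Simplification: read the flow at the top-left pixel of the cell.
            u, v = flow[gy * stride][gx * stride]
            # Target location in frame 2, mapped back to the feature grid.
            tx = round((gx * stride + u) / stride)
            ty = round((gy * stride + v) / stride)
            # Drop pixels whose match falls outside frame 2.
            if 0 <= tx < grid_w and 0 <= ty < grid_h:
                matches.append(((gx, gy), (tx, ty)))
    return matches


# A 2x3 image where every pixel moves one pixel to the right.
flow = [[(1.0, 0.0)] * 3 for _ in range(2)]
matches = flow_to_matches(flow)
```

Here the rightmost column has no valid match because its target lies outside the image, so only the first two columns of each row produce correspondences.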

The Unimatch model serves as the base model for this project:

@article{xu2023unifying,
  title={Unifying Flow, Stereo and Depth Estimation},
  author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Yu, Fisher and Tao, Dacheng and Geiger, Andreas},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023}
}


Repository: castacks/robovision-pretrained-transformer
