PanSplat

PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting

Cheng Zhang, Haofei Xu, Qianyi Wu, Camilo Cruz Gambardella, Dinh Phung, Jianfei Cai


Introduction

This repo contains the training, testing, and evaluation code of our arXiv 2024 paper.

Installation

We use Anaconda to manage the environment. You can create the environment by running the following commands:

conda create -n pansplat python=3.10
conda activate pansplat
pip install torch==2.4.0+cu118 torchvision==0.19.0+cu118 torchaudio==2.4.0+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install git+https://github.com/dcharatan/diff-gaussian-rasterization-modified
pip3 install -U xformers==0.0.27.post2+cu118 --index-url https://download.pytorch.org/whl/cu118
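
To quickly verify the environment, you can check that PyTorch was installed with CUDA support (a simple sanity check, not required by the pipeline):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"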

We use wandb to log and visualize the training process. You can create an account and then log in to wandb by running the following command:

wandb login
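
If you prefer a non-interactive setup (e.g., on a remote cluster), wandb also reads the WANDB_API_KEY environment variable; the value below is a placeholder for the key shown at https://wandb.ai/authorize:

export WANDB_API_KEY=<your-api-key>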

Quick Demo on Synthetic Data

You can download the pretrained checkpoint last.ckpt (trained on the Matterport3D dataset at 512 × 1024 resolution) and put it in the logs/nvpl49ge/checkpoints folder. Then run the following command to test the model:

python -m src.paper.demo +experiment=pansplat-512 ++model.weights_path=logs/nvpl49ge/checkpoints/last.ckpt mode=predict

The code will use the sample panorama images in the datasets/pano_grf folder.


The output will be saved in a timestamped folder such as outputs/2025-01-13/16-56-04, containing the rendered demo video.

Additionally, we provide a checkpoint last.ckpt fine-tuned on the Matterport3D dataset at 2048 × 4096 resolution for 4K panorama synthesis. Put it in the logs/hxlad5nq/checkpoints folder and run the following command to test the model:

python -m src.paper.demo +experiment=pansplat-2048 ++model.weights_path=logs/hxlad5nq/checkpoints/last.ckpt mode=predict

This requires a GPU with at least 24GB of memory, e.g., NVIDIA RTX 3090.
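
If you are unsure whether your GPU has enough memory, you can check it with nvidia-smi (a convenience check only):

nvidia-smi --query-gpu=name,memory.total --format=csv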

Data Preparation

PanoGRF Data

We use the data preparation code from the PanoGRF repo to render the Matterport3D dataset and to generate the Replica and Residential datasets. Please download pano_grf_lr.tar from the provided link and extract it to the datasets folder. We also rendered a smaller, higher-resolution Matterport3D dataset for fine-tuning; if you plan to fine-tune the model at higher resolutions, please also download pano_grf_hr.tar and extract it to the datasets folder.
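
A minimal extraction sketch, assuming the archives unpack into the pano_grf folder used by the demo (the internal layout of the archives is an assumption):

mkdir -p datasets
tar -xf pano_grf_lr.tar -C datasets
tar -xf pano_grf_hr.tar -C datasets   # only needed for high-resolution fine-tuning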

360Loc Data

We use the 360Loc dataset to fine-tune on real-world data. Please download the data from the official link and extract the separate parts to the datasets/360Loc folder.

Our Video Data

We provide two sample videos for testing cross-dataset generalization. Please download insta360.tar from the provided link and extract it to the datasets folder.

Use your own video...

We use stella_vslam, a community fork of xdspacelab/openvslam, to extract camera poses from self-captured videos. You can follow the official guide to install stella_vslam. We recommend installing it with SocketViewer and setting up the SocketViewer to visualize the SLAM process on a remote server. Then change to the build directory of stella_vslam following this link and download the ORB vocabulary:

curl -sL "https://github.com/stella-cv/FBoW_orb_vocab/raw/main/orb_vocab.fbow" -o orb_vocab.fbow

After that, please put your video in a separate folder under the datasets/insta360 folder and rename it to video.mp4. Then run the following command from the video folder to perform SLAM mapping:

~/lib/stella_vslam_examples/build/run_video_slam -v ~/lib/stella_vslam_examples/build/orb_vocab.fbow -m video.mp4 -c ../equirectangular.yaml --frame-skip 1 --no-sleep --map-db-out map.msg --viewer socket_publisher --eval-log-dir ./ --auto-term

Finally, run the following command to extract the camera poses by running localization only:

~/lib/stella_vslam_examples/build/run_video_slam --disable-mapping -v ~/lib/stella_vslam_examples/build/orb_vocab.fbow -m video.mp4 -c ../equirectangular.yaml --frame-skip 1 --no-sleep --map-db-in map.msg --viewer socket_publisher --eval-log-dir ./ --auto-term

The camera poses will be saved to the frame_trajectory.txt file. You can then follow the Demo on Real-World Data section, using the insta360 dataset commands, to test the model on your own video.
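
For reference, since both SLAM commands are run from the video folder with --eval-log-dir ./, a hypothetical final layout (my_walkthrough is a placeholder folder name) would be:

datasets/insta360/my_walkthrough/
    video.mp4              (your input video, renamed as required)
    map.msg                (written by the mapping run via --map-db-out)
    frame_trajectory.txt   (written by the localization run)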


Training and Testing

Pretrained Models

We use part of the pretrained UniMatch weights from MVSplat and the pretrained panoramic monocular depth estimation model from PanoGRF. Please download the weights and put them in the checkpoints folder.

Train on Matterport3D

We train the model on the Matterport3D dataset starting from a low resolution and fine-tune it at progressively higher resolutions. If you plan to fine-tune the model on the 360Loc dataset, you can stop at the 512 × 1024 resolution. Alternatively, you can skip this part by downloading the pretrained checkpoint last.ckpt and putting it in the logs/nvpl49ge/checkpoints folder.

Please first run the following command to train the model at 256 × 512 resolution:

python -m src.main +experiment=pansplat-256 mode=train

Hint: The training takes about 1 day on a single NVIDIA A100 GPU. Experiments are logged and visualized to wandb under the pansplat project. You'll get a WANDB_RUN_ID (e.g., ek6ab466) after running the command, or you can find it in the wandb dashboard. At the end of training, the model is tested and the evaluation results are logged to wandb as a table. The checkpoints are saved in the logs/<WANDB_RUN_ID>/checkpoints folder. The same applies to the following experiments.
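
For example, if your 256 × 512 run was assigned the run ID ek6ab466 (the example ID from the hint above; substitute your own), you can confirm the checkpoint exists before moving on:

ls logs/ek6ab466/checkpoints/last.ckpt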

Next, set the model.weights_path parameter in config/pansplat-512.yaml to the path of the last checkpoint from the 256 × 512 training, then run the following command to fine-tune the model at 512 × 1024 resolution:

python -m src.main +experiment=pansplat-512 mode=train
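
If you prefer not to edit the YAML, note that the demo commands in this README pass ++model.weights_path directly on the command line; assuming src.main honors the same Hydra-style override (an assumption, not something this README states), an equivalent call would be:

python -m src.main +experiment=pansplat-512 mode=train ++model.weights_path=logs/ek6ab466/checkpoints/last.ckpt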
If you want to fine-tune on high-resolution Matterport3D data...

Similarly, update the model.weights_path settings in config/pansplat-1024.yaml and fine-tune the model at 1024 × 2048 resolution:

python -m src.main +experiment=pansplat-1024 mode=train

Finally, update the model.weights_path settings in config/pansplat-2048.yaml and fine-tune the model at 2048 × 4096 resolution:

python -m src.main +experiment=pansplat-2048 mode=train

Fine-tune on 360Loc

We fine-tune the model on the 360Loc dataset starting from the weights trained on Matterport3D at 512 × 1024 resolution. If you want to skip this part, you can find the checkpoints here; we provide checkpoints for the 512 × 1024 (ls933m5x) and 2048 × 4096 (115k3hnu) resolutions.

Please update the model.weights_path parameter of config/pansplat-512-360loc.yaml to the path of the last checkpoint of the Matterport3D training at 512 × 1024 resolution, then run the following command:

python -m src.main +experiment=pansplat-512-360loc mode=train

We then gradually increase the resolution to 1024 × 2048 and 2048 × 4096, fine-tuning from the lower-resolution weights each time:

python -m src.main +experiment=pansplat-1024-360loc mode=train
python -m src.main +experiment=pansplat-2048-360loc mode=train

Remember to update the model.weights_path parameter in the corresponding config files before running the commands.

Demo on Real-World Data

First, please make sure you have followed the steps in the Fine-tune on 360Loc section so that the checkpoints are ready. You can then test the model on the 360Loc or Insta360 dataset by running the following commands:

python -m src.paper.demo +experiment=pansplat-512-360loc ++model.weights_path=logs/ls933m5x/checkpoints/last.ckpt mode=predict
python -m src.paper.demo +experiment=pansplat-512-360loc ++model.weights_path=logs/ls933m5x/checkpoints/last.ckpt mode=predict dataset=insta360

Hint: You can replace the model.weights_path parameter with the path to your own fine-tuned checkpoint.

The output will be saved in a timestamped folder such as outputs/2025-01-13/16-56-04.

For the 2048 × 4096 resolution model, you can run the following commands:

python -m src.paper.demo +experiment=pansplat-2048-360loc ++model.weights_path=logs/115k3hnu/checkpoints/last.ckpt mode=predict
python -m src.paper.demo +experiment=pansplat-2048-360loc ++model.weights_path=logs/115k3hnu/checkpoints/last.ckpt mode=predict dataset=insta360

Additionally, we provide commands for longer image-sequence inputs:

python -m src.paper.demo +experiment=pansplat-512-360loc ++model.weights_path=logs/ls933m5x/checkpoints/last.ckpt
python -m src.paper.demo +experiment=pansplat-512-360loc ++model.weights_path=logs/ls933m5x/checkpoints/last.ckpt dataset=insta360
python -m src.paper.demo +experiment=pansplat-2048-360loc ++model.weights_path=logs/115k3hnu/checkpoints/last.ckpt
python -m src.paper.demo +experiment=pansplat-2048-360loc ++model.weights_path=logs/115k3hnu/checkpoints/last.ckpt dataset=insta360


Citation

If you find our work helpful, please consider citing:

@misc{zhang2024pansplat4kpanoramasynthesis,
      title={PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting}, 
      author={Cheng Zhang and Haofei Xu and Qianyi Wu and Camilo Cruz Gambardella and Dinh Phung and Jianfei Cai},
      year={2024},
      eprint={2412.12096},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.12096}, 
}
