Lihan Jiang*, Yucheng Mao*, Linning Xu,
Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, Dahua Lin, Bo Dai†
Our code relies on Python 3.10+ and is developed with PyTorch 2.2.0 and CUDA 12.1, but it should also work with other PyTorch/CUDA versions.
- Clone AnySplat.
git clone https://github.com/OpenRobotLab/AnySplat.git
cd AnySplat
- Create the environment. Here we show an example using conda.
conda create -y -n anysplat python=3.10
conda activate anysplat
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
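To verify the installation, you can print the detected PyTorch and CUDA versions (a quick sanity check, not an official setup step):
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"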
import os
import sys

import torch

# Make the repository modules importable (assumes this snippet is saved one level below the repo root).
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from src.misc.image_io import save_interpolated_video
from src.model.model.anysplat import AnySplat
from src.utils.image import process_image
# Load the model from Hugging Face
model = AnySplat.from_pretrained("lhjiang/anysplat")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()
for param in model.parameters():
    param.requires_grad = False
# Load and preprocess example images (replace with your own image paths)
image_names = ["path/to/imageA.png", "path/to/imageB.png", "path/to/imageC.png"]
images = [process_image(image_name) for image_name in image_names]
images = torch.stack(images, dim=0).unsqueeze(0).to(device) # [1, K, 3, 448, 448]
b, v, _, h, w = images.shape
# Run inference (rescale images from [-1, 1] to [0, 1] for the model)
gaussians, pred_context_pose = model.inference((images + 1) * 0.5)
pred_all_extrinsic = pred_context_pose['extrinsic']
pred_all_intrinsic = pred_context_pose['intrinsic']

# Save an interpolated novel-view video (image_folder is a placeholder output directory)
image_folder = "path/to/output"
save_interpolated_video(pred_all_extrinsic, pred_all_intrinsic, b, h, w, gaussians, image_folder, model.decoder)
# single node:
python src/main.py +experiment=dl3dv trainer.num_nodes=1
# multi-node:
export GPU_NUM=8
export NUM_NODES=2
torchrun \
--nnodes=$NUM_NODES \
--nproc_per_node=$GPU_NUM \
--rdzv_id=test \
--rdzv_backend=c10d \
--rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT \
-m src.main +experiment=multi-dataset +hydra.job.config.store_config=false
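On each node, set the standard torchrun rendezvous variables before launching; MASTER_ADDR must point to the rank-0 node and MASTER_PORT to a free port (the values below are placeholders, not project defaults):
export MASTER_ADDR=10.0.0.1  # IP of the rank-0 node (placeholder)
export MASTER_PORT=29500     # any free TCP port (placeholder)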
Here, we provide three example datasets (CO3Dv2, DL3DV and ScanNet++), each representing a different training view sampling strategy. You can use them as templates and add any other datasets you prefer.
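For example, a launch with a hypothetical custom experiment config (the name your_dataset below is a placeholder for a config you would add alongside the provided ones):
python src/main.py +experiment=your_dataset trainer.num_nodes=1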
python src/post_opt/simple_trainer.py default --data_dir ...
# Novel View Synthesis
python src/eval_nvs.py --data_dir ...
# Pose Estimation
python src/eval_pose.py --co3d_dir ... --co3d_anno_dir ...
We use the original data from the DL3DV dataset. For other datasets, please follow CUT3R's data preprocessing instructions to prepare the training data.
python demo_gradio.py
This will automatically download the pre-trained model weights and config from the Hugging Face Hub.
The demo is a Gradio interface where you can upload images or a video and visualize the reconstructed 3D Gaussian Splat, along with the rendered RGB and depth videos. The trajectory of the rendered video is obtained by interpolating the estimated input image poses.
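Conceptually, such a trajectory can be obtained by spherically interpolating the estimated rotations and linearly interpolating the camera centers. The sketch below illustrates this idea with SciPy; it is an illustrative approximation assuming 4x4 pose matrices, not the repository's save_interpolated_video implementation, and interpolate_poses is a hypothetical helper name.

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_poses(extrinsics, num_steps=120):
    """Interpolate a smooth camera path through estimated poses.

    extrinsics: (N, 4, 4) array of camera pose matrices (assumed format).
    Returns a (num_steps, 4, 4) array of interpolated poses.
    """
    key_times = np.linspace(0.0, 1.0, len(extrinsics))
    query_times = np.linspace(0.0, 1.0, num_steps)

    # Spherical linear interpolation (slerp) for the rotation part.
    slerp = Slerp(key_times, Rotation.from_matrix(extrinsics[:, :3, :3]))
    rotations = slerp(query_times).as_matrix()

    # Linear interpolation for the translation part, one axis at a time.
    translations = np.stack(
        [np.interp(query_times, key_times, extrinsics[:, axis, 3]) for axis in range(3)],
        axis=1,
    )

    poses = np.tile(np.eye(4), (num_steps, 1, 1))
    poses[:, :3, :3] = rotations
    poses[:, :3, 3] = translations
    return poses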
If you find our work helpful, please consider citing:
@article{jiang2025anysplat,
  title={AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views},
  author={Jiang, Lihan and Mao, Yucheng and Xu, Linning and Lu, Tao and Ren, Kerui and Jin, Yichen and Xu, Xudong and Yu, Mulin and Pang, Jiangmiao and Zhao, Feng and others},
  journal={arXiv preprint arXiv:2505.23716},
  year={2025}
}
We thank all authors behind these repositories for their excellent work: VGGT, NoPoSplat, CUT3R and gsplat.