Extrapolated Urban View Synthesis Benchmark

Xiangyu Han^{1,3*}, Zhen Jia^{1*}, Boyi Li^{2}, Yan Wang^{2}
Boris Ivanovic^{2}, Yurong You^{2}, Lingjie Liu^{3}, Yue Wang^{2,4}
Marco Pavone^{2,5}, Chen Feng^{1}, Yiming Li^{1,2}

¹New York University ²NVIDIA ³University of Pennsylvania
⁴University of Southern California ⁵Stanford University
*equal contribution

TLDR: We build a comprehensive real-world benchmark for quantitatively and qualitatively evaluating extrapolated novel view synthesis in large-scale urban scenes.

📖 Abstract

Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs. Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used to model large-scale driving scenes. However, their performance is commonly evaluated in an interpolated setup with highly correlated training and test views. In contrast, extrapolation, where test views deviate substantially from training views, remains underexplored, limiting progress in generalizable simulation technology. To address this gap, we leverage publicly available AV datasets with multiple traversals, multiple vehicles, and multiple cameras to build the first Extrapolated Urban View Synthesis (EUVS) benchmark. We then conduct quantitative and qualitative evaluations of state-of-the-art Gaussian Splatting methods across different difficulty levels. Our results show that Gaussian Splatting is prone to overfitting to training views. Moreover, incorporating diffusion priors and improving geometry cannot fundamentally improve NVS under large view changes, highlighting the need for more robust approaches and large-scale training. We have released our data to help advance self-driving and urban robotics simulation technology.

🔊News

2024/12/9: Our paper is now available on arXiv!
2024/12/10: Our data is now available on Hugging Face!

Installation

We use multiple models and baselines. Please refer to each original repository for installation instructions and set up the corresponding environments accordingly:

Tip: We recommend using PyTorch 2.0.1 and CUDA 11.8 for all environments, as they work well in our implementation.

Extract Masks

cd GroundedSAM2
conda activate groundsam

# Extract dynamic masks
python extract_masks.py --text-prompt "person. rider. car. truck. bus. train. motorcycle. bicycle." --input-dir <path to input_folder> --output-dir <path to output_folder>

# Extract sky masks
python extract_masks.py --text-prompt "sky." --input-dir <path to input_folder> --output-dir <path to output_folder>

Alternatively, you can run the Bash script extract_masks.sh to process images across multiple folders.
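If you prefer to drive the batch from Python instead of extract_masks.sh, the sketch below builds the same two extract_masks.py calls for every scene folder under a root directory. It is not the contents of extract_masks.sh. The mapping of <location>/images to <location>/dynamic_masks and <location>/sky_masks is an assumption inferred from the data structure below, and the helper names are hypothetical.

```python
import shlex
from pathlib import Path

# Text prompts copied from the commands above.
DYNAMIC_PROMPT = "person. rider. car. truck. bus. train. motorcycle. bicycle."
SKY_PROMPT = "sky."


def build_mask_commands(location: Path) -> list[list[str]]:
    """Return the two extract_masks.py invocations for one <location> folder.

    Assumption: images live in <location>/images and masks are written to
    <location>/dynamic_masks and <location>/sky_masks.
    """
    images = location / "images"
    jobs = [(DYNAMIC_PROMPT, "dynamic_masks"), (SKY_PROMPT, "sky_masks")]
    return [
        ["python", "extract_masks.py",
         "--text-prompt", prompt,
         "--input-dir", str(images),
         "--output-dir", str(location / out_name)]
        for prompt, out_name in jobs
    ]


def build_all(root: Path) -> list[list[str]]:
    """Collect commands for every subfolder of root that contains images/."""
    cmds: list[list[str]] = []
    for location in sorted(p for p in root.iterdir() if (p / "images").is_dir()):
        cmds.extend(build_mask_commands(location))
    return cmds


def preview(cmds: list[list[str]]) -> str:
    """Render the commands as shell-quoted lines for a dry run."""
    return "\n".join(shlex.join(cmd) for cmd in cmds)
```

To execute the jobs, run from the GroundedSAM2 folder with the groundsam environment active and call `subprocess.run(cmd, check=True)` for each entry of `build_all(Path("/path/to/scenes"))`. Printing `preview(...)` first is a cheap way to check the paths before launching a long run.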

Data Structure

After extracting the masks, the data should be organized as follows:

<location>
|---test_set.txt
|---train_set.txt
|---images
|   |---<image 0>
|   |---<image 1>
|   |---...
|---sparse
|   |---0
|       |---cameras.bin
|       |---images.bin
|       |---points3D.bin
|---dynamic_masks
|   |---<image 0>
|   |---<image 1>
|   |---...
|---sky_masks
|   |---<image 0>
|   |---<image 1>
|   |---...
|---geo_registration
|   |---geo_registration.txt
|---poses
|   |---images.txt
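Before training, it can help to check that a scene folder matches this layout. The following is a minimal sketch: the required paths are copied from the tree above, while the rule that a mask shares its image's file stem (the extension may differ) is an assumption, and the function names are hypothetical.

```python
from pathlib import Path

# Entries listed in the data structure above, relative to <location>.
REQUIRED_FILES = [
    "train_set.txt",
    "test_set.txt",
    "sparse/0/cameras.bin",
    "sparse/0/images.bin",
    "sparse/0/points3D.bin",
    "geo_registration/geo_registration.txt",
    "poses/images.txt",
]
REQUIRED_DIRS = ["images", "dynamic_masks", "sky_masks"]


def missing_entries(location: Path) -> list[str]:
    """Return the expected files and folders that are absent under <location>."""
    missing = [f for f in REQUIRED_FILES if not (location / f).is_file()]
    missing += [d for d in REQUIRED_DIRS if not (location / d).is_dir()]
    return missing


def unmasked_images(location: Path, mask_dir: str = "dynamic_masks") -> list[str]:
    """List images that have no mask with the same file stem.

    Assumption: masks are named after their image, possibly with another extension.
    """
    mask_stems = {p.stem for p in (location / mask_dir).iterdir()}
    return sorted(
        p.name for p in (location / "images").iterdir() if p.stem not in mask_stems
    )
```

An empty list from both functions means the folder has every entry shown above and each image has a mask; run `unmasked_images(location, "sky_masks")` to check the sky masks as well.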

Run 3DGS

# Define the path
source_path="path/to/your/data"
model_path="$source_path/models/3DGS"

# Train 3DGS with masked dynamic objects
python train.py -s "$source_path" -m "$model_path" --method "masked_3dgs"

Note that this command runs 3DGS with dynamic objects masked out. To run the vanilla version instead, use --method "vanila".
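To compare the two variants on one scene, each run needs its own model path so the second does not overwrite the first. The sketch below assumes a hypothetical models/3DGS_<method> naming scheme (the commands above use a single models/3DGS folder) and only uses the -s, -m and --method flags shown above; the "vanila" spelling is kept exactly as documented.

```python
import subprocess
from pathlib import Path

# The two --method values documented above, spelled as in this README.
METHODS = ("masked_3dgs", "vanila")


def train_command(source_path: str, method: str) -> list[str]:
    """Build the train.py call for one method.

    Hypothetical convention: each method writes to models/3DGS_<method>.
    """
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    model_path = Path(source_path) / "models" / f"3DGS_{method}"
    return ["python", "train.py", "-s", source_path,
            "-m", str(model_path), "--method", method]


def train_all(source_path: str) -> None:
    """Train both variants one after the other, stopping on the first failure."""
    for method in METHODS:
        subprocess.run(train_command(source_path, method), check=True)
```

Validating the method name up front turns a likely typo such as "vanilla" into an immediate error instead of a failed training launch.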

Render

# Define the path
source_path="path/to/your/data"
model_path="$source_path/models/3DGS"

# Render
python render.py -m "$model_path"
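After rendering, you may want to pair each rendered image with its ground truth, for example for visual inspection. The sketch below assumes the output layout of the original 3DGS render.py, <model_path>/<split>/ours_<iteration>/{renders,gt}; this is an assumption (the "ours_30000" key in the evaluation output is consistent with it), so adjust the paths if this repository writes elsewhere. The function name is hypothetical.

```python
from pathlib import Path
from typing import Iterator


def paired_renders(model_path: str, split: str = "test",
                   iteration: int = 30000) -> Iterator[tuple[Path, Path]]:
    """Yield (render, ground_truth) file pairs for one split.

    Assumed layout (original 3DGS): <model_path>/<split>/ours_<iteration>/renders
    and .../gt, with matching file names in both folders.
    """
    base = Path(model_path) / split / f"ours_{iteration}"
    renders, gt = base / "renders", base / "gt"
    if not renders.is_dir():
        raise FileNotFoundError(f"no renders at {renders}; did render.py finish?")
    for render in sorted(renders.glob("*.png")):
        target = gt / render.name
        if target.is_file():  # skip renders without a matching ground truth
            yield render, target
```

Because this is a generator, the missing-folder error surfaces on first iteration, e.g. when calling `list(paired_renders(model_path))`.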

Evaluation

# Define the path
source_path="path/to/your/data"
model_path="$source_path/models/3DGS"

# Evaluation
python metrics_with_dyn_masks.py -s "$source_path" -m "$model_path" -e "all"
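The idea behind masked evaluation is that pixels covered by dynamic objects are excluded from the score. The sketch below illustrates this for PSNR only; it is not the implementation in metrics_with_dyn_masks.py. It assumes flattened pixel values in [0, max_val] and treats a True mask entry as a dynamic pixel to ignore.

```python
import math
from typing import Sequence


def masked_psnr(pred: Sequence[float], gt: Sequence[float],
                dyn_mask: Sequence[bool], max_val: float = 1.0) -> float:
    """PSNR computed over static pixels only.

    Assumptions: inputs are flattened pixel values in [0, max_val], and
    dyn_mask[i] is True where pixel i belongs to a dynamic object.
    """
    if not (len(pred) == len(gt) == len(dyn_mask)):
        raise ValueError("pred, gt and mask must have the same length")
    sq_errors = [(p - g) ** 2 for p, g, dyn in zip(pred, gt, dyn_mask) if not dyn]
    if not sq_errors:
        raise ValueError("mask excludes every pixel")
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_val ** 2 / mse)
```

For example, with `pred=[0.5, 0.5, 1.0]`, `gt=[0.6, 0.4, 0.0]` and `dyn_mask=[False, False, True]`, the large error on the third (dynamic) pixel is ignored and the result is about 20 dB; without the mask, that single pixel would pull the score down to under 5 dB.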

The metrics are saved as JSON files under $model_path, named test_set_results_w_mask.json and train_set_results_w_mask.json. The output looks like the following:

{
 "ours_30000": {
  "SSIM": 0.7512373236681191,
  "PSNR": 16.235809601201545,
  "LPIPS": 0.4492570964361398,
  "Cos_Similarity": 0.4056943528884359
 }
}
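To collect these numbers across scenes, the results files can be read with the standard json module. The minimal sketch below relies only on the file names and the nested {run: {metric: value}} shape shown above; the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_results(model_path: str, split: str = "test") -> dict[str, dict[str, float]]:
    """Read <split>_set_results_w_mask.json written by the evaluation step."""
    path = Path(model_path) / f"{split}_set_results_w_mask.json"
    with path.open() as f:
        return json.load(f)


def summarize(results: dict[str, dict[str, float]]) -> str:
    """Format one line per run key (e.g. "ours_30000"), metrics sorted by name."""
    lines = []
    for run, metrics in sorted(results.items()):
        parts = [f"{name}={value:.4f}" for name, value in sorted(metrics.items())]
        lines.append(f"{run}: " + ", ".join(parts))
    return "\n".join(lines)
```

Applied to the example output above, `summarize` produces `ours_30000: Cos_Similarity=0.4057, LPIPS=0.4493, PSNR=16.2358, SSIM=0.7512`.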

🗓️ TODO

[✔] Data release
[✔] Code release (Will keep updating baselines)

📊 Baseline Code

Here are the official code links for the baselines.

🖊️ Citation

If you find this project useful in your research, please consider citing:

@misc{han2024extrapolatedurbanviewsynthesis,
      title={Extrapolated Urban View Synthesis Benchmark}, 
      author={Xiangyu Han and Zhen Jia and Boyi Li and Yan Wang and Boris Ivanovic and Yurong You and Lingjie Liu and Yue Wang and Marco Pavone and Chen Feng and Yiming Li},
      year={2024},
      eprint={2412.05256},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05256}, 
}
