WHALES (Wireless enHanced Autonomous vehicles with Large number of Engaged agentS) is a CARLA-based cooperative perception dataset averaging 8.4 agents per sequence. It captures diverse viewpoints, agent behaviors, and multitask interactions to study scheduling, perception, and planning under realistic multi-agent constraints.
- 2025-11-21 – Released WHALES dataset v1.0 with cooperative scheduling benchmarks.
- 2025-06-17 – WHALES was accepted to IROS 2025!
- Dataset Overview
- Getting Started
- Training & Evaluation
- Visualization
- Scheduling Algorithms
- Experimental Results
- Roadmap
- Citation
## Dataset Overview

- Largest agent count: an average of 8.4 agents per scene, with synchronized LiDAR-camera suites.
- Rich annotations: 2.01M 3D boxes plus full agent-behavior recordings.
- Scheduling-ready: provides perception, planning, and communication metadata for agent-selection research.
- Plug-in friendly: ships with `mmdetection3d`-compatible configs and hooks for custom schedulers.
Comparison with existing perception datasets:

| Dataset | Year | Real/Simulated | V2X | Image | Point Cloud | 3D Annotations | Classes | Avg. Agents |
|---|---|---|---|---|---|---|---|---|
| KITTI | 2012 | Real | No | 15k | 15k | 200k | 8 | 1 |
| nuScenes | 2019 | Real | No | 1.4M | 400k | 1.4M | 23 | 1 |
| DAIR-V2X | 2021 | Real | V2V&I | 39k | 39k | 464k | 10 | 2 |
| V2X-Sim | 2021 | Simulated | V2V&I | 0 | 10k | 26.6k | 2 | 2 |
| OPV2V | 2022 | Simulated | V2V | 44k | 11k | 230k | 1 | 3 |
| DOLPHINS | 2022 | Simulated | V2V&I | 42k | 42k | 293k | 3 | 3 |
| V2V4Real | 2023 | Real | V2V | 40k | 20k | 240k | 5 | 2 |
| WHALES (Ours) | 2024 | Simulated | V2V&I | 70k | 17k | 2.01M | 3 | 8.4 |
Agent categories in WHALES:

| Location | Category | Sensors | Planning & Control | Tasks | Spawning |
|---|---|---|---|---|---|
| On-road | Uncontrolled CAV | LiDAR ×1 + Camera ×4 | CARLA autopilot | Perception | Random / deterministic |
| On-road | Controlled CAV | LiDAR ×1 + Camera ×4 | RL policy | Perception & planning | Random / deterministic |
| Roadside | RSU | LiDAR ×1 + Camera ×4 | RL policy | Perception & planning | Static |
| Anywhere | Obstacle agent | – | CARLA autopilot | – | Random |
## Getting Started

- Clone the repository:
  ```bash
  git clone https://github.com/chensiweiTHU/WHALES.git
  ```
- Create and activate a Conda environment:
  ```bash
  conda create -n whales python=3.10 -y
  conda activate whales
  ```
- Install WHALES:
  ```bash
  pip install -e .
  ```
- Install `mmdetection3d==0.17.1` following the official guide.
- (Optional) Install OpenCOOD for additional cooperative baselines.
- Download the full dataset from Google Drive: Download Whales.
- Place the extracted files under `./data/whales/`.
- Preprocess:
  ```bash
  python tools/create_data.py whales --root-path ./data/whales/ --out-dir ./data/whales/ --extra-tag whales
  ```
  This emits, under `./data/whales/`:
  - `whales_infos_{train,val}.pkl` — LiDAR info PKLs for `WhalesDataset`.
  - `whales_infos_{train,val}_mono3d.coco.json` — per-camera mono3D COCO files for `WhalesMonoDataset` (camera-only training).
  - `whales_dbinfos_train.pkl` + `whales_gt_database/` — GT-sampling database used by the LiDAR configs' augmentation step.
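To verify the preprocessing output, note that the info PKLs are plain pickled Python structures; below is a minimal inspection sketch (the inner key names are assumptions in the style of mmdetection3d info files, so print them rather than relying on this listing):

```python
# Quick sanity check on a generated info PKL.
import pickle

with open('data/whales/whales_infos_train.pkl', 'rb') as f:
    infos = pickle.load(f)

print(type(infos))  # typically a list of per-frame dicts, or a dict wrapping one
sample = infos[0] if isinstance(infos, list) else infos['infos'][0]
print(sorted(sample.keys()))  # e.g. lidar path, sample token, annotations (assumed)
```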
## Training & Evaluation

Configs are organised as:

- `./configs/_base_/` — shared dataset, model, and schedule bases.
- `./configs/standalone/` — single-agent baselines (PointPillars, SECOND, CenterPoint, FCOS3D, VoxelNeXt, etc., on LiDAR and monocular 3D).
- `./configs/cooperative/` — V2X cooperative-perception recipes (PointPillars, VoxelNeXt, BEVFusion, F-Cooper, V2VNet, V2X-ViT, OPV2V, FFNet, plus the scheduling studies).
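Before launching a run, it can help to confirm that a config parses; here is a minimal sketch using `mmcv`'s `Config` (mmdetection3d 0.17.x pairs with mmcv-full 1.x, where this API exists; the config path below is hypothetical):

```python
# Sanity-check that a config file loads before launching distributed training.
from mmcv import Config

cfg = Config.fromfile('configs/cooperative/some_voxelnext_config.py')  # hypothetical path
print(cfg.model.type)       # detector class registered in mmdet3d
print(cfg.data.train.type)  # should be WhalesDataset or WhalesMonoDataset
```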
Pick any leaf config under those trees and run:
- Training:
  ```bash
  bash tools/dist_train.sh <config>.py <gpu_num>
  ```
- Testing:
  ```bash
  bash tools/dist_test.sh <config>.py <model>.pth <gpu_num> --eval bbox
  ```
Both `WhalesDataset` (LiDAR) and `WhalesMonoDataset` (monocular 3D) are registered; the COCO JSONs emitted by the preprocessing step drive the mono3D path, while the info PKLs drive the LiDAR path. Evaluation metrics: mAP and NDS.
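For orientation, the dataset block of a LiDAR config might be wired roughly as follows; the field values and layout follow general mmdetection3d 0.17.x conventions and are assumptions here, with the shipped configs under `./configs/` being authoritative:

```python
# Sketch of a WHALES dataset block in mmdetection3d 0.17.x config style.
train_pipeline = []  # in the real configs these are full transform lists
test_pipeline = []

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type='WhalesDataset',                              # LiDAR path, driven by info PKLs
        data_root='data/whales/',
        ann_file='data/whales/whales_infos_train.pkl',
        pipeline=train_pipeline,
        test_mode=False,
    ),
    val=dict(
        type='WhalesDataset',
        data_root='data/whales/',
        ann_file='data/whales/whales_infos_val.pkl',
        pipeline=test_pipeline,
        test_mode=True,
    ),
)
```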
## Visualization

`tools/misc/visualize_whales.py` can render from all three data representations (raw `frame_info.json`, info PKL, mono3D COCO):
```bash
# Raw CARLA frame_info.json: reconstruct ego-frame boxes + overlay on the 4 cameras + BEV.
python tools/misc/visualize_whales.py frame_info \
    --path data/whales/<scene>/<frame>/frame_info.json --agent vehicle0

# Single entry from the info PKL (one agent-frame).
python tools/misc/visualize_whales.py pkl \
    --path data/whales/whales_infos_val.pkl --token <scene>_<frame>_<agent>

# Batched renders: 2x2 camera grid alongside the BEV, one frame per scene.
python tools/misc/visualize_whales.py pkl_grid \
    --pkls data/whales/whales_infos_{train,val}.pkl \
    --num-per-pkl 20 --one-per-scene --out whales_vis/

# Mono3D COCO renders with 2D bbox + 3D wireframe per annotation.
python tools/misc/visualize_whales.py coco \
    --path data/whales/whales_infos_val_mono3d.coco.json --image-id <image_id>

python tools/misc/visualize_whales.py coco_batch \
    --path data/whales/whales_infos_val_mono3d.coco.json \
    --num-tokens 20 --one-per-scene --out whales_vis_coco/
```

## Scheduling Algorithms

Agent scheduling pipelines live in `./mmdet3d_plugin/datasets/pipelines/cooperative_perception.py`.
CAHS, the scheduler proposed with WHALES, ranks candidate collaborators by their historical coverage and predicted perception gains.
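As a rough mental model (not the repository's actual implementation, which lives in the pipeline file above), a CAHS-style selection step might blend the two signals like this; all names, fields, and weights below are illustrative assumptions:

```python
# Illustrative sketch of a CAHS-style selection step: rank candidate
# collaborators by a blend of historically observed coverage and a
# predicted per-agent gain, then pick the best one.
from dataclasses import dataclass

@dataclass
class Candidate:
    agent_id: str
    historical_coverage: float  # e.g. fraction of ego blind spots this agent covered recently
    predicted_gain: float       # e.g. estimated perception improvement from fusing this agent

def select_collaborator(candidates, coverage_weight=0.5):
    """Return the agent_id with the highest blended score (hypothetical scoring)."""
    def score(c: Candidate) -> float:
        return coverage_weight * c.historical_coverage + (1 - coverage_weight) * c.predicted_gain
    return max(candidates, key=score).agent_id

# Example: three in-range agents; vehicle2 wins on predicted gain.
pool = [
    Candidate('vehicle1', historical_coverage=0.42, predicted_gain=0.10),
    Candidate('vehicle2', historical_coverage=0.35, predicted_gain=0.30),
    Candidate('rsu0',     historical_coverage=0.55, predicted_gain=0.05),
]
print(select_collaborator(pool))  # -> 'vehicle2' with the default weights
```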
## Experimental Results

All numbers below are reported as 50m / 100m, the two evaluation ranges used by the WHALES protocol (boxes are filtered per class by radial distance from the ego agent).
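Concretely, the range cut amounts to a distance mask over box centers; here is a minimal sketch (the box layout, with planar ego-frame coordinates in the first two columns, is an assumption for illustration):

```python
# Keep only boxes whose ego-frame center lies within the evaluation radius.
import numpy as np

def filter_by_range(boxes: np.ndarray, max_range_m: float) -> np.ndarray:
    """boxes: (N, 7) array with ego-frame x, y in the first two columns (assumed)."""
    dist = np.linalg.norm(boxes[:, :2], axis=1)  # planar distance from ego
    return boxes[dist <= max_range_m]

boxes = np.array([[10.0,  5.0, 0.0, 4.0, 2.0, 1.5, 0.0],
                  [60.0, 70.0, 0.0, 4.0, 2.0, 1.5, 0.0]])
print(len(filter_by_range(boxes, 50.0)), len(filter_by_range(boxes, 100.0)))  # -> 1 2
```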
Standalone (single-agent) 3D detection baselines:

| Method | AP_Veh ↑ | AP_Ped ↑ | AP_Cyc ↑ | mAP ↑ |
|---|---|---|---|---|
| PointPillars | 67.1 / 41.5 | 38.0 / 6.3 | 37.3 / 11.6 | 47.5 / 19.8 |
| SECOND | 58.5 / 38.8 | 27.1 / 12.1 | 24.1 / 12.9 | 36.6 / 21.2 |
| RegNet | 66.9 / 42.3 | 38.7 / 8.4 | 32.9 / 11.7 | 46.2 / 20.8 |
| VoxelNeXt | 64.7 / 42.3 | 52.2 / 27.4 | 35.9 / 9.0 | 50.9 / 26.2 |
Cooperative perception baselines:

| Method | AP_Veh ↑ | AP_Ped ↑ | AP_Cyc ↑ | mAP ↑ |
|---|---|---|---|---|
| No Fusion | 67.1 / 41.5 | 38.0 / 6.3 | 37.3 / 11.6 | 47.5 / 19.8 |
| F-Cooper | 75.4 / 52.8 | 50.1 / 9.1 | 44.7 / 20.4 | 56.8 / 27.4 |
| Raw-level Fusion | 71.3 / 48.9 | 38.1 / 8.5 | 40.7 / 16.3 | 50.0 / 24.6 |
| VoxelNeXt | 71.5 / 50.6 | 60.1 / 35.4 | 47.6 / 21.9 | 59.7 / 35.9 |
mAP at 50m / 100m. Base detector: VoxelNeXt (LiDAR cooperative). Rows = inference-time policy, columns = training-time policy.
| Inference \ Training | No Fusion | Closest First | Single Random | Multiple Random | Full Communication |
|---|---|---|---|---|---|
| No Fusion (Baseline) | 50.9 / 26.2 | 50.9 / 23.3 | 51.3 / 25.3 | 50.3 / 22.9 | 45.6 / 18.8 |
| Closest First | 39.9 / 20.3 | 58.4 / 30.2 | 58.3 / 32.6 | 57.3 / 30.5 | 55.4 / 10.8 |
| Single Random | 43.3 / 22.8 | 57.9 / 31.0 | 58.4 / 33.3 | 57.7 / 31.4 | 55.0 / 14.6 |
| MASS | 55.5 / 11.0 | 58.8 / 33.7 | 58.9 / 34.0 | 57.3 / 32.3 | 54.1 / 27.4 |
| CAHS (Proposed) | 56.1 / 29.6 | 62.5 / 31.7 | 62.7 / 35.9 | 58.3 / 32.6 | 59.9 / 31.0 |
mAP at 50m / 100m. Base detector: VoxelNeXt (LiDAR cooperative). Same axes as above.
| Inference \ Training | No Fusion | Closest First | Single Random | Multiple Random | Full Communication |
|---|---|---|---|---|---|
| Multiple Random | 34.5 / 16.9 | 60.7 / 35.1 | 61.2 / 37.1 | 61.4 / 36.4 | 58.8 / 12.9 |
| Full Communication | 29.1 / 10.5 | 63.7 / 38.4 | 63.7 / 39.1 | 64.0 / 41.1 | 65.1 / 39.2 |
| MASS | 54.6 / 13.4 | 64.9 / 39.7 | 65.0 / 40.5 | 63.7 / 40.4 | 63.5 / 36.4 |
| CAHS (Proposed) | 53.7 / 14.2 | 65.3 / 40.1 | 65.1 / 42.0 | 63.9 / 40.6 | 65.2 / 39.2 |
## Roadmap

- Publish dataset and checkpoints on HuggingFace.
## Citation

```bibtex
@INPROCEEDINGS{11247472,
  author    = {Wang, Yinsong Richard and Chen, Siwei and Song, Ziyi and Zhou, Sheng},
  title     = {{WHALES: A Multi-Agent Scheduling Dataset for Enhanced Cooperation in Autonomous Driving}},
  booktitle = {2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year      = {2025},
  pages     = {20487--20493},
  keywords  = {Wireless communication; Three-dimensional displays; Scalability; Whales; Benchmark testing; Metadata; Scheduling; Vehicle dynamics; Vehicle-to-everything; Autonomous vehicles},
  doi       = {10.1109/IROS60139.2025.11247472}
}
```