Hao Li*, Shuai Yang*, Yilun Chen, Yang Tian, Xiaoda Yang, Xinyi Chen,
Hanqing Wang, Tai Wang, Feng Zhao, Dahua Lin, Jiangmiao Pang
* Equal Contributions
University of Science and Technology of China, Shanghai Artificial Intelligence Laboratory,
Zhejiang University, The Chinese University of Hong Kong
CronusVLA is a unified framework that extends single-frame VLA models to the multi-frame paradigm. It comprises three key components: single-frame pretraining, multi-frame encoding, and cross-frame decoding. An action adaptation mechanism is also proposed to improve finetuning performance.
CronusVLA achieves efficient inference, state-of-the-art performance on SimplerEnv, and a considerable improvement over OpenVLA on LIBERO. Real-world experiments on a Franka robot also demonstrate strong performance and robustness.
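To make the multi-frame idea more concrete, here is a minimal, purely illustrative sketch of encoding each frame once, caching its features, and decoding an action from the cached chunk. It is not the CronusVLA implementation or API: `MultiFrameBuffer`, `decode_action`, and `toy_encoder` are hypothetical names, and the element-wise mean is only a stand-in for a learned cross-frame decoder.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Feature = List[float]


class MultiFrameBuffer:
    """Hypothetical rolling cache of per-frame features (not the CronusVLA API)."""

    def __init__(self, encode_frame: Callable[[Sequence[float]], Feature], max_frames: int):
        if max_frames < 1:
            raise ValueError("max_frames must be at least 1")
        self.encode_frame = encode_frame
        # A bounded deque drops the oldest feature automatically when full.
        self.features: Deque[Feature] = deque(maxlen=max_frames)

    def push(self, frame: Sequence[float]) -> None:
        # Each frame is encoded exactly once; earlier features are reused from the cache.
        self.features.append(self.encode_frame(frame))

    def chunk(self) -> List[Feature]:
        # Features of the most recent frames, oldest first.
        return list(self.features)


def decode_action(chunk: List[Feature]) -> Feature:
    """Stand-in cross-frame decoder (assumption): element-wise mean over cached features."""
    if not chunk:
        raise ValueError("at least one frame feature is required")
    dim = len(chunk[0])
    return [sum(feature[i] for feature in chunk) / len(chunk) for i in range(dim)]


def toy_encoder(frame: Sequence[float]) -> Feature:
    # Toy single-frame encoder (assumption): summarizes a frame as [mean, max].
    return [sum(frame) / len(frame), max(frame)]


def run_demo() -> List[Feature]:
    buffer = MultiFrameBuffer(toy_encoder, max_frames=2)
    actions = []
    for frame in ([0.0, 2.0], [2.0, 4.0], [4.0, 6.0]):
        buffer.push(frame)
        actions.append(decode_action(buffer.chunk()))
    return actions
```

In this sketch, only the newest frame is passed through the encoder at each step, which is one plausible reason a multi-frame model can keep inference cost close to that of a single-frame model.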
- Release the post-trained checkpoints and evaluation code of SimplerEnv (7B & 0.5B).
- Release the finetuned checkpoints and evaluation code of LIBERO.
- Release the other checkpoints and training code for post-training and finetuning.
- More powerful CronusVLA v2.0.
@misc{2506.19816,
Author = {Hao Li and Shuai Yang and Yilun Chen and Yang Tian and Xiaoda Yang and Xinyi Chen and Hanqing Wang and Tai Wang and Feng Zhao and Dahua Lin and Jiangmiao Pang},
Title = {CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation},
Year = {2025},
Eprint = {arXiv:2506.19816},
}