VLASH is an efficient, easy-to-use framework for fine-tuning and inference of Vision-Language-Action (VLA) models.

VLASH is efficient through:
- Asynchronous inference for fast reaction and smooth motion in real time (>30 Hz inference frequency for $\pi_{0.5}$ on an RTX 5090)
- Future-state awareness to enable stable asynchronous VLA inference without overhead
- Action quantization for faster robot execution
- LoRA with shared observation encoding for efficient fine-tuning on consumer GPUs
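The asynchronous scheme above can be sketched as a toy simulation (this is not the VLASH API; the names, the scalar dynamics, and the chunk size are all illustrative). While the robot executes the current action chunk, the next chunk is computed in a background thread. The key point of future-state awareness: the policy is conditioned on the state *predicted* for the moment the next chunk will start, not on the stale current state, so consecutive chunks join without stalls or discontinuities.

```python
import threading
import queue

CHUNK = 4  # actions per inference call (illustrative)

def policy(state):
    """Toy stand-in for a VLA policy: drive a scalar state toward zero."""
    actions, s = [], state
    for _ in range(CHUNK):
        a = -0.5 * s
        s = s + a
        actions.append(a)
    return actions

def predict_future_state(state, pending_actions):
    """Future-state awareness: roll the already-queued actions forward
    to estimate the state at the moment the next chunk will begin."""
    s = state
    for a in pending_actions:
        s = s + a
    return s

def async_rollout(state, steps):
    pending = policy(state)  # first chunk, computed synchronously
    out = queue.Queue()
    executed = []
    for _ in range(steps // CHUNK):
        # Launch inference for the NEXT chunk, conditioned on the
        # predicted state at the end of the chunk now being executed.
        future = predict_future_state(state, pending)
        t = threading.Thread(target=lambda: out.put(policy(future)))
        t.start()
        for a in pending:  # meanwhile, "execute" the current chunk
            state = state + a
            executed.append(a)
        t.join()           # next chunk is ready with no execution gap
        pending = out.get()
    return state, executed

final, acts = async_rollout(state=1.0, steps=12)
print(f"final state {final:.6f} after {len(acts)} actions")
```

Because each chunk is conditioned on where the robot will actually be, inference latency is hidden behind execution instead of causing the policy to act on outdated observations.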
VLASH is easy to use with:
- Seamless integration with LeRobot datasets (v2.1, v3.0), models, and robots
- Simple YAML-based configuration system
- Support for various policy architectures (e.g., $\pi_{0.5}$, $\pi_0$, ...)
- Easy deployment on real robot hardware
(Demo video: `VLASH_demos.mp4`)
Install from source:

```shell
conda create -n "vlash" python=3.10
conda activate vlash
conda install ffmpeg=7.1.1 -c conda-forge
pip install -e .
```

Fine-tune a VLA policy for your task, enabling smooth async inference without overhead:
```shell
vlash train examples/train/pi05/async.yaml
```

Run async inference on a robot:
```shell
vlash run examples/inference/async.yaml
```

Run async inference with 2x speedup:
```shell
vlash run examples/inference/sync.yaml --action_quant_ratio=2
```

For fine-tuning, VLASH supports:

- LoRA fine-tuning for $\pi_{0.5}$ and $\pi_0$ under 12 GB of GPU memory
- QLoRA fine-tuning for $\pi_{0.5}$ and $\pi_0$ under 8 GB of GPU memory
- Efficient fine-tuning with shared observation encoding
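To see why LoRA fits in a few gigabytes, compare trainable parameter counts. This is a back-of-the-envelope sketch with illustrative layer sizes, not the actual $\pi_{0.5}$ dimensions: instead of updating a full `d_out x d_in` weight matrix, LoRA trains two low-rank factors `B` (`d_out x r`) and `A` (`r x d_in`).

```python
# Trainable parameters drop from d_out * d_in (full fine-tuning)
# to r * (d_out + d_in) (LoRA factors B and A).
def lora_trainable_params(d_out, d_in, r):
    return r * (d_out + d_in)

d_out = d_in = 4096  # a typical transformer projection, for illustration
full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, r=16)
print(f"full: {full:,}  lora(r=16): {lora:,}  ratio: {full / lora:.0f}x")
```

Here the trainable footprint shrinks by two orders of magnitude, and at inference time the factors can be merged back into the base weight (roughly $W' = W + \tfrac{\alpha}{r} BA$), so there is no runtime overhead. QLoRA additionally quantizes the frozen base weights (typically to 4-bit), cutting the static memory cost as well, which is what pushes the requirement further down.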
This project is built upon the following excellent open-source projects: LeRobot, PEFT.
Apache 2.0