5 changes: 3 additions & 2 deletions cookbooks/cosmos3/generator/action/README.md
@@ -88,7 +88,6 @@ visualize the generated videos:
inverse dynamics, predicting ego-motion trajectories from input AV videos using Cosmos3-Nano.
- [`run_policy_with_cosmos_framework.md`](./run_policy_with_cosmos_framework.md) - policy, predicting future observations and action trajectories for DROID robot using Cosmos3-Nano-Policy-DROID.


## Run with vLLM-Omni

### Quickstart
@@ -135,7 +134,9 @@ To reproduce our post-training recipe for [Cosmos3-Nano-Policy-DROID](https://hu
launch-script pattern as the other Cosmos3 finetune cookbooks while delegating
the canonical training implementation to Cosmos Framework.


The same [action-policy SFT cookbook](./finetune/README.md) also covers **LIBERO-10**
(`launch_sft_action_policy_libero.sh`) — fine-tuning Cosmos3-Nano on the `libero_10`
simulation benchmark with the same launch-script pattern.

## TODO

39 changes: 35 additions & 4 deletions cookbooks/cosmos3/generator/action/finetune/README.md
@@ -1,12 +1,18 @@
# Cosmos3-Nano-Policy-DROID Fine-Tuning (SFT)
# Cosmos3-Nano Action-Policy Fine-Tuning (SFT)

This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into an action policy for the DROID robot. It reproduces the post-training recipe used to create [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID), leveraging the public [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) dataset and the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework).
This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into a robot action policy, using the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework). Two embodiments are covered, each reproducing a Cosmos3 paper result:

- **DROID** — reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID): trained on real-robot DROID data, evaluated on the RoboLab simulation benchmark.
- **LIBERO-10** — reproduces the Cosmos3 paper's LIBERO-10 results: trained and evaluated on the LIBERO-10 simulation benchmark.

| Recipe | Launch shell | Base model | Dataset |
| --- | --- | --- | --- |
| Policy-DROID SFT | `launch_sft_action_policy_droid.sh` | Cosmos3-Nano | [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) success split |
| Policy-LIBERO-10 SFT | `launch_sft_action_policy_libero.sh` | Cosmos3-Nano | [LIBERO_LeRobot_v3](https://huggingface.co/datasets/nvidia/LIBERO_LeRobot_v3) `libero_10` |

The DROID recipe uses the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.

The recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment. It trains a DROID policy model with `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` video (third-person + wrist) at 20 fps, learning rate 5e-5 with 500 warmup steps and a 16000-step scheduler cycle, and a global batch of 2048 (HSDP 2x8, i.e. 16 ranks across 2 nodes). Train on `libero_10` **alone**.
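
The global batch of 2048 follows from the parallelism settings. Below is a minimal sketch of that arithmetic, assuming 128 samples per rank as stated in the TOML comment next to `grad_accum_iter`. The `global_batch` helper is hypothetical and exists only for illustration; it is not part of cosmos-framework.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of cosmos-framework): multiplies the per-rank
# sample budget by the number of data-parallel ranks and the accumulation steps.
global_batch() {
    local max_samples="$1" shard="$2" replicate="$3" grad_accum="$4"
    echo $(( max_samples * shard * replicate * grad_accum ))
}

# Values from action_policy_libero_repro.toml: shard 8, replicate 2, grad_accum 1.
global_batch 128 8 2 1   # prints 2048
```

By the same arithmetic, a single 8-GPU node (replicate degree 1) would need `grad_accum_iter = 2` to reach 2048. The source does not say whether that configuration reproduces the paper result.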

## Prerequisites

@@ -38,7 +44,7 @@ The launcher is a complete local wrapper for the public cookbook:
- downloads `Wan2.2_VAE.pth` if needed
- converts `Cosmos3-Nano` to a local DCP checkpoint if needed
- downloads `keep_ranges_1_0_1.json` if needed
- launches 8-GPU training with `action_policy_droid_repro.toml`
- launches training with `action_policy_droid_repro.toml`

The script intentionally stays close to the `cosmos-framework` example launcher: `DATASET_PATH`
is bridged to `DROID_ROOT`, `BASE_CHECKPOINT_PATH` and `WAN_VAE_PATH` are exported for the TOML,
@@ -63,6 +69,31 @@ export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpo
bash launch_sft_action_policy_droid.sh
```

## LIBERO-10 quick start

The LIBERO launcher stages the `libero_10` suite (auto-downloaded if missing),
downloads the Wan VAE, converts the base checkpoint, and trains.

```shell
bash launch_sft_action_policy_libero.sh
```

The launcher:

- downloads `nvidia/LIBERO_LeRobot_v3` `libero_10` to `data/LIBERO_LeRobot_v3/libero_10` if missing
- downloads `Wan2.2_VAE.pth` and converts `Cosmos3-Nano` to a local DCP checkpoint if needed
- launches training with the LIBERO action-policy TOML (`action_policy_libero_repro.toml`)
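
The LIBERO TOML requests HSDP 2x8 (16 ranks), while the launcher defaults to 8 processes per node, so a full run spans two nodes. The launcher script forwards `NNODES`, `NODE_RANK`, and `MASTER_ADDR` to `torchrun` when they are set. The sketch below only prints the command each node would run. `HEAD_NODE` and its default hostname are placeholders, and `node_command` is a hypothetical helper, not part of the cookbook.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder: hostname or IP of the rank-0 node (an assumption, replace it).
HEAD_NODE="${HEAD_NODE:-node0.example.com}"

# Hypothetical helper: print the launch line for one node of a two-node job.
node_command() {
    local rank="$1"
    printf 'NNODES=2 NODE_RANK=%s MASTER_ADDR=%s bash launch_sft_action_policy_libero.sh\n' \
        "$rank" "$HEAD_NODE"
}

node_command 0   # run the printed line on the head node
node_command 1   # run the printed line on the second node
```

Each node runs the full launcher, so each needs the dataset, VAE, and converted checkpoint reachable at its configured paths. Both nodes must also agree on `MASTER_PORT` (the launcher defaults to 50012).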

Relocate inputs via env vars, or run a short smoke test:

```shell
export LIBERO_ROOT=/scratch/LIBERO_LeRobot_v3/libero_10
export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpoint.save_iter=10 dataloader_train.max_samples_per_batch=32"
bash launch_sft_action_policy_libero.sh
```
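
`EXTRA_TAIL_OVERRIDES` is a single string that the launcher splits on whitespace into separate override arguments. Below is a minimal sketch of that splitting, mirroring the array assignment in the launcher script. The override values are a shortened copy of the smoke-test string above.

```shell
#!/usr/bin/env bash
set -euo pipefail

EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpoint.save_iter=10"

# Unquoted expansion is deliberate: each whitespace-separated token becomes
# one array element (and therefore one argument after `--`).
# shellcheck disable=SC2206
TAIL_OVERRIDES=(${EXTRA_TAIL_OVERRIDES})

printf '%d overrides\n' "${#TAIL_OVERRIDES[@]}"
printf '  %s\n' "${TAIL_OVERRIDES[@]}"
```

Because the split is on whitespace, an override value that itself contains spaces cannot be passed this way, and unquoted glob characters such as `*` may be expanded by the shell.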

Checkpoints are saved every 500 iterations (`save_iter = 500` in the TOML).

## Outputs

Training writes to `outputs/train/<project>/<group>/<name>/`:
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
#!/usr/bin/env bash
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: OpenMDW-1.1

# Complete recipe: LIBERO-10 action-policy SFT on Cosmos3-Nano (HSDP 2x8).
# Run from this folder with the cosmos-framework venv active (see README):
# bash launch_sft_action_policy_libero.sh
# It prepares the small dependencies, stages the libero_10 dataset (downloading it if missing), and trains.
# Paths are fixed under this (git-ignored) folder, matching the reasoner finetune
# wrappers, while the TOML and tail-overrides match the cosmos-framework example.

set -euo pipefail
cd "$(dirname "${BASH_SOURCE[0]}")"

TOML_FILE="toml/sft_config/action_policy_libero_repro.toml"
: "${LIBERO_ROOT:=$PWD/data/LIBERO_LeRobot_v3/libero_10}"
: "${BASE_CHECKPOINT_PATH:=$PWD/checkpoints/Cosmos3-Nano}"
: "${WAN_VAE_PATH:=$PWD/checkpoints/wan22_vae/Wan2.2_VAE.pth}"

# 1. Stage the libero_10 suite (the Table-20 reproduction trains on libero_10 ALONE).
if [[ ! -f "$LIBERO_ROOT/meta/info.json" ]]; then
    echo "Downloading nvidia/LIBERO_LeRobot_v3 (libero_10) ..."
    uvx hf@latest download --repo-type dataset nvidia/LIBERO_LeRobot_v3 \
        --include 'libero_10/**' --local-dir "$(dirname "$LIBERO_ROOT")"
fi
if [[ ! -f "$LIBERO_ROOT/meta/info.json" ]]; then
    cat >&2 <<EOF
ERROR: missing libero_10 dataset at:
$LIBERO_ROOT

Expected a LeRobotDataset dir containing meta/info.json. Stage it with:
uvx hf@latest download --repo-type dataset nvidia/LIBERO_LeRobot_v3 \\
--include 'libero_10/**' --local-dir data/LIBERO_LeRobot_v3
or export LIBERO_ROOT=/path/to/libero_10.
EOF
    exit 1
fi

# 2. Download the Wan2.2 VAE (skipped if present).
if [[ ! -f "$WAN_VAE_PATH" ]]; then
    uvx hf@latest download Wan-AI/Wan2.2-TI2V-5B Wan2.2_VAE.pth --local-dir "$(dirname "$WAN_VAE_PATH")"
fi

# 3. Convert the base checkpoint to DCP (skipped if present).
if [[ ! -d "$BASE_CHECKPOINT_PATH" ]]; then
    python -m cosmos_framework.scripts.convert_model_to_dcp -o "$BASE_CHECKPOINT_PATH" --checkpoint-path Cosmos3-Nano
fi

# 4. Train (HSDP 2x8 per the TOML; set NNODES/NODE_RANK/MASTER_ADDR per node).
# The TOML reads these paths from the environment.
export LIBERO_ROOT
export BASE_CHECKPOINT_PATH
export WAN_VAE_PATH

TAIL_OVERRIDES=()
if [[ -n "${EXTRA_TAIL_OVERRIDES:-}" ]]; then
    # EXTRA_TAIL_OVERRIDES is intentionally word-split to match the framework launcher UX.
    # shellcheck disable=SC2206
    TAIL_OVERRIDES=(${EXTRA_TAIL_OVERRIDES})
fi

TORCHRUN_ARGS=(--nproc_per_node="${NPROC_PER_NODE:-8}")
TORCHRUN_ARGS+=(--master_port="${MASTER_PORT:-50012}")
[[ -n "${NNODES:-}" ]] && TORCHRUN_ARGS+=(--nnodes="$NNODES")
[[ -n "${NODE_RANK:-}" ]] && TORCHRUN_ARGS+=(--node_rank="$NODE_RANK")
[[ -n "${MASTER_ADDR:-}" ]] && TORCHRUN_ARGS+=(--master_addr="$MASTER_ADDR")

OUTPUT_ROOT="${OUTPUT_ROOT:-$PWD/outputs/train}"
if (( ${#TAIL_OVERRIDES[@]} )); then
    IMAGINAIRE_OUTPUT_ROOT="${IMAGINAIRE_OUTPUT_ROOT:-$OUTPUT_ROOT}" torchrun "${TORCHRUN_ARGS[@]}" \
        -m cosmos_framework.scripts.train --sft-toml="$TOML_FILE" \
        -- "${TAIL_OVERRIDES[@]}"
else
    IMAGINAIRE_OUTPUT_ROOT="${IMAGINAIRE_OUTPUT_ROOT:-$OUTPUT_ROOT}" torchrun "${TORCHRUN_ARGS[@]}" \
        -m cosmos_framework.scripts.train --sft-toml="$TOML_FILE"
fi
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: OpenMDW-1.1

# LIBERO-10 action-policy SFT run config for the `action_policy_libero_nano`
# experiment. Train on libero_10 alone (HSDP 2x8, global batch 2048).
# Env: LIBERO_ROOT, BASE_CHECKPOINT_PATH, WAN_VAE_PATH, IMAGINAIRE_OUTPUT_ROOT.
# See docs/action_policy_libero_sft.md.

[job]
task = "vfm"
experiment = "action_policy_libero_nano"
project = "cosmos3_action_libero"
group = "action_sft"
name = "action_policy_libero_repro"
wandb_mode = "online"

[model]
precision = "bfloat16"
max_num_tokens_after_packing = 74000

[model.parallelism]
data_parallel_shard_degree = 8
data_parallel_replicate_degree = 2 # HSDP 2x8 = 16 ranks (2 nodes); minimum for gbs 2048 at grad_accum 1

[model.activation_checkpointing]
mode = "selective"
save_ops_regex = ["fmha"]

[model.tokenizer]
vae_path = "${oc.env:WAN_VAE_PATH}"

[optimizer]
lr = 5.0e-05

[scheduler]
cycle_lengths = [16000]
warm_up_steps = [500]

[trainer]
max_iter = 2000
logging_iter = 50
grad_accum_iter = 1 # global batch = max_samples 128 x (shard 8 x replicate 2) x 1 = 2048

[checkpoint]
load_path = "${oc.env:BASE_CHECKPOINT_PATH}"
save_iter = 500