NVIDIA · fwd4 · Jun 26, 2026
diff --git a/cookbooks/cosmos3/generator/action/README.md b/cookbooks/cosmos3/generator/action/README.md
@@ -88,7 +88,6 @@ visualize the generated videos:
   inverse dynamics, predicting ego-motion trajectories from input AV videos using Cosmos3-Nano.
 - [`run_policy_with_cosmos_framework.md`](./run_policy_with_cosmos_framework.md) - policy, predicting future observations and action trajectories for DROID robot using Cosmos3-Nano-Policy-DROID.
 
-
 ## Run with vLLM-Omni
 
 ### Quickstart
@@ -135,7 +134,9 @@ To reproduce our post-training recipe for [Cosmos3-Nano-Policy-DROID](https://hu
 launch-script pattern as the other Cosmos3 finetune cookbooks while delegating
 the canonical training implementation to Cosmos Framework.
 
-
+The same [action-policy SFT cookbook](./finetune/README.md) also covers **LIBERO-10**
+(`launch_sft_action_policy_libero.sh`), which fine-tunes Cosmos3-Nano on the `libero_10`
+simulation benchmark with the same launch-script pattern.
 
 ## TODO
 

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -1,12 +1,18 @@
-# Cosmos3-Nano-Policy-DROID Fine-Tuning (SFT)
+# Cosmos3-Nano Action-Policy Fine-Tuning (SFT)
 
-This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into an action policy for the DROID robot. It reproduces the post-training recipe used to create [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID), leveraging the public [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) dataset and the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework).
+This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into a robot action policy using the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework). It covers two recipes, each reproducing a result from the Cosmos3 paper:
+
+- **DROID**: reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID); trained on real-robot DROID data and evaluated on the RoboLab simulation benchmark.
+- **LIBERO-10**: reproduces the Cosmos3 paper's LIBERO-10 results; trained and evaluated on the LIBERO-10 simulation benchmark.
 
 | Recipe | Launch shell | Base model | Dataset |
 | --- | --- | --- | --- |
 | Policy-DROID SFT | `launch_sft_action_policy_droid.sh` | Cosmos3-Nano | [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) success split |
+| Policy-LIBERO-10 SFT | `launch_sft_action_policy_libero.sh` | Cosmos3-Nano | [LIBERO_LeRobot_v3](https://huggingface.co/datasets/nvidia/LIBERO_LeRobot_v3) `libero_10` |
+
+The DROID recipe uses the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
 
-The recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment. It trains a DROID policy model with `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` video (third-person + wrist) at 20 fps, learning rate 5e-5 with 500 warm-up steps and a 16000-step scheduler cycle, 2000 training iterations, and a global batch of 2048 (HSDP 2x8, i.e. 16 GPUs on 2 nodes). Train on `libero_10` **alone**; do not mix in the other LIBERO suites.
 
 ## Prerequisites
 
@@ -38,7 +44,7 @@ The launcher is a complete local wrapper for the public cookbook:
 - downloads `Wan2.2_VAE.pth` if needed
 - converts `Cosmos3-Nano` to a local DCP checkpoint if needed
 - downloads `keep_ranges_1_0_1.json` if needed
-- launches 8-GPU training with `action_policy_droid_repro.toml`
+- launches training with `action_policy_droid_repro.toml`
 
 The script intentionally stays close to the `cosmos-framework` example launcher: `DATASET_PATH`
 is bridged to `DROID_ROOT`, `BASE_CHECKPOINT_PATH` and `WAN_VAE_PATH` are exported for the TOML,
@@ -63,6 +69,31 @@ export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpo
 bash launch_sft_action_policy_droid.sh
 ```
 
+## LIBERO-10 quick start
+
+The LIBERO recipe expects two 8-GPU nodes (HSDP 2x8 = 16 ranks). Run the launcher on each node
+with `NNODES`, `NODE_RANK`, and `MASTER_ADDR` set for `torchrun` (`NPROC_PER_NODE` defaults to 8).
+
+```shell
+bash launch_sft_action_policy_libero.sh
+```
+
+The launcher:
+
+- downloads the `libero_10` subset of `nvidia/LIBERO_LeRobot_v3` to `data/LIBERO_LeRobot_v3/libero_10` if missing
+- downloads `Wan2.2_VAE.pth` and converts `Cosmos3-Nano` to a local DCP checkpoint if needed
+- launches training with the LIBERO action-policy TOML (`action_policy_libero_repro.toml`)
+
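+Inputs can be relocated with environment variables (`LIBERO_ROOT`, `BASE_CHECKPOINT_PATH`, `WAN_VAE_PATH`), and extra config overrides can be passed via `EXTRA_TAIL_OVERRIDES`. For example, a 10-iteration smoke test with W&B logging disabled: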
+
+```shell
+export LIBERO_ROOT=/scratch/LIBERO_LeRobot_v3/libero_10
+export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpoint.save_iter=10 dataloader_train.max_samples_per_batch=32"
+bash launch_sft_action_policy_libero.sh
+```
+
+Checkpoints are saved every 500 iterations (`checkpoint.save_iter`) of the default 2000-iteration run (`trainer.max_iter`).
+
 ## Outputs
 
 Training writes to `outputs/train/<project>/<group>/<name>/`:

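A quick way to sanity-check the LIBERO-10 layout described above is to redo the TOML's batch arithmetic. The following is a minimal Python sketch. It assumes, as the TOML comment states, that global batch = per-rank `max_samples_per_batch` x (shard x replicate) x `grad_accum_iter`, and that every HSDP rank is a data-parallel rank. The helper names are hypothetical. The per-rank value of 128 is taken from the TOML comment, and sequence packing (`max_num_tokens_after_packing`) may make the realized batch smaller, so treat the numbers as nominal.

```python
"""Nominal batch/topology arithmetic for the LIBERO-10 SFT recipe (sketch).

Assumes global batch = per-rank samples x (shard x replicate) x grad-accum,
as written in the TOML comment. Helper names are hypothetical.
"""


def data_parallel_ranks(shard_degree: int, replicate_degree: int) -> int:
    # HSDP: `replicate_degree` groups, each sharding the model over
    # `shard_degree` GPUs; every rank consumes its own slice of the data.
    return shard_degree * replicate_degree


def global_batch(per_rank_samples: int, dp_ranks: int, grad_accum: int = 1) -> int:
    return per_rank_samples * dp_ranks * grad_accum


def nodes_needed(world_size: int, gpus_per_node: int = 8) -> int:
    # Ceiling division: a partially used node still has to be allocated.
    return -(-world_size // gpus_per_node)


def grad_accum_for(target_batch: int, per_rank_samples: int, dp_ranks: int) -> int:
    # Accumulation steps needed to reach `target_batch` with a given rank count.
    step = per_rank_samples * dp_ranks
    if target_batch % step:
        raise ValueError(f"{target_batch} is not a multiple of {step}")
    return target_batch // step


if __name__ == "__main__":
    ranks = data_parallel_ranks(shard_degree=8, replicate_degree=2)
    print("ranks:", ranks)                                  # 16
    print("global batch:", global_batch(128, ranks))        # 2048
    print("nodes:", nodes_needed(ranks))                    # 2
    # Smoke test with max_samples_per_batch=32, assuming the same 16-rank layout:
    print("smoke-test batch:", global_batch(32, ranks))     # 512
    # Same nominal batch on a single 8-GPU node (not a documented configuration):
    print("grad_accum on 1 node:", grad_accum_for(2048, 128, 8))  # 2
```

The last line shows why the TOML calls HSDP 2x8 the minimum *at* `grad_accum_iter = 1`. This PR does not claim that trading ranks for accumulation reproduces the paper result.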
diff --git a/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh b/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
@@ -0,0 +1,76 @@
+#!/usr/bin/env bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: OpenMDW-1.1
+
+# Complete recipe: LIBERO-10 action-policy SFT on Cosmos3-Nano (HSDP 2x8).
+# Run from this folder with the cosmos-framework venv active (see README):
+#   bash launch_sft_action_policy_libero.sh
+# It downloads the libero_10 dataset and the Wan2.2 VAE if missing, converts the base checkpoint to DCP if needed, and trains.
+# Paths default to this (git-ignored) folder, matching the reasoner finetune wrappers, and can be
+# overridden via env vars; the TOML and tail overrides match the cosmos-framework example.
+
+set -euo pipefail
+cd "$(dirname "${BASH_SOURCE[0]}")"
+
+TOML_FILE="toml/sft_config/action_policy_libero_repro.toml"
+: "${LIBERO_ROOT:=$PWD/data/LIBERO_LeRobot_v3/libero_10}"
+: "${BASE_CHECKPOINT_PATH:=$PWD/checkpoints/Cosmos3-Nano}"
+: "${WAN_VAE_PATH:=$PWD/checkpoints/wan22_vae/Wan2.2_VAE.pth}"
+
+# 1. Stage the libero_10 suite (the Table-20 reproduction trains on libero_10 ALONE).
+if [[ ! -f "$LIBERO_ROOT/meta/info.json" ]]; then
+    echo "Downloading nvidia/LIBERO_LeRobot_v3 (libero_10) ..."
+    uvx hf@latest download --repo-type dataset nvidia/LIBERO_LeRobot_v3 \
+        --include 'libero_10/**' --local-dir "$(dirname "$LIBERO_ROOT")"
+fi
+if [[ ! -f "$LIBERO_ROOT/meta/info.json" ]]; then
+    cat >&2 <<EOF
+ERROR: missing libero_10 dataset at:
+  $LIBERO_ROOT
+
+Expected a LeRobotDataset dir containing meta/info.json. Stage it with:
+  uvx hf@latest download --repo-type dataset nvidia/LIBERO_LeRobot_v3 \\
+      --include 'libero_10/**' --local-dir data/LIBERO_LeRobot_v3
+or export LIBERO_ROOT=/path/to/libero_10.
+EOF
+    exit 1
+fi
+
+# 2. Download the Wan2.2 VAE (skipped if present).
+if [[ ! -f "$WAN_VAE_PATH" ]]; then
+    uvx hf@latest download Wan-AI/Wan2.2-TI2V-5B Wan2.2_VAE.pth --local-dir "$(dirname "$WAN_VAE_PATH")"
+fi
+
+# 3. Convert the base checkpoint to DCP (skipped if present).
+if [[ ! -d "$BASE_CHECKPOINT_PATH" ]]; then
+    python -m cosmos_framework.scripts.convert_model_to_dcp -o "$BASE_CHECKPOINT_PATH" --checkpoint-path Cosmos3-Nano
+fi
+
+# 4. Train (HSDP 2x8 per the TOML; set NNODES/NODE_RANK/MASTER_ADDR per node).
+#    The TOML reads these paths from the environment.
+export LIBERO_ROOT
+export BASE_CHECKPOINT_PATH
+export WAN_VAE_PATH
+
+TAIL_OVERRIDES=()
+if [[ -n "${EXTRA_TAIL_OVERRIDES:-}" ]]; then
+    # EXTRA_TAIL_OVERRIDES is intentionally word-split to match the framework launcher UX.
+    # shellcheck disable=SC2206
+    TAIL_OVERRIDES=(${EXTRA_TAIL_OVERRIDES})
+fi
+
+TORCHRUN_ARGS=(--nproc_per_node="${NPROC_PER_NODE:-8}")
+TORCHRUN_ARGS+=(--master_port="${MASTER_PORT:-50012}")
+[[ -n "${NNODES:-}" ]] && TORCHRUN_ARGS+=(--nnodes="$NNODES")
+[[ -n "${NODE_RANK:-}" ]] && TORCHRUN_ARGS+=(--node_rank="$NODE_RANK")
+[[ -n "${MASTER_ADDR:-}" ]] && TORCHRUN_ARGS+=(--master_addr="$MASTER_ADDR")
+
+OUTPUT_ROOT="${OUTPUT_ROOT:-$PWD/outputs/train}"
+if (( ${#TAIL_OVERRIDES[@]} )); then
+    IMAGINAIRE_OUTPUT_ROOT="${IMAGINAIRE_OUTPUT_ROOT:-$OUTPUT_ROOT}" torchrun "${TORCHRUN_ARGS[@]}" \
+        -m cosmos_framework.scripts.train --sft-toml="$TOML_FILE" \
+        -- "${TAIL_OVERRIDES[@]}"
+else
+    IMAGINAIRE_OUTPUT_ROOT="${IMAGINAIRE_OUTPUT_ROOT:-$OUTPUT_ROOT}" torchrun "${TORCHRUN_ARGS[@]}" \
+        -m cosmos_framework.scripts.train --sft-toml="$TOML_FILE"
+fi
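The launcher's final step builds a `torchrun` command line from environment variables. The following Python sketch restates that logic so the defaults are easy to see:

- `NPROC_PER_NODE` falls back to 8 and `MASTER_PORT` falls back to 50012.
- The multi-node flags appear only when their variables are non-empty.
- `EXTRA_TAIL_OVERRIDES` is split on whitespace and passed after `--`.
- `IMAGINAIRE_OUTPUT_ROOT` takes precedence over `OUTPUT_ROOT`.

This is an illustration, not part of the cookbook, and the function names are hypothetical. `str.split()` approximates bash's default IFS word-splitting but skips the glob expansion bash also applies to the unquoted variable.

```python
"""Python restatement of how launch_sft_action_policy_libero.sh builds its command.

Illustrative sketch only; function names are hypothetical.
"""
import os
import shlex
from typing import Mapping

TOML_FILE = "toml/sft_config/action_policy_libero_repro.toml"


def build_torchrun_argv(env: Mapping[str, str]) -> list:
    # bash `${VAR:-default}`: use the default when VAR is unset OR empty.
    argv = [
        "torchrun",
        f"--nproc_per_node={env.get('NPROC_PER_NODE') or '8'}",
        f"--master_port={env.get('MASTER_PORT') or '50012'}",
    ]
    # bash `[[ -n "${VAR:-}" ]] && ...`: add multi-node flags only when non-empty.
    for var, flag in (("NNODES", "--nnodes"),
                      ("NODE_RANK", "--node_rank"),
                      ("MASTER_ADDR", "--master_addr")):
        if env.get(var):
            argv.append(f"{flag}={env[var]}")
    argv += ["-m", "cosmos_framework.scripts.train", f"--sft-toml={TOML_FILE}"]
    # Unquoted ${EXTRA_TAIL_OVERRIDES} is word-split; str.split() mimics the
    # default IFS (space/tab/newline) but, unlike bash, does no glob expansion.
    tail = (env.get("EXTRA_TAIL_OVERRIDES") or "").split()
    if tail:
        argv += ["--", *tail]
    return argv


def resolve_output_root(env: Mapping[str, str], cwd: str) -> str:
    # OUTPUT_ROOT defaults to $PWD/outputs/train; IMAGINAIRE_OUTPUT_ROOT wins if set.
    output_root = env.get("OUTPUT_ROOT") or os.path.join(cwd, "outputs", "train")
    return env.get("IMAGINAIRE_OUTPUT_ROOT") or output_root


if __name__ == "__main__":
    # Hypothetical node-0 environment for the 2-node (HSDP 2x8) run, as a smoke test.
    node0 = {
        "NNODES": "2",
        "NODE_RANK": "0",
        "MASTER_ADDR": "node0.example",
        "EXTRA_TAIL_OVERRIDES": "job.wandb_mode=disabled trainer.max_iter=10",
    }
    print(shlex.join(build_torchrun_argv(node0)))
    print(resolve_output_root(node0, "/work/finetune"))
```

On the second node, only `NODE_RANK` would change (to `1`). The bash script has two near-identical `torchrun` branches for a reason: expanding an empty array under `set -u` errors on older bash versions.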
diff --git a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
@@ -0,0 +1,46 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: OpenMDW-1.1
+
+# LIBERO-10 action-policy SFT run config for the `action_policy_libero_nano`
+# experiment. Train on libero_10 alone (HSDP 2x8, global batch 2048).
+# Env: LIBERO_ROOT, BASE_CHECKPOINT_PATH, WAN_VAE_PATH, IMAGINAIRE_OUTPUT_ROOT.
+# See docs/action_policy_libero_sft.md.
+
+[job]
+task         = "vfm"
+experiment   = "action_policy_libero_nano"
+project      = "cosmos3_action_libero"
+group        = "action_sft"
+name         = "action_policy_libero_repro"
+wandb_mode   = "online"
+
+[model]
+precision = "bfloat16"
+max_num_tokens_after_packing = 74000
+
+[model.parallelism]
+data_parallel_shard_degree     = 8
+data_parallel_replicate_degree = 2    # HSDP 2x8 = 16 ranks (2 nodes); minimum for gbs 2048 at grad_accum 1
+
+[model.activation_checkpointing]
+mode           = "selective"
+save_ops_regex = ["fmha"]
+
+[model.tokenizer]
+vae_path = "${oc.env:WAN_VAE_PATH}"
+
+[optimizer]
+lr = 5.0e-05
+
+[scheduler]
+cycle_lengths = [16000]
+warm_up_steps = [500]
+
+[trainer]
+max_iter        = 2000
+logging_iter    = 50
+grad_accum_iter = 1       # global batch = max_samples 128 x (shard 8 x replicate 2) x 1 = 2048
+
+[checkpoint]
+load_path = "${oc.env:BASE_CHECKPOINT_PATH}"
+save_iter = 500
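The TOML reads its VAE and base-checkpoint paths from the environment through `${oc.env:NAME}` interpolations (OmegaConf's environment-resolver syntax). That is why the launcher exports those variables before calling `torchrun`. A small preflight like the following can list which variables a config references and which are unset.

This is a hypothetical stdlib-only helper, not part of cosmos-framework, and it has three limitations:

- It scans the text with a regex instead of resolving the config with OmegaConf.
- It ignores the `${oc.env:NAME,default}` form.
- It adds `LIBERO_ROOT` by hand. The header comment lists that variable, but this file does not interpolate it; it is presumably read by the registered experiment instead.

```python
"""Preflight: which env vars does an SFT TOML read via ${oc.env:NAME}? (sketch)"""
import os
import re
from typing import Iterable, List, Mapping

# Matches only the plain ${oc.env:NAME} form (not the default-value variant).
OC_ENV = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)\}")


def referenced_env_vars(toml_text: str) -> List[str]:
    """Return interpolated variable names in first-seen order, without duplicates."""
    names: List[str] = []
    for name in OC_ENV.findall(toml_text):
        if name not in names:
            names.append(name)
    return names


def missing_env_vars(names: Iterable[str], env: Mapping[str, str]) -> List[str]:
    # Treat unset or empty values as missing.
    return [name for name in names if not env.get(name)]


# Excerpt of action_policy_libero_repro.toml: the two interpolated keys.
EXCERPT = """
[model.tokenizer]
vae_path = "${oc.env:WAN_VAE_PATH}"

[checkpoint]
load_path = "${oc.env:BASE_CHECKPOINT_PATH}"
"""

if __name__ == "__main__":
    # LIBERO_ROOT is listed in the TOML header but not interpolated in this file.
    required = referenced_env_vars(EXCERPT) + ["LIBERO_ROOT"]
    missing = missing_env_vars(required, os.environ)
    print("required:", required)
    print("missing:", missing or "none")
```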