From dc9877f03621e35f4305aaefe4dfda373b196cb5 Mon Sep 17 00:00:00 2001
From: Liang Hao <hliangac@connect.ust.hk>
Date: Fri, 26 Jun 2026 21:05:29 +0800
Subject: [PATCH 1/7] Add LIBERO-10 action-policy finetune cookbook

Mirror the DROID action-policy cookbook for LIBERO-10. Add
launch_sft_action_policy_libero.sh and action_policy_libero_repro.toml, a
LIBERO-10 section in the finetune README, and a link from the action README.
The launcher stages the libero_10 suite of nvidia/LIBERO_LeRobot_v3 and applies
no keep-ranges filter. The recipe uses lr 5e-5, warmup 500, cycle 16000.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
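Note (not part of the commit): the TOML comment in this patch states that, with
warmup 500 and cycle 16000, the learning rate is barely decayed at iter 2000
(about 4.5e-5). The sketch below reproduces that arithmetic. It assumes a linear
warmup from 0 and a linear decay to 0 that starts after warmup and ends at the
cycle length. The actual cosmos-framework scheduler may differ, and `lr_at` is a
hypothetical helper, not a framework API.

```python
def lr_at(step, base_lr=5.0e-5, warm_up_steps=500, cycle_length=16000):
    """Hypothetical linear-warmup / linear-decay schedule (assumed shape, not framework code)."""
    if step < warm_up_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / warm_up_steps
    # Linear decay from base_lr (end of warmup) to 0 (end of the cycle).
    remaining = (cycle_length - step) / (cycle_length - warm_up_steps)
    return base_lr * max(remaining, 0.0)


# Learning rate at each checkpoint of the 500-iter save cadence.
for step in (500, 1000, 1500, 2000):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate at iter 2000 is still roughly 90% of the base
value, which is consistent with the "barely decayed" remark in the TOML comment.
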
 cookbooks/cosmos3/generator/action/README.md  |  5 +-
 .../generator/action/finetune/README.md       | 34 +++++++-
 .../launch_sft_action_policy_libero.sh        | 75 ++++++++++++++++++
 .../action_policy_libero_repro.toml           | 79 +++++++++++++++++++
 4 files changed, 188 insertions(+), 5 deletions(-)
 create mode 100755 cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
 create mode 100644 cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml

diff --git a/cookbooks/cosmos3/generator/action/README.md b/cookbooks/cosmos3/generator/action/README.md
index 73760e9d..49169f55 100644
--- a/cookbooks/cosmos3/generator/action/README.md
+++ b/cookbooks/cosmos3/generator/action/README.md
@@ -88,7 +88,6 @@ visualize the generated videos:
   inverse dynamics, predicting ego-motion trajectories from input AV videos using Cosmos3-Nano.
 - [`run_policy_with_cosmos_framework.md`](./run_policy_with_cosmos_framework.md) - policy, predicting future observations and action trajectories for DROID robot using Cosmos3-Nano-Policy-DROID.
 
-
 ## Run with vLLM-Omni
 
 ### Quickstart
@@ -135,7 +134,9 @@ To reproduce our post-training recipe for [Cosmos3-Nano-Policy-DROID](https://hu
 launch-script pattern as the other Cosmos3 finetune cookbooks while delegating
 the canonical training implementation to Cosmos Framework.
 
-
+The same [action-policy SFT cookbook](./finetune/README.md) also covers **LIBERO-10**
+(`launch_sft_action_policy_libero.sh`) — fine-tuning Cosmos3-Nano on the `libero_10`
+simulation benchmark with the same launch-script pattern.
 
 ## TODO
 
diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index 52436be6..d5661e9a 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -1,12 +1,15 @@
-# Cosmos3-Nano-Policy-DROID Fine-Tuning (SFT)
+# Cosmos3-Nano Action-Policy Fine-Tuning (SFT)
 
-This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into an action policy for the DROID robot. It reproduces the post-training recipe used to create [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID), leveraging the public [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) dataset and the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework).
+This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into a robot action policy, using the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework). Two embodiments are covered: **DROID** (reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID)) and **LIBERO-10** (the simulation benchmark).
 
 | Recipe | Launch shell | Base model | Dataset |
 | --- | --- | --- | --- |
 | Policy-DROID SFT | `launch_sft_action_policy_droid.sh` | Cosmos3-Nano | [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) success split |
+| Policy-LIBERO-10 SFT | `launch_sft_action_policy_libero.sh` | Cosmos3-Nano | [LIBERO_LeRobot_v3](https://huggingface.co/datasets/nvidia/LIBERO_LeRobot_v3) `libero_10` |
 
-The recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment. It trains a DROID policy model with `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
+The DROID recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
+
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048. Train on `libero_10` **alone** (the Table-20 reproduction; the 4-suite mix dilutes libero_10). No keep-ranges filter. Reaches ~95% success on the 500-episode libero_10 closed-loop eval (best ~95.2% @ iter_1500).
 
 ## Prerequisites
 
@@ -63,6 +66,31 @@ export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpo
 bash launch_sft_action_policy_droid.sh
 ```
 
+## LIBERO-10 quick start
+
+The LIBERO launcher mirrors the DROID one. It stages the `libero_10` suite (auto-downloaded if
+missing), downloads the Wan VAE, converts the base checkpoint, and trains — no keep-ranges filter.
+
+```shell
+bash launch_sft_action_policy_libero.sh
+```
+
+The launcher:
+
+- downloads `nvidia/LIBERO_LeRobot_v3` `libero_10` to `data/LIBERO_LeRobot_v3/libero_10` if missing
+- downloads `Wan2.2_VAE.pth` and converts `Cosmos3-Nano` to a local DCP checkpoint if needed
+- launches 8-GPU training with the LIBERO action-policy TOML (`action_policy_libero_repro.toml`)
+
+Relocate inputs via env vars, or run a short smoke test:
+
+```shell
+export LIBERO_ROOT=/scratch/LIBERO_LeRobot_v3/libero_10
+export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpoint.save_iter=10 dataloader_train.max_samples_per_batch=32"
+bash launch_sft_action_policy_libero.sh
+```
+
+Checkpoints are saved every 500 iters (sweep 500/1000/1500/2000); the peak is typically iter_1500.
+
 ## Outputs
 
 Training writes to `outputs/train/<project>/<group>/<name>/`:
diff --git a/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh b/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
new file mode 100755
index 00000000..128540cb
--- /dev/null
+++ b/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
@@ -0,0 +1,75 @@
+#!/usr/bin/env bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: OpenMDW-1.1
+
+# Complete recipe: LIBERO-10 action-policy SFT on Cosmos3-Nano (8x H100).
+# Run from this folder with the cosmos-framework venv active (see README):
+#   bash launch_sft_action_policy_libero.sh
+# It prepares the small dependencies, checks for the staged libero_10 dataset, and trains.
+# Paths are fixed under this (git-ignored) folder, matching the reasoner finetune
+# wrappers, while the TOML and tail-overrides match the cosmos-framework example.
+
+set -euo pipefail
+cd "$(dirname "${BASH_SOURCE[0]}")"
+
+TOML_FILE="toml/sft_config/action_policy_libero_repro.toml"
+: "${LIBERO_ROOT:=$PWD/data/LIBERO_LeRobot_v3/libero_10}"
+: "${BASE_CHECKPOINT_PATH:=$PWD/checkpoints/Cosmos3-Nano}"
+: "${WAN_VAE_PATH:=$PWD/checkpoints/wan22_vae/Wan2.2_VAE.pth}"
+
+# 1. Stage the libero_10 suite (the Table-20 reproduction trains on libero_10 ALONE).
+if [[ ! -f "$LIBERO_ROOT/meta/info.json" ]]; then
+    echo "Downloading nvidia/LIBERO_LeRobot_v3 (libero_10) ..."
+    uvx hf@latest download --repo-type dataset nvidia/LIBERO_LeRobot_v3 \
+        --include 'libero_10/**' --local-dir "$(dirname "$LIBERO_ROOT")"
+fi
+if [[ ! -f "$LIBERO_ROOT/meta/info.json" ]]; then
+    cat >&2 <<EOF
+ERROR: missing libero_10 dataset at:
+  $LIBERO_ROOT
+
+Expected a LeRobotDataset dir containing meta/info.json. Stage it with:
+  uvx hf@latest download --repo-type dataset nvidia/LIBERO_LeRobot_v3 \\
+      --include 'libero_10/**' --local-dir data/LIBERO_LeRobot_v3
+or export LIBERO_ROOT=/path/to/libero_10.
+EOF
+    exit 1
+fi
+
+# 2. Download the Wan2.2 VAE (skipped if present).
+if [[ ! -f "$WAN_VAE_PATH" ]]; then
+    uvx hf@latest download Wan-AI/Wan2.2-TI2V-5B Wan2.2_VAE.pth --local-dir "$(dirname "$WAN_VAE_PATH")"
+fi
+
+# 3. Convert the base checkpoint to DCP (skipped if present).
+if [[ ! -d "$BASE_CHECKPOINT_PATH" ]]; then
+    python -m cosmos_framework.scripts.convert_model_to_dcp -o "$BASE_CHECKPOINT_PATH" --checkpoint-path Cosmos3-Nano
+fi
+
+# 4. Train (8-GPU FSDP by default). The TOML reads these paths from the environment.
+export LIBERO_ROOT
+export BASE_CHECKPOINT_PATH
+export WAN_VAE_PATH
+
+TAIL_OVERRIDES=()
+if [[ -n "${EXTRA_TAIL_OVERRIDES:-}" ]]; then
+    # EXTRA_TAIL_OVERRIDES is intentionally word-split to match the framework launcher UX.
+    # shellcheck disable=SC2206
+    TAIL_OVERRIDES=(${EXTRA_TAIL_OVERRIDES})
+fi
+
+TORCHRUN_ARGS=(--nproc_per_node="${NPROC_PER_NODE:-8}")
+TORCHRUN_ARGS+=(--master_port="${MASTER_PORT:-50012}")
+[[ -n "${NNODES:-}" ]] && TORCHRUN_ARGS+=(--nnodes="$NNODES")
+[[ -n "${NODE_RANK:-}" ]] && TORCHRUN_ARGS+=(--node_rank="$NODE_RANK")
+[[ -n "${MASTER_ADDR:-}" ]] && TORCHRUN_ARGS+=(--master_addr="$MASTER_ADDR")
+
+OUTPUT_ROOT="${OUTPUT_ROOT:-$PWD/outputs/train}"
+if (( ${#TAIL_OVERRIDES[@]} )); then
+    IMAGINAIRE_OUTPUT_ROOT="${IMAGINAIRE_OUTPUT_ROOT:-$OUTPUT_ROOT}" torchrun "${TORCHRUN_ARGS[@]}" \
+        -m cosmos_framework.scripts.train --sft-toml="$TOML_FILE" \
+        -- "${TAIL_OVERRIDES[@]}"
+else
+    IMAGINAIRE_OUTPUT_ROOT="${IMAGINAIRE_OUTPUT_ROOT:-$OUTPUT_ROOT}" torchrun "${TORCHRUN_ARGS[@]}" \
+        -m cosmos_framework.scripts.train --sft-toml="$TOML_FILE"
+fi
diff --git a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
new file mode 100644
index 00000000..7fd788d3
--- /dev/null
+++ b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
@@ -0,0 +1,79 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: OpenMDW-1.1
+
+# ============================================================================
+# LIBERO action-policy SFT — run config for the `action_policy_libero_nano`
+# experiment (Cosmos3-Nano LIBERO-10). The recipe knobs (optimizer base, count-
+# based batch, action-head skip-on-load, dataset knobs) live in the registered
+# experiment; this file sets run-level scalars (lr/schedule, iters, ckpt cadence,
+# parallelism shape, wandb, VAE path).
+#
+# RECIPE (recommended): lr 5e-5, warmup 500, cycle 16000 (so LR is barely decayed
+# at iter 2000, ~4.5e-5), global batch 2048, save every 500 -> sweep 500..2000.
+# Best observed: ~95.2% @ iter_1500 (libero_10, 500-ep closed-loop eval), with
+# task-0 success stable across the sweep (no over-fit collapse). This gentle-LR
+# schedule is more robust than a higher lr (e.g. 1e-4), which peaks near iter_1000
+# then over-fits task 0 and regresses. See docs/action_policy_libero_sft.md.
+#
+# REPRODUCTION: train on libero_10 ALONE (point LIBERO_ROOT at the libero_10
+# LeRobot conversion only). The 4-suite mix dilutes libero_10 (~1/4 the exposure
+# per step) and converges more slowly.
+#
+# Env required:
+#   LIBERO_ROOT=/path/to/libero_10_lerobot
+#   BASE_CHECKPOINT_PATH=<Cosmos3-Nano DCP dir>
+#   WAN_VAE_PATH=<Wan2.2_VAE.pth>
+#   IMAGINAIRE_OUTPUT_ROOT=/path/to/output_root   # persist checkpoints
+# ============================================================================
+
+[job]
+task         = "vfm"
+experiment   = "action_policy_libero_nano"
+project      = "cosmos3_action_libero"
+group        = "action_sft"
+name         = "action_policy_libero_repro"
+wandb_mode   = "online"
+
+[model]
+precision = "bfloat16"
+# Cap the packed sequence (GA-validated). Uncapped (-1) packs one very long sequence
+# and OOMs even on H200.
+max_num_tokens_after_packing = 74000
+
+[model.parallelism]
+data_parallel_shard_degree     = 8    # 1-node 8-GPU shard; raise replicate for multi-node HSDP
+data_parallel_replicate_degree = 1
+
+[model.activation_checkpointing]
+mode           = "selective"          # GA recipe (full is slower; selective fits 256x512)
+save_ops_regex = ["fmha"]
+
+[model.tokenizer]
+vae_path = "${oc.env:WAN_VAE_PATH}"
+
+[optimizer]
+lr = 5.0e-05              # recommended base lr
+
+[scheduler]
+cycle_lengths = [16000]   # LR trajectory: warmup 500 -> linear decay over 16k (barely decayed at 2k)
+warm_up_steps = [500]
+
+[trainer]
+max_iter        = 2000    # pause at 2k; sweep checkpoints 500/1000/1500/2000 for the peak
+logging_iter    = 50
+grad_accum_iter = 2       # global batch = max_samples_per_batch 128 x DP 8 x grad_accum 2 = 2048
+
+[checkpoint]
+load_path = "${oc.env:BASE_CHECKPOINT_PATH}"
+save_iter = 500           # sweep cadence; peak is typically iter_1500
+
+# NOTE (train/serve parity — see GitHub issue NVIDIA/cosmos-framework#50): the
+# 256x512 concat_view is snapped to a 192x320 model canvas (resize+reflect-pad), and
+# the eval server reproduces the same snap. Run the client with the same 2:1 concat
+# (--camera agentview,wrist --image_size 256) so resolution + prompt suffix match, and
+# use --action-normalization quantile_rot + the bundled libero rot6d stats on the
+# server so denormalization matches training. See docs/action_policy_libero_sft.md.
+#
+# max_samples_per_batch is 128 in the experiment (256 OOMs: per-forward peak, not grad_accum).
+# On lower-memory GPUs reduce at launch:
+#   --opts dataloader_train.max_samples_per_batch=64

From c5d493c11b0689a2a130562ca5d973f136859b5e Mon Sep 17 00:00:00 2001
From: Liang Hao <hliangac@connect.ust.hk>
Date: Fri, 26 Jun 2026 21:24:38 +0800
Subject: [PATCH 2/7] libero cookbook: trim README detail, drop success-rate
 numbers, sync recipe toml

---
 .../generator/action/finetune/README.md       |  4 +-
 .../action_policy_libero_repro.toml           | 57 ++++---------------
 2 files changed, 14 insertions(+), 47 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index d5661e9a..f2514eb4 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -9,7 +9,7 @@ This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https:/
 
 The DROID recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
 
-The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048. Train on `libero_10` **alone** (the Table-20 reproduction; the 4-suite mix dilutes libero_10). No keep-ranges filter. Reaches ~95% success on the 500-episode libero_10 closed-loop eval (best ~95.2% @ iter_1500).
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048. Train on `libero_10` **alone** (the 4-suite mix dilutes libero_10). No keep-ranges filter.
 
 ## Prerequisites
 
@@ -89,7 +89,7 @@ export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpo
 bash launch_sft_action_policy_libero.sh
 ```
 
-Checkpoints are saved every 500 iters (sweep 500/1000/1500/2000); the peak is typically iter_1500.
+Checkpoints are saved every 500 iters; sweep them to pick the best iteration.
 
 ## Outputs
 
diff --git a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
index 7fd788d3..d74237bc 100644
--- a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
+++ b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
@@ -1,30 +1,10 @@
 # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: OpenMDW-1.1
 
-# ============================================================================
-# LIBERO action-policy SFT — run config for the `action_policy_libero_nano`
-# experiment (Cosmos3-Nano LIBERO-10). The recipe knobs (optimizer base, count-
-# based batch, action-head skip-on-load, dataset knobs) live in the registered
-# experiment; this file sets run-level scalars (lr/schedule, iters, ckpt cadence,
-# parallelism shape, wandb, VAE path).
-#
-# RECIPE (recommended): lr 5e-5, warmup 500, cycle 16000 (so LR is barely decayed
-# at iter 2000, ~4.5e-5), global batch 2048, save every 500 -> sweep 500..2000.
-# Best observed: ~95.2% @ iter_1500 (libero_10, 500-ep closed-loop eval), with
-# task-0 success stable across the sweep (no over-fit collapse). This gentle-LR
-# schedule is more robust than a higher lr (e.g. 1e-4), which peaks near iter_1000
-# then over-fits task 0 and regresses. See docs/action_policy_libero_sft.md.
-#
-# REPRODUCTION: train on libero_10 ALONE (point LIBERO_ROOT at the libero_10
-# LeRobot conversion only). The 4-suite mix dilutes libero_10 (~1/4 the exposure
-# per step) and converges more slowly.
-#
-# Env required:
-#   LIBERO_ROOT=/path/to/libero_10_lerobot
-#   BASE_CHECKPOINT_PATH=<Cosmos3-Nano DCP dir>
-#   WAN_VAE_PATH=<Wan2.2_VAE.pth>
-#   IMAGINAIRE_OUTPUT_ROOT=/path/to/output_root   # persist checkpoints
-# ============================================================================
+# LIBERO-10 action-policy SFT run config for the `action_policy_libero_nano`
+# experiment. Train on libero_10 alone (HSDP 2x8, global batch 2048).
+# Env: LIBERO_ROOT, BASE_CHECKPOINT_PATH, WAN_VAE_PATH, IMAGINAIRE_OUTPUT_ROOT.
+# See docs/action_policy_libero_sft.md.
 
 [job]
 task         = "vfm"
@@ -36,44 +16,31 @@ wandb_mode   = "online"
 
 [model]
 precision = "bfloat16"
-# Cap the packed sequence (GA-validated). Uncapped (-1) packs one very long sequence
-# and OOMs even on H200.
 max_num_tokens_after_packing = 74000
 
 [model.parallelism]
-data_parallel_shard_degree     = 8    # 1-node 8-GPU shard; raise replicate for multi-node HSDP
-data_parallel_replicate_degree = 1
+data_parallel_shard_degree     = 8
+data_parallel_replicate_degree = 2    # HSDP 2x8 = 16 ranks (2 nodes)
 
 [model.activation_checkpointing]
-mode           = "selective"          # GA recipe (full is slower; selective fits 256x512)
+mode           = "selective"
 save_ops_regex = ["fmha"]
 
 [model.tokenizer]
 vae_path = "${oc.env:WAN_VAE_PATH}"
 
 [optimizer]
-lr = 5.0e-05              # recommended base lr
+lr = 5.0e-05
 
 [scheduler]
-cycle_lengths = [16000]   # LR trajectory: warmup 500 -> linear decay over 16k (barely decayed at 2k)
+cycle_lengths = [16000]
 warm_up_steps = [500]
 
 [trainer]
-max_iter        = 2000    # pause at 2k; sweep checkpoints 500/1000/1500/2000 for the peak
+max_iter        = 2000
 logging_iter    = 50
-grad_accum_iter = 2       # global batch = max_samples_per_batch 128 x DP 8 x grad_accum 2 = 2048
+grad_accum_iter = 1       # global batch = 128 x (8 x 2) x 1 = 2048
 
 [checkpoint]
 load_path = "${oc.env:BASE_CHECKPOINT_PATH}"
-save_iter = 500           # sweep cadence; peak is typically iter_1500
-
-# NOTE (train/serve parity — see GitHub issue NVIDIA/cosmos-framework#50): the
-# 256x512 concat_view is snapped to a 192x320 model canvas (resize+reflect-pad), and
-# the eval server reproduces the same snap. Run the client with the same 2:1 concat
-# (--camera agentview,wrist --image_size 256) so resolution + prompt suffix match, and
-# use --action-normalization quantile_rot + the bundled libero rot6d stats on the
-# server so denormalization matches training. See docs/action_policy_libero_sft.md.
-#
-# max_samples_per_batch is 128 in the experiment (256 OOMs: per-forward peak, not grad_accum).
-# On lower-memory GPUs reduce at launch:
-#   --opts dataloader_train.max_samples_per_batch=64
+save_iter = 500

From 53e4af56d50db9df6caab518f4e5a7e335219003 Mon Sep 17 00:00:00 2001
From: Liang Hao <hliangac@connect.ust.hk>
Date: Fri, 26 Jun 2026 21:43:46 +0800
Subject: [PATCH 3/7] libero cookbook: drop droid comparison in libero
 quick-start

---
 cookbooks/cosmos3/generator/action/finetune/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index f2514eb4..73f811d8 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -68,8 +68,8 @@ bash launch_sft_action_policy_droid.sh
 
 ## LIBERO-10 quick start
 
-The LIBERO launcher mirrors the DROID one. It stages the `libero_10` suite (auto-downloaded if
-missing), downloads the Wan VAE, converts the base checkpoint, and trains — no keep-ranges filter.
+The LIBERO launcher stages the `libero_10` suite (auto-downloaded if missing),
+downloads the Wan VAE, converts the base checkpoint, and trains.
 
 ```shell
 bash launch_sft_action_policy_libero.sh

From 93c36d67272cec063c56b129a6e10f2031c48e19 Mon Sep 17 00:00:00 2001
From: Liang Hao <hliangac@connect.ust.hk>
Date: Fri, 26 Jun 2026 21:49:34 +0800
Subject: [PATCH 4/7] libero cookbook: sync recipe toml (HSDP 8x8)

---
 .../toml/sft_config/action_policy_libero_repro.toml         | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
index d74237bc..63ab0323 100644
--- a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
+++ b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
@@ -2,7 +2,7 @@
 # SPDX-License-Identifier: OpenMDW-1.1
 
 # LIBERO-10 action-policy SFT run config for the `action_policy_libero_nano`
-# experiment. Train on libero_10 alone (HSDP 2x8, global batch 2048).
+# experiment. Train on libero_10 alone (HSDP 8x8, global batch 2048).
 # Env: LIBERO_ROOT, BASE_CHECKPOINT_PATH, WAN_VAE_PATH, IMAGINAIRE_OUTPUT_ROOT.
 # See docs/action_policy_libero_sft.md.
 
@@ -20,7 +20,7 @@ max_num_tokens_after_packing = 74000
 
 [model.parallelism]
 data_parallel_shard_degree     = 8
-data_parallel_replicate_degree = 2    # HSDP 2x8 = 16 ranks (2 nodes)
+data_parallel_replicate_degree = 8    # HSDP 8x8 = 64 ranks (8 nodes)
 
 [model.activation_checkpointing]
 mode           = "selective"
@@ -39,7 +39,7 @@ warm_up_steps = [500]
 [trainer]
 max_iter        = 2000
 logging_iter    = 50
-grad_accum_iter = 1       # global batch = 128 x (8 x 2) x 1 = 2048
+grad_accum_iter = 1       # global batch = max_samples 32 x (shard 8 x replicate 8) x 1 = 2048
 
 [checkpoint]
 load_path = "${oc.env:BASE_CHECKPOINT_PATH}"

From d13dbdc902f20af51918818451fc3be0a93cf9d0 Mon Sep 17 00:00:00 2001
From: Liang Hao <hliangac@connect.ust.hk>
Date: Fri, 26 Jun 2026 21:53:00 +0800
Subject: [PATCH 5/7] libero cookbook: pair DROID/LIBERO intros (both reproduce
 Cosmos3 paper results); recipe HSDP 2x8

---
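Note (not part of the commit): the `grad_accum_iter` comments in this series
compute the global batch as max_samples_per_batch x (shard x replicate) x
grad_accum. The sketch below checks that the three layouts used across the
series all reach 2048 and shows how many ranks each one needs. `global_batch`
is a hypothetical helper written for this note, and the per-layout values are
taken from the TOML comments in patches 1, 4, and 5.

```python
def global_batch(max_samples_per_batch, shard, replicate, grad_accum):
    """Global batch as written in the TOML comments: samples x data-parallel ranks x grad accumulation."""
    return max_samples_per_batch * shard * replicate * grad_accum


# (max_samples_per_batch, shard degree, replicate degree, grad_accum_iter)
layouts = {
    "1x8, grad_accum 2 (patch 1)": (128, 8, 1, 2),
    "8x8, grad_accum 1 (patch 4)": (32, 8, 8, 1),
    "2x8, grad_accum 1 (this patch)": (128, 8, 2, 1),
}

for name, (samples, shard, replicate, accum) in layouts.items():
    ranks = shard * replicate
    print(f"{name}: ranks={ranks}, global batch={global_batch(samples, shard, replicate, accum)}")
```

With max_samples_per_batch fixed at 128 and grad_accum 1, 16 ranks is the
smallest layout that reaches 2048, which matches the "minimum for gbs 2048"
comment added in this patch.
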
 cookbooks/cosmos3/generator/action/finetune/README.md      | 7 +++++--
 .../toml/sft_config/action_policy_libero_repro.toml        | 6 +++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index 73f811d8..97737f2e 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -1,6 +1,9 @@
 # Cosmos3-Nano Action-Policy Fine-Tuning (SFT)
 
-This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into a robot action policy, using the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework). Two embodiments are covered: **DROID** (reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID)) and **LIBERO-10** (the simulation benchmark).
+This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into a robot action policy, using the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework). Two embodiments are covered, each reproducing a Cosmos3 paper result:
+
+- **DROID** — reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID) on real-robot manipulation data.
+- **LIBERO-10** — reproduces the Cosmos3 paper's LIBERO-10 simulation-benchmark results.
 
 | Recipe | Launch shell | Base model | Dataset |
 | --- | --- | --- | --- |
@@ -9,7 +12,7 @@ This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https:/
 
 The DROID recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
 
-The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048. Train on `libero_10` **alone** (the 4-suite mix dilutes libero_10). No keep-ranges filter.
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048 (HSDP 2x8). Train on `libero_10` **alone** (the all-suites mix dilutes libero_10). No keep-ranges filter.
 
 ## Prerequisites
 
diff --git a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
index 63ab0323..a0c49c7a 100644
--- a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
+++ b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
@@ -2,7 +2,7 @@
 # SPDX-License-Identifier: OpenMDW-1.1
 
 # LIBERO-10 action-policy SFT run config for the `action_policy_libero_nano`
-# experiment. Train on libero_10 alone (HSDP 8x8, global batch 2048).
+# experiment. Train on libero_10 alone (HSDP 2x8, global batch 2048).
 # Env: LIBERO_ROOT, BASE_CHECKPOINT_PATH, WAN_VAE_PATH, IMAGINAIRE_OUTPUT_ROOT.
 # See docs/action_policy_libero_sft.md.
 
@@ -20,7 +20,7 @@ max_num_tokens_after_packing = 74000
 
 [model.parallelism]
 data_parallel_shard_degree     = 8
-data_parallel_replicate_degree = 8    # HSDP 8x8 = 64 ranks (8 nodes)
+data_parallel_replicate_degree = 2    # HSDP 2x8 = 16 ranks (2 nodes); minimum for gbs 2048 at grad_accum 1
 
 [model.activation_checkpointing]
 mode           = "selective"
@@ -39,7 +39,7 @@ warm_up_steps = [500]
 [trainer]
 max_iter        = 2000
 logging_iter    = 50
-grad_accum_iter = 1       # global batch = max_samples 32 x (shard 8 x replicate 8) x 1 = 2048
+grad_accum_iter = 1       # global batch = max_samples 128 x (shard 8 x replicate 2) x 1 = 2048
 
 [checkpoint]
 load_path = "${oc.env:BASE_CHECKPOINT_PATH}"

From d0924babaedceff050b42825eac36ad19a2c53da Mon Sep 17 00:00:00 2001
From: Liang Hao <hliangac@connect.ust.hk>
Date: Fri, 26 Jun 2026 21:58:19 +0800
Subject: [PATCH 6/7] libero cookbook: note DROID trains on real data and
 evaluates in RoboLab sim; drop all-suites mention

---
 cookbooks/cosmos3/generator/action/finetune/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index 97737f2e..bfb275de 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -2,8 +2,8 @@
 
 This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into a robot action policy, using the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework). Two embodiments are covered, each reproducing a Cosmos3 paper result:
 
-- **DROID** — reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID) on real-robot manipulation data.
-- **LIBERO-10** — reproduces the Cosmos3 paper's LIBERO-10 simulation-benchmark results.
+- **DROID** — reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID): trained on real-robot DROID data, evaluated on the RoboLab simulation benchmark.
+- **LIBERO-10** — reproduces the Cosmos3 paper's LIBERO-10 results: trained and evaluated on the LIBERO-10 simulation benchmark.
 
 | Recipe | Launch shell | Base model | Dataset |
 | --- | --- | --- | --- |
@@ -12,7 +12,7 @@ This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https:/
 
 The DROID recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
 
-The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048 (HSDP 2x8). Train on `libero_10` **alone** (the all-suites mix dilutes libero_10). No keep-ranges filter.
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048 (HSDP 2x8). Train on `libero_10` **alone**. No keep-ranges filter.
 
 ## Prerequisites
 

From dd584704c820026b434d2928a202f2800aadfd07 Mon Sep 17 00:00:00 2001
From: Liang Hao <hliangac@connect.ust.hk>
Date: Fri, 26 Jun 2026 22:09:43 +0800
Subject: [PATCH 7/7] libero cookbook: drop [job].task=vfm, '8-GPU', sweep +
 no-filter mentions

---
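Note (not part of the commit): the launcher comment updated here asks for
NNODES/NODE_RANK/MASTER_ADDR to be set per node. The sketch below mirrors, in
Python, how the launcher assembles its torchrun arguments from those variables,
to show what a 2-node run would pass. `torchrun_args` is a hypothetical helper
for illustration, and the hostname `node0` is a placeholder.

```python
def torchrun_args(env):
    """Mirror the launcher's TORCHRUN_ARGS construction for a given environment mapping."""
    args = [
        # Defaults apply when the variable is unset or empty, like ${VAR:-default} in bash.
        f"--nproc_per_node={env.get('NPROC_PER_NODE') or '8'}",
        f"--master_port={env.get('MASTER_PORT') or '50012'}",
    ]
    # Optional flags are appended only when the variable is set and non-empty.
    for var, flag in (
        ("NNODES", "--nnodes"),
        ("NODE_RANK", "--node_rank"),
        ("MASTER_ADDR", "--master_addr"),
    ):
        if env.get(var):
            args.append(f"{flag}={env[var]}")
    return args


# Hypothetical 2-node launch: same settings on both nodes, only NODE_RANK differs.
for rank in (0, 1):
    print(torchrun_args({"NNODES": "2", "NODE_RANK": str(rank), "MASTER_ADDR": "node0"}))
```

With none of the three variables set, only the per-node process count and the
master port are passed, so a run started that way has 8 ranks rather than the
16 that the 2x8 layout in the TOML describes.
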
 cookbooks/cosmos3/generator/action/finetune/README.md  | 10 +++++-----
 .../action/finetune/launch_sft_action_policy_libero.sh |  5 +++--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index bfb275de..4f5f2bd0 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -10,9 +10,9 @@ This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https:/
 | Policy-DROID SFT | `launch_sft_action_policy_droid.sh` | Cosmos3-Nano | [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) success split |
 | Policy-LIBERO-10 SFT | `launch_sft_action_policy_libero.sh` | Cosmos3-Nano | [LIBERO_LeRobot_v3](https://huggingface.co/datasets/nvidia/LIBERO_LeRobot_v3) `libero_10` |
 
-The DROID recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
+The DROID recipe uses the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
 
-The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048 (HSDP 2x8). Train on `libero_10` **alone**. No keep-ranges filter.
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048 (HSDP 2x8). Train on `libero_10` **alone**.
 
 ## Prerequisites
 
@@ -44,7 +44,7 @@ The launcher is a complete local wrapper for the public cookbook:
 - downloads `Wan2.2_VAE.pth` if needed
 - converts `Cosmos3-Nano` to a local DCP checkpoint if needed
 - downloads `keep_ranges_1_0_1.json` if needed
-- launches 8-GPU training with `action_policy_droid_repro.toml`
+- launches training with `action_policy_droid_repro.toml`
 
 The script intentionally stays close to the `cosmos-framework` example launcher: `DATASET_PATH`
 is bridged to `DROID_ROOT`, `BASE_CHECKPOINT_PATH` and `WAN_VAE_PATH` are exported for the TOML,
@@ -82,7 +82,7 @@ The launcher:
 
 - downloads `nvidia/LIBERO_LeRobot_v3` `libero_10` to `data/LIBERO_LeRobot_v3/libero_10` if missing
 - downloads `Wan2.2_VAE.pth` and converts `Cosmos3-Nano` to a local DCP checkpoint if needed
-- launches 8-GPU training with the LIBERO action-policy TOML (`action_policy_libero_repro.toml`)
+- launches training with the LIBERO action-policy TOML (`action_policy_libero_repro.toml`)
 
 Relocate inputs via env vars, or run a short smoke test:
 
@@ -92,7 +92,7 @@ export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpo
 bash launch_sft_action_policy_libero.sh
 ```
 
-Checkpoints are saved every 500 iters; sweep them to pick the best iteration.
+Checkpoints are saved every 500 iters.
 
 ## Outputs
 
diff --git a/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh b/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
index 128540cb..48f886e8 100755
--- a/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
+++ b/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
@@ -2,7 +2,7 @@
 # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: OpenMDW-1.1
 
-# Complete recipe: LIBERO-10 action-policy SFT on Cosmos3-Nano (8x H100).
+# Complete recipe: LIBERO-10 action-policy SFT on Cosmos3-Nano (HSDP 2x8).
 # Run from this folder with the cosmos-framework venv active (see README):
 #   bash launch_sft_action_policy_libero.sh
 # It prepares the small dependencies, checks for the staged libero_10 dataset, and trains.
@@ -46,7 +46,8 @@ if [[ ! -d "$BASE_CHECKPOINT_PATH" ]]; then
     python -m cosmos_framework.scripts.convert_model_to_dcp -o "$BASE_CHECKPOINT_PATH" --checkpoint-path Cosmos3-Nano
 fi
 
-# 4. Train (8-GPU FSDP by default). The TOML reads these paths from the environment.
+# 4. Train (HSDP 2x8 per the TOML; set NNODES/NODE_RANK/MASTER_ADDR per node).
+#    The TOML reads these paths from the environment.
 export LIBERO_ROOT
 export BASE_CHECKPOINT_PATH
 export WAN_VAE_PATH