From dc9877f03621e35f4305aaefe4dfda373b196cb5 Mon Sep 17 00:00:00 2001
From: Liang Hao
Date: Fri, 26 Jun 2026 21:05:29 +0800
Subject: [PATCH 1/7] Add LIBERO-10 action-policy finetune cookbook

Mirror the DROID action-policy cookbook for LIBERO-10:
launch_sft_action_policy_libero.sh + action_policy_libero_repro.toml +
finetune README section + action README link. Stages
nvidia/LIBERO_LeRobot_v3 libero_10, no keep-ranges filter, lr
5e-5/wu500/cyc16k.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 cookbooks/cosmos3/generator/action/README.md  |  5 +-
 .../generator/action/finetune/README.md       | 34 +++++++-
 .../launch_sft_action_policy_libero.sh        | 75 ++++++++++++++++++
 .../action_policy_libero_repro.toml           | 79 +++++++++++++++++++
 4 files changed, 188 insertions(+), 5 deletions(-)
 create mode 100755 cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
 create mode 100644 cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml

diff --git a/cookbooks/cosmos3/generator/action/README.md b/cookbooks/cosmos3/generator/action/README.md
index 73760e9d..49169f55 100644
--- a/cookbooks/cosmos3/generator/action/README.md
+++ b/cookbooks/cosmos3/generator/action/README.md
@@ -88,7 +88,6 @@ visualize the generated videos:
   inverse dynamics, predicting ego-motion trajectories from input AV videos using Cosmos3-Nano.
 - [`run_policy_with_cosmos_framework.md`](./run_policy_with_cosmos_framework.md) - policy, predicting future observations and action trajectories for DROID robot using Cosmos3-Nano-Policy-DROID.

-
 ## Run with vLLM-Omni

 ### Quickstart
@@ -135,7 +134,9 @@ To reproduce our post-training recipe for [Cosmos3-Nano-Policy-DROID](https://hu
 launch-script pattern as the other Cosmos3 finetune cookbooks while delegating the canonical training
 implementation to Cosmos Framework.
-
+The same [action-policy SFT cookbook](./finetune/README.md) also covers **LIBERO-10**
+(`launch_sft_action_policy_libero.sh`) — fine-tuning Cosmos3-Nano on the `libero_10`
+simulation benchmark with the same launch-script pattern.

 ## TODO

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index 52436be6..d5661e9a 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -1,12 +1,15 @@
-# Cosmos3-Nano-Policy-DROID Fine-Tuning (SFT)
+# Cosmos3-Nano Action-Policy Fine-Tuning (SFT)

-This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into an action policy for the DROID robot. It reproduces the post-training recipe used to create [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID), leveraging the public [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) dataset and the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework).
+This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into a robot action policy, using the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework). Two embodiments are covered: **DROID** (reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID)) and **LIBERO-10** (the simulation benchmark).
 | Recipe | Launch shell | Base model | Dataset |
 | --- | --- | --- | --- |
 | Policy-DROID SFT | `launch_sft_action_policy_droid.sh` | Cosmos3-Nano | [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) success split |
+| Policy-LIBERO-10 SFT | `launch_sft_action_policy_libero.sh` | Cosmos3-Nano | [LIBERO_LeRobot_v3](https://huggingface.co/datasets/nvidia/LIBERO_LeRobot_v3) `libero_10` |

-The recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment. It trains a DROID policy model with `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
+The DROID recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
+
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048. Train on `libero_10` **alone** (the Table-20 reproduction; the 4-suite mix dilutes libero_10). No keep-ranges filter. Reaches ~95% success on the 500-episode libero_10 closed-loop eval (best ~95.2% @ iter_1500).

 ## Prerequisites

@@ -63,6 +66,31 @@ export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpo
 bash launch_sft_action_policy_droid.sh
 ```

+## LIBERO-10 quick start
+
+The LIBERO launcher mirrors the DROID one. It stages the `libero_10` suite (auto-downloaded if
+missing), downloads the Wan VAE, converts the base checkpoint, and trains — no keep-ranges filter.
+
+```shell
+bash launch_sft_action_policy_libero.sh
+```
+
+The launcher:
+
+- downloads `nvidia/LIBERO_LeRobot_v3` `libero_10` to `data/LIBERO_LeRobot_v3/libero_10` if missing
+- downloads `Wan2.2_VAE.pth` and converts `Cosmos3-Nano` to a local DCP checkpoint if needed
+- launches 8-GPU training with the LIBERO action-policy TOML (`action_policy_libero_repro.toml`)
+
+Relocate inputs via env vars, or run a short smoke test:
+
+```shell
+export LIBERO_ROOT=/scratch/LIBERO_LeRobot_v3/libero_10
+export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpoint.save_iter=10 dataloader_train.max_samples_per_batch=32"
+bash launch_sft_action_policy_libero.sh
+```
+
+Checkpoints are saved every 500 iters (sweep 500/1000/1500/2000); the peak is typically iter_1500.
+
 ## Outputs

 Training writes to `outputs/train////`:
diff --git a/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh b/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
new file mode 100755
index 00000000..128540cb
--- /dev/null
+++ b/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
@@ -0,0 +1,75 @@
+#!/usr/bin/env bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: OpenMDW-1.1
+
+# Complete recipe: LIBERO-10 action-policy SFT on Cosmos3-Nano (8x H100).
+# Run from this folder with the cosmos-framework venv active (see README):
+# bash launch_sft_action_policy_libero.sh
+# It prepares the small dependencies, checks for the staged libero_10 dataset, and trains.
+# Paths are fixed under this (git-ignored) folder, matching the reasoner finetune
+# wrappers, while the TOML and tail-overrides match the cosmos-framework example.
+
+set -euo pipefail
+cd "$(dirname "${BASH_SOURCE[0]}")"
+
+TOML_FILE="toml/sft_config/action_policy_libero_repro.toml"
+: "${LIBERO_ROOT:=$PWD/data/LIBERO_LeRobot_v3/libero_10}"
+: "${BASE_CHECKPOINT_PATH:=$PWD/checkpoints/Cosmos3-Nano}"
+: "${WAN_VAE_PATH:=$PWD/checkpoints/wan22_vae/Wan2.2_VAE.pth}"
+
+# 1. Stage the libero_10 suite (the Table-20 reproduction trains on libero_10 ALONE).
+if [[ ! -f "$LIBERO_ROOT/meta/info.json" ]]; then
+  echo "Downloading nvidia/LIBERO_LeRobot_v3 (libero_10) ..."
+  uvx hf@latest download --repo-type dataset nvidia/LIBERO_LeRobot_v3 \
+    --include 'libero_10/**' --local-dir "$(dirname "$LIBERO_ROOT")"
+fi
+if [[ ! -f "$LIBERO_ROOT/meta/info.json" ]]; then
+  cat >&2 < sweep 500..2000.
+# Best observed: ~95.2% @ iter_1500 (libero_10, 500-ep closed-loop eval), with
+# task-0 success stable across the sweep (no over-fit collapse). This gentle-LR
+# schedule is more robust than a higher lr (e.g. 1e-4), which peaks near iter_1000
+# then over-fits task 0 and regresses. See docs/action_policy_libero_sft.md.
+#
+# REPRODUCTION: train on libero_10 ALONE (point LIBERO_ROOT at the libero_10
+# LeRobot conversion only). The 4-suite mix dilutes libero_10 (~1/4 the exposure
+# per step) and converges more slowly.
+#
+# Env required:
+# LIBERO_ROOT=/path/to/libero_10_lerobot
+# BASE_CHECKPOINT_PATH=
+# WAN_VAE_PATH=
+# IMAGINAIRE_OUTPUT_ROOT=/path/to/output_root # persist checkpoints
+# ============================================================================
+
+[job]
+task = "vfm"
+experiment = "action_policy_libero_nano"
+project = "cosmos3_action_libero"
+group = "action_sft"
+name = "action_policy_libero_repro"
+wandb_mode = "online"
+
+[model]
+precision = "bfloat16"
+# Cap the packed sequence (GA-validated). Uncapped (-1) packs one very long sequence
+# and OOMs even on H200.
+max_num_tokens_after_packing = 74000
+
+[model.parallelism]
+data_parallel_shard_degree = 8 # 1-node 8-GPU shard; raise replicate for multi-node HSDP
+data_parallel_replicate_degree = 1
+
+[model.activation_checkpointing]
+mode = "selective" # GA recipe (full is slower; selective fits 256x512)
+save_ops_regex = ["fmha"]
+
+[model.tokenizer]
+vae_path = "${oc.env:WAN_VAE_PATH}"
+
+[optimizer]
+lr = 5.0e-05 # recommended base lr
+
+[scheduler]
+cycle_lengths = [16000] # LR trajectory: warmup 500 -> linear decay over 16k (barely decayed at 2k)
+warm_up_steps = [500]
+
+[trainer]
+max_iter = 2000 # pause at 2k; sweep checkpoints 500/1000/1500/2000 for the peak
+logging_iter = 50
+grad_accum_iter = 2 # global batch = max_samples_per_batch 128 x DP 8 x grad_accum 2 = 2048
+
+[checkpoint]
+load_path = "${oc.env:BASE_CHECKPOINT_PATH}"
+save_iter = 500 # sweep cadence; peak is typically iter_1500
+
+# NOTE (train/serve parity — see GitHub issue NVIDIA/cosmos-framework#50): the
+# 256x512 concat_view is snapped to a 192x320 model canvas (resize+reflect-pad), and
+# the eval server reproduces the same snap. Run the client with the same 2:1 concat
+# (--camera agentview,wrist --image_size 256) so resolution + prompt suffix match, and
+# use --action-normalization quantile_rot + the bundled libero rot6d stats on the
+# server so denormalization matches training. See docs/action_policy_libero_sft.md.
+#
+# max_samples_per_batch is 128 in the experiment (256 OOMs: per-forward peak, not grad_accum).
+# On lower-memory GPUs reduce at launch:
+# --opts dataloader_train.max_samples_per_batch=64

From c5d493c11b0689a2a130562ca5d973f136859b5e Mon Sep 17 00:00:00 2001
From: Liang Hao
Date: Fri, 26 Jun 2026 21:24:38 +0800
Subject: [PATCH 2/7] libero cookbook: trim README detail, drop SR numbers, sync recipe toml

---
 .../generator/action/finetune/README.md       |  4 +-
 .../action_policy_libero_repro.toml           | 57 ++++---------------
 2 files changed, 14 insertions(+), 47 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index d5661e9a..f2514eb4 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -9,7 +9,7 @@ This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https:/

 The DROID recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.

-The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048. Train on `libero_10` **alone** (the Table-20 reproduction; the 4-suite mix dilutes libero_10). No keep-ranges filter. Reaches ~95% success on the 500-episode libero_10 closed-loop eval (best ~95.2% @ iter_1500).
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048. Train on `libero_10` **alone** (the 4-suite mix dilutes libero_10). No keep-ranges filter.
 ## Prerequisites

@@ -89,7 +89,7 @@ export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpo
 bash launch_sft_action_policy_libero.sh
 ```

-Checkpoints are saved every 500 iters (sweep 500/1000/1500/2000); the peak is typically iter_1500.
+Checkpoints are saved every 500 iters; sweep them to pick the best iteration.

 ## Outputs

diff --git a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
index 7fd788d3..d74237bc 100644
--- a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
+++ b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
@@ -1,30 +1,10 @@
 # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: OpenMDW-1.1

-# ============================================================================
-# LIBERO action-policy SFT — run config for the `action_policy_libero_nano`
-# experiment (Cosmos3-Nano LIBERO-10). The recipe knobs (optimizer base, count-
-# based batch, action-head skip-on-load, dataset knobs) live in the registered
-# experiment; this file sets run-level scalars (lr/schedule, iters, ckpt cadence,
-# parallelism shape, wandb, VAE path).
-#
-# RECIPE (recommended): lr 5e-5, warmup 500, cycle 16000 (so LR is barely decayed
-# at iter 2000, ~4.5e-5), global batch 2048, save every 500 -> sweep 500..2000.
-# Best observed: ~95.2% @ iter_1500 (libero_10, 500-ep closed-loop eval), with
-# task-0 success stable across the sweep (no over-fit collapse). This gentle-LR
-# schedule is more robust than a higher lr (e.g. 1e-4), which peaks near iter_1000
-# then over-fits task 0 and regresses. See docs/action_policy_libero_sft.md.
-#
-# REPRODUCTION: train on libero_10 ALONE (point LIBERO_ROOT at the libero_10
-# LeRobot conversion only). The 4-suite mix dilutes libero_10 (~1/4 the exposure
-# per step) and converges more slowly.
-#
-# Env required:
-# LIBERO_ROOT=/path/to/libero_10_lerobot
-# BASE_CHECKPOINT_PATH=
-# WAN_VAE_PATH=
-# IMAGINAIRE_OUTPUT_ROOT=/path/to/output_root # persist checkpoints
-# ============================================================================
+# LIBERO-10 action-policy SFT run config for the `action_policy_libero_nano`
+# experiment. Train on libero_10 alone (HSDP 2x8, global batch 2048).
+# Env: LIBERO_ROOT, BASE_CHECKPOINT_PATH, WAN_VAE_PATH, IMAGINAIRE_OUTPUT_ROOT.
+# See docs/action_policy_libero_sft.md.

 [job]
 task = "vfm"
@@ -36,44 +16,31 @@ wandb_mode = "online"

 [model]
 precision = "bfloat16"
-# Cap the packed sequence (GA-validated). Uncapped (-1) packs one very long sequence
-# and OOMs even on H200.
 max_num_tokens_after_packing = 74000

 [model.parallelism]
-data_parallel_shard_degree = 8 # 1-node 8-GPU shard; raise replicate for multi-node HSDP
-data_parallel_replicate_degree = 1
+data_parallel_shard_degree = 8
+data_parallel_replicate_degree = 2 # HSDP 2x8 = 16 ranks (2 nodes)

 [model.activation_checkpointing]
-mode = "selective" # GA recipe (full is slower; selective fits 256x512)
+mode = "selective"
 save_ops_regex = ["fmha"]

 [model.tokenizer]
 vae_path = "${oc.env:WAN_VAE_PATH}"

 [optimizer]
-lr = 5.0e-05 # recommended base lr
+lr = 5.0e-05

 [scheduler]
-cycle_lengths = [16000] # LR trajectory: warmup 500 -> linear decay over 16k (barely decayed at 2k)
+cycle_lengths = [16000]
 warm_up_steps = [500]

 [trainer]
-max_iter = 2000 # pause at 2k; sweep checkpoints 500/1000/1500/2000 for the peak
+max_iter = 2000
 logging_iter = 50
-grad_accum_iter = 2 # global batch = max_samples_per_batch 128 x DP 8 x grad_accum 2 = 2048
+grad_accum_iter = 1 # global batch = 128 x (8 x 2) x 1 = 2048

 [checkpoint]
 load_path = "${oc.env:BASE_CHECKPOINT_PATH}"
-save_iter = 500 # sweep cadence; peak is typically iter_1500
-
-# NOTE (train/serve parity — see GitHub issue NVIDIA/cosmos-framework#50): the
-# 256x512 concat_view is snapped to a 192x320 model canvas (resize+reflect-pad), and
-# the eval server reproduces the same snap. Run the client with the same 2:1 concat
-# (--camera agentview,wrist --image_size 256) so resolution + prompt suffix match, and
-# use --action-normalization quantile_rot + the bundled libero rot6d stats on the
-# server so denormalization matches training. See docs/action_policy_libero_sft.md.
-#
-# max_samples_per_batch is 128 in the experiment (256 OOMs: per-forward peak, not grad_accum).
-# On lower-memory GPUs reduce at launch:
-# --opts dataloader_train.max_samples_per_batch=64
+save_iter = 500

From 53e4af56d50db9df6caab518f4e5a7e335219003 Mon Sep 17 00:00:00 2001
From: Liang Hao
Date: Fri, 26 Jun 2026 21:43:46 +0800
Subject: [PATCH 3/7] libero cookbook: drop droid comparison in libero quick-start

---
 cookbooks/cosmos3/generator/action/finetune/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index f2514eb4..73f811d8 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -68,8 +68,8 @@ bash launch_sft_action_policy_droid.sh

 ## LIBERO-10 quick start

-The LIBERO launcher mirrors the DROID one. It stages the `libero_10` suite (auto-downloaded if
-missing), downloads the Wan VAE, converts the base checkpoint, and trains — no keep-ranges filter.
+The LIBERO launcher stages the `libero_10` suite (auto-downloaded if missing),
+downloads the Wan VAE, converts the base checkpoint, and trains.
 ```shell
 bash launch_sft_action_policy_libero.sh

From 93c36d67272cec063c56b129a6e10f2031c48e19 Mon Sep 17 00:00:00 2001
From: Liang Hao
Date: Fri, 26 Jun 2026 21:49:34 +0800
Subject: [PATCH 4/7] libero cookbook: sync recipe toml (HSDP 8x8)

---
 .../toml/sft_config/action_policy_libero_repro.toml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
index d74237bc..63ab0323 100644
--- a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
+++ b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
@@ -2,7 +2,7 @@
 # SPDX-License-Identifier: OpenMDW-1.1

 # LIBERO-10 action-policy SFT run config for the `action_policy_libero_nano`
-# experiment. Train on libero_10 alone (HSDP 2x8, global batch 2048).
+# experiment. Train on libero_10 alone (HSDP 8x8, global batch 2048).
 # Env: LIBERO_ROOT, BASE_CHECKPOINT_PATH, WAN_VAE_PATH, IMAGINAIRE_OUTPUT_ROOT.
 # See docs/action_policy_libero_sft.md.
@@ -20,7 +20,7 @@ max_num_tokens_after_packing = 74000

 [model.parallelism]
 data_parallel_shard_degree = 8
-data_parallel_replicate_degree = 2 # HSDP 2x8 = 16 ranks (2 nodes)
+data_parallel_replicate_degree = 8 # HSDP 8x8 = 64 ranks (8 nodes)

 [model.activation_checkpointing]
 mode = "selective"
@@ -39,7 +39,7 @@ warm_up_steps = [500]
 [trainer]
 max_iter = 2000
 logging_iter = 50
-grad_accum_iter = 1 # global batch = 128 x (8 x 2) x 1 = 2048
+grad_accum_iter = 1 # global batch = max_samples 32 x (shard 8 x replicate 8) x 1 = 2048

 [checkpoint]
 load_path = "${oc.env:BASE_CHECKPOINT_PATH}"

From d13dbdc902f20af51918818451fc3be0a93cf9d0 Mon Sep 17 00:00:00 2001
From: Liang Hao
Date: Fri, 26 Jun 2026 21:53:00 +0800
Subject: [PATCH 5/7] libero cookbook: pair DROID/LIBERO intros (both reproduce Cosmos3 paper results); recipe HSDP 2x8

---
 cookbooks/cosmos3/generator/action/finetune/README.md | 7 +++++--
 .../toml/sft_config/action_policy_libero_repro.toml   | 6 +++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index 73f811d8..97737f2e 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -1,6 +1,9 @@
 # Cosmos3-Nano Action-Policy Fine-Tuning (SFT)

-This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into a robot action policy, using the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework). Two embodiments are covered: **DROID** (reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID)) and **LIBERO-10** (the simulation benchmark).
+This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into a robot action policy, using the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework). Two embodiments are covered, each reproducing a Cosmos3 paper result:
+
+- **DROID** — reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID) on real-robot manipulation data.
+- **LIBERO-10** — reproduces the Cosmos3 paper's LIBERO-10 simulation-benchmark results.

 | Recipe | Launch shell | Base model | Dataset |
 | --- | --- | --- | --- |
@@ -9,7 +12,7 @@ This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https:/

 The DROID recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.

-The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048. Train on `libero_10` **alone** (the 4-suite mix dilutes libero_10). No keep-ranges filter.
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048 (HSDP 2x8). Train on `libero_10` **alone** (the all-suites mix dilutes libero_10). No keep-ranges filter.
 ## Prerequisites

diff --git a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
index 63ab0323..a0c49c7a 100644
--- a/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
+++ b/cookbooks/cosmos3/generator/action/finetune/toml/sft_config/action_policy_libero_repro.toml
@@ -2,7 +2,7 @@
 # SPDX-License-Identifier: OpenMDW-1.1

 # LIBERO-10 action-policy SFT run config for the `action_policy_libero_nano`
-# experiment. Train on libero_10 alone (HSDP 8x8, global batch 2048).
+# experiment. Train on libero_10 alone (HSDP 2x8, global batch 2048).
 # Env: LIBERO_ROOT, BASE_CHECKPOINT_PATH, WAN_VAE_PATH, IMAGINAIRE_OUTPUT_ROOT.
 # See docs/action_policy_libero_sft.md.

@@ -20,7 +20,7 @@ max_num_tokens_after_packing = 74000

 [model.parallelism]
 data_parallel_shard_degree = 8
-data_parallel_replicate_degree = 8 # HSDP 8x8 = 64 ranks (8 nodes)
+data_parallel_replicate_degree = 2 # HSDP 2x8 = 16 ranks (2 nodes); minimum for gbs 2048 at grad_accum 1

 [model.activation_checkpointing]
 mode = "selective"
@@ -39,7 +39,7 @@ warm_up_steps = [500]
 [trainer]
 max_iter = 2000
 logging_iter = 50
-grad_accum_iter = 1 # global batch = max_samples 32 x (shard 8 x replicate 8) x 1 = 2048
+grad_accum_iter = 1 # global batch = max_samples 128 x (shard 8 x replicate 2) x 1 = 2048

 [checkpoint]
 load_path = "${oc.env:BASE_CHECKPOINT_PATH}"

From d0924babaedceff050b42825eac36ad19a2c53da Mon Sep 17 00:00:00 2001
From: Liang Hao
Date: Fri, 26 Jun 2026 21:58:19 +0800
Subject: [PATCH 6/7] libero cookbook: DROID trained-real/eval-RoboLab-sim; drop all-suites mention

---
 cookbooks/cosmos3/generator/action/finetune/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index 97737f2e..bfb275de 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -2,8 +2,8 @@

 This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into a robot action policy, using the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework). Two embodiments are covered, each reproducing a Cosmos3 paper result:

-- **DROID** — reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID) on real-robot manipulation data.
-- **LIBERO-10** — reproduces the Cosmos3 paper's LIBERO-10 simulation-benchmark results.
+- **DROID** — reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID): trained on real-robot DROID data, evaluated on the RoboLab simulation benchmark.
+- **LIBERO-10** — reproduces the Cosmos3 paper's LIBERO-10 results: trained and evaluated on the LIBERO-10 simulation benchmark.

 | Recipe | Launch shell | Base model | Dataset |
 | --- | --- | --- | --- |
@@ -12,7 +12,7 @@ This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https:/

 The DROID recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.

-The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048 (HSDP 2x8). Train on `libero_10` **alone** (the all-suites mix dilutes libero_10). No keep-ranges filter.
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048 (HSDP 2x8). Train on `libero_10` **alone**. No keep-ranges filter.

 ## Prerequisites

From dd584704c820026b434d2928a202f2800aadfd07 Mon Sep 17 00:00:00 2001
From: Liang Hao
Date: Fri, 26 Jun 2026 22:09:43 +0800
Subject: [PATCH 7/7] libero cookbook: drop [job].task=vfm, '8-GPU', sweep + no-filter mentions

---
 cookbooks/cosmos3/generator/action/finetune/README.md  | 10 +++++-----
 .../action/finetune/launch_sft_action_policy_libero.sh |  5 +++--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index bfb275de..4f5f2bd0 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -10,9 +10,9 @@ This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https:/
 | Policy-DROID SFT | `launch_sft_action_policy_droid.sh` | Cosmos3-Nano | [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) success split |
 | Policy-LIBERO-10 SFT | `launch_sft_action_policy_libero.sh` | Cosmos3-Nano | [LIBERO_LeRobot_v3](https://huggingface.co/datasets/nvidia/LIBERO_LeRobot_v3) `libero_10` |

-The DROID recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
+The DROID recipe uses the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
-The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048 (HSDP 2x8). Train on `libero_10` **alone**. No keep-ranges filter.
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048 (HSDP 2x8). Train on `libero_10` **alone**.

 ## Prerequisites

@@ -44,7 +44,7 @@ The launcher is a complete local wrapper for the public cookbook:
 - downloads `Wan2.2_VAE.pth` if needed
 - converts `Cosmos3-Nano` to a local DCP checkpoint if needed
 - downloads `keep_ranges_1_0_1.json` if needed
-- launches 8-GPU training with `action_policy_droid_repro.toml`
+- launches training with `action_policy_droid_repro.toml`

 The script intentionally stays close to the `cosmos-framework` example launcher:
 `DATASET_PATH` is bridged to `DROID_ROOT`, `BASE_CHECKPOINT_PATH` and `WAN_VAE_PATH` are exported for the TOML,
@@ -82,7 +82,7 @@ The launcher:

 - downloads `nvidia/LIBERO_LeRobot_v3` `libero_10` to `data/LIBERO_LeRobot_v3/libero_10` if missing
 - downloads `Wan2.2_VAE.pth` and converts `Cosmos3-Nano` to a local DCP checkpoint if needed
-- launches 8-GPU training with the LIBERO action-policy TOML (`action_policy_libero_repro.toml`)
+- launches training with the LIBERO action-policy TOML (`action_policy_libero_repro.toml`)

 Relocate inputs via env vars, or run a short smoke test:

@@ -92,7 +92,7 @@ export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpo
 bash launch_sft_action_policy_libero.sh
 ```

-Checkpoints are saved every 500 iters; sweep them to pick the best iteration.
+Checkpoints are saved every 500 iters.
 ## Outputs

diff --git a/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh b/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
index 128540cb..48f886e8 100755
--- a/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
+++ b/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
@@ -2,7 +2,7 @@
 # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: OpenMDW-1.1

-# Complete recipe: LIBERO-10 action-policy SFT on Cosmos3-Nano (8x H100).
+# Complete recipe: LIBERO-10 action-policy SFT on Cosmos3-Nano (HSDP 2x8).
 # Run from this folder with the cosmos-framework venv active (see README):
 # bash launch_sft_action_policy_libero.sh
 # It prepares the small dependencies, checks for the staged libero_10 dataset, and trains.
@@ -46,7 +46,8 @@ if [[ ! -d "$BASE_CHECKPOINT_PATH" ]]; then
   python -m cosmos_framework.scripts.convert_model_to_dcp -o "$BASE_CHECKPOINT_PATH" --checkpoint-path Cosmos3-Nano
 fi

-# 4. Train (8-GPU FSDP by default). The TOML reads these paths from the environment.
+# 4. Train (HSDP 2x8 per the TOML; set NNODES/NODE_RANK/MASTER_ADDR per node).
+# The TOML reads these paths from the environment.
 export LIBERO_ROOT
 export BASE_CHECKPOINT_PATH
 export WAN_VAE_PATH
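
Review note: the `grad_accum_iter` comments in the TOML revisions of this series all reduce to one product, global batch = `max_samples_per_batch` x shard degree x replicate degree x `grad_accum_iter`. The sketch below checks that each layout named in those comments still yields 2048. The helper name `global_batch` is hypothetical and is not part of `launch_sft_action_policy_libero.sh`; the numbers are taken from the comments only, and the per-rank meaning of `max_samples_per_batch` is assumed from them.

```shell
# Hypothetical helper for sanity-checking a parallelism layout; not part of the launcher.
# global batch = max_samples_per_batch x shard x replicate x grad_accum_iter
global_batch() {
  local max_samples=$1 shard=$2 replicate=$3 grad_accum=$4
  echo $(( max_samples * shard * replicate * grad_accum ))
}

global_batch 128 8 1 2   # PATCH 1/7 comment: 1 node, 8-way shard, grad_accum_iter = 2
global_batch 128 8 2 1   # PATCH 2/7 and 5/7 comments: HSDP 2x8, grad_accum_iter = 1
global_batch 32 8 8 1    # PATCH 4/7 comment: HSDP 8x8 with max_samples 32
```

Each call prints 2048, so the three layouts are interchangeable with respect to global batch size; they differ only in node count and per-rank memory pressure.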