Echo by snimu · Pull Request #2677 · PrimeIntellect-ai/prime-rl

snimu · 2026-05-31T11:33:03Z

Introduce the proper echo configs

Picks up configs/private 03fd21f — drops the old forth-lang TOMLs (which were running against the silent SFT-gradient bug fixed in the previous commit), and adds: - forth-lang: qwen-rl + 4 cmb-code cells (one per normalizer) + sweep/ - deepdive: qwen-rl + 4 cmb cells (no tool_names filter) + sweep/ - alpha_sweep_launcher.sh: 7 αs × 4 norms × 2 envs = 56 runs, paired with this branch's loss fix so the runs actually exercise SFT. Topology: 2 nodes / bs=256 throughout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…t test ``.cuda()`` after ``requires_grad=True`` makes the device tensor a non-leaf, so ``.grad`` never populates after backward(). Switch to ``device="cuda"`` in the constructor so it's a leaf on the GPU. Caught when running the test on the cluster against the gradient fix.

prompt_attribution arrives at the orchestrator as a plain dict through the verifiers env-server JSON boundary, even though the renderer emits a RenderedTokens object. The previous code assumed attribute access (``prompt_attribution.message_indices``), which crashed on the cluster: AttributeError: 'dict' object has no attribute 'message_indices' Handle both forms — getattr fallback for the object case, dict.get for the wire-serialized case. Per the function's existing precondition (no runtime type check on ``prompt_attribution: Any``), this is the right contract: the SFT mask should be computable from whatever shape the caller passed in as long as the data is present. Caught when the cluster smoke test crashed at step 0 rollout generation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

prompt_attribution always arrives as a plain dict through the verifiers env-server JSON boundary — the renderer's RenderedTokens object loses its identity in serialization. Drop the prior defensive isinstance/ getattr branching and just use dict access. Updates the type annotation on the parameter to ``dict | None`` to match.

The previous design flipped SFT positions into ``loss_mask`` and routed everything through ``default_loss_fn``. That had two problems: 1. Double normalization on SFT. SFT positions were divided by the per- rollout mode weight (α/X, set in ``prepare_sample``) AND by the batch-level ``loss_scale`` (= total loss_mask tokens, now including SFT). The per-SFT-token gradient ended up at α/(X × loss_scale), an ~S-times smaller magnitude than the legacy ``sft_loss_weight / N_rl`` regime the production α=0.5 was tuned against. 2. Gradient flow on SFT positions went through ``torch.where(sft_mask, zeros, log_importance_ratio)``, which blocks the gradient through ``trainer_logprobs`` on the True branch. The intent (force IS-ratio to 1 for off-policy correction skip) silently disabled the SFT signal entirely. Restructure to match the shape ECHO's algorithm actually has: - RL term: batch-level normalization by N_rl_batch (the existing ``loss_scale``, now RL-only again). - SFT term: per-rollout normalization by the chosen mode (α, α/S, α/T, α×R/S — set in ``prepare_sample`` on the advantage tensor). No batch normalization on top. Implementation: - ``prepare_sample``: stop flipping SFT into ``loss_mask``. Keep setting ``advantage[sft_pos] = mode_weight``. - ``LossInputs.sft_mask`` removed — no longer needed since SFT and RL flow through separate ``compute_loss`` calls. - ``default_loss_fn``: RL-only. All ``sft_mask``-conditional branches dropped. - New ``sft_pg_loss_fn``: ``loss = -(advantages * trainer_logprobs * loss_mask).sum()``. Caller passes the SFT mask as ``loss_mask`` and ``loss_scale=1`` so no batch division. - ``train.py``: second ``compute_loss`` call after the RL one, sharing the same ``out["logprobs"]`` tensor — single forward + single backward covers both terms. - Regression test: replaced the broken ``test_default_loss_fn_sft_mask_ gradient_flows`` with ``test_sft_pg_loss_fn_gradient_magnitude`` that exercises the new function directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three follow-ups from review of the SFT/RL split: 1. prepare_sample now clears sft_mask to None when sft_alpha is unset. Previously sft_mask propagated downstream regardless, and train.py would run sft_pg_loss_fn over the rollout's raw RL advantage on tool tokens — reward-shaped, not SFT-direction. With the mask cleared, the trainer skips the SFT compute_loss call as intended. 2. _step_sft_mask now bails to all-False when prompt_attribution lacks "message_indices" or "is_content" keys, matching the docstring's "missing/partial attribution returns all-False" contract. The DefaultRenderer leaves these unpopulated; callers shouldn't have to special-case the no-attribution path. 3. test_batch.py fixtures updated for the new design: SFT prompt tokens stay out of loss_mask (= [False, False, True, True] for the 2-prompt + 2-completion fixture). Added an assertion that prepare_sample clears sft_mask to None when sft_alpha is missing. Plus a docstring update in batch.py clarifying that "all_tokens" mode (α/T) is NOT exactly ECHO — the paper uses Z=|𝒪|, which matches "sft_tokens" when no tool_names filter is set. "all_tokens" with T (prompt + completion) is strictly stricter.

…dling" This reverts commit 6d831cd.

…lization" This reverts commit bc01f94.

Per team decision: SFT-on-tool-body is RL credit assignment with a flat positive advantage on tool body positions. Drops the four-mode normalization machinery entirely: - SFTConfig: removed ``normalization`` field. Keep on_tool_outputs, alpha (default 1.0), tool_names. - TrainingSample: removed sft_normalization field. - prepare_sample: collapsed the 4-mode dispatch to just ``advantages[k] = sft_alpha`` on SFT-mask positions. - orchestrator.py: drop sft_normalization assignment. - train.py: drop the stale doc comment about per-rollout mode dispatch. - tests: replaced 4-mode parametric tests with one ``advantage = alpha`` assertion. default_loss_fn keeps the torch.where SFT branch (advantage * trainer_logp on SFT positions, log_IR forced to 0 for KL = 0 on SFT positions) — the mechanism is correct under the RL-credit-assignment framing, just no longer dispatched over four modes. Submodule bump picks up the matching configs/private cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ilter)

…rth-env) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pulls in renderers @ 5f653df, which adds `RENDERERS_MAX_PROMPT_LEN` as an env-var override for the pre-flight overflow check in `_resolve_max_prompt_len`. Lets us re-enable the pre-flight on clusters whose vllm-router has a broken `/v1/models` handler. To activate at run time, set `RENDERERS_MAX_PROMPT_LEN=32768` (or the engine's actual `max_model_len`) in prime-rl's `.env` before launching. With it set, overlong rollouts raise `OverlongPromptError` cleanly (caught by `MultiTurnEnv.prompt_too_long`) instead of hitting vLLM and crashing the orchestrator with a raw `ValueError`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolves 12 textual conflicts across the SFT-on-tool-body data path. The feature's loss math is preserved bit-identical; only the surrounding plumbing changed. Trainer-side reconciliation: - transport/types.py: drop sft_loss bool (replaced by main's training_mode literal); keep sft_mask + sft_alpha. MicroBatch carries sft_mask alongside main's new training_mode + rewards. - batch.py: prepare_sample drops gone pixel_values/image_grid_thw/sft_loss (replaced by mm_kwargs + training_mode). packed_samples_into_micro_bs combines main's existing_len capture with the sft_mask extension. pad_micro_batch keeps both sft_mask (False) and routed_experts padding. - rl/data.py: replace sft_loss with training_mode in FakeDataLoader paths; preserve sft_mask tensor conversion alongside main's new RoutedExperts struct path. - rl/loss.py: drop teacher_logprobs handling from default_loss_fn (moved to a dedicated opd_loss_fn on main). Fix auto-merge bug that placed the sft_nll metrics block inside opd_loss_fn where sft_mask is undefined — moved back to default_loss_fn's tail. compute_loss drops sft_loss bool (replaced by training_mode); keeps sft_mask threading. - rl/train.py: swap sft_loss=micro_batch["sft_loss"] for training_mode=micro_batch["training_mode"]; keep sft_mask_split. Orchestrator-side reconciliation: - configs/orchestrator.py: dedupe state_columns (both sides added it). Reformat SFTConfig + new sft: field to match main's docstring-after- field style. - orchestrator/orchestrator.py: preserve the process_rollout helper, drop now-gone vlm_cache/cache_key args from interleave_rollout call. Combine SFT alpha attachment with sample.training_mode assignment; drop the no-longer-existing config.use_sft_loss path. - orchestrator/trajectories.py: prepare_step_tokens decodes routed_experts (b64 → numpy uint8) and carries multi_modal_data alongside the per-step sft_mask. make_sample uses main's deferred-routed_experts pattern with sft_mask appended. extend_sample keeps the sft_mask extension AND main's chunk-list routed_experts state machine. Tests: - test_batch.py: take main's renamed test_prepare_batch_does_not_pack_ mixed_training_mode (was ..._sft_loss); keep the 3 SFT overlay tests. - test_trajectories.py: drop imports of gone helpers (build_vlm_image_ cache, _ImageStore, _extract_images_from_*); keep _step_sft_mask; add main's align_routed_experts. Append the 8 _step_sft_mask tests after main's renderer mm_kwargs assertion test. - test_loss.py: update SFT gradient regression test to use the new setup_loss_fns (plural) + loss_fns= dispatch API. Submodules: - configs/private kept at HEAD (a7dc3d6) for the SFT sweep cells. - deps/verifiers kept at HEAD (cbc191d2) on sebastian/prompt-tool-names- 2026-05-19. Critical: main's pointer doesn't have prompt_attribution or prompt_message_tool_names merged yet, accepting it would silently disable the feature. The .gitmodules branch pin survived intact. Behavioral impact for SFT-on-tool-body sweeps (training_mode="rl", no teacher): zero change. The only loss-math drift is that default_loss_fn no longer mixes teacher_tau * teacher_kl.detach() into advantages — that path moved to a dedicated opd_loss_fn selected via training_mode="opd" with hardcoded knobs (not a drop-in mix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the env packages used by the SFT-on-tool-outputs experiments (forth-lang, general-agent, wordle) as workspace members + envs extras so they're installed by `uv sync --all-extras` and survive re-syncs. Previously each run's `pre_run_command` did `uv pip install -e ...` to re-add them after `uv sync` wiped them — that created a venv-mutation race window with concurrent jobs sharing this venv. Also pins fastokens via exclude-newer-package overrides for the 2026-05-17 release (>0.2.0) that's still inside the 7-day exclude-newer window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three coordinated changes post-merge: 1. deps/renderers: 5f653df (sebastian/preflight-max-prompt-len-env-override) → 1831bcc (renderers main). The `RENDERERS_MAX_PROMPT_LEN` env-var override workaround is no longer needed now that vllm-router 0.1.25 fixes its `/v1/models` handler; pre-flight auto-discovery works again with the new router. Keeping the feature branch on the renderers repo for reference but un-using it here. 2. pyproject.toml: remove the duplicate `fastokens = false` line that tripped uv sync after the main-merge auto-resolve. The remaining entry (with the renderers-fastokens-override comment) is kept. 3. configs/private: bump to 2a42e1d, which contains the `rollouts_per_example → group_size` rename across all configs to match upstream PR #2599, plus the new deepdive 1000-step prod cells for the α-ablation winners and matching sweep cells for α∈{0.01, 0.05}. uv.lock regenerated to reflect router 0.1.25 + the new prime-rl 0.5.0 pinned vLLM wheel + prime-pydantic-config. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drops forth-lang, general-agent, and wordle from `envs` extras, workspace members, and `tool.uv.sources`. Those packages live outside the prime-rl repo (forth-lang) or are project-specific to the SFT-on-tool-outputs experiments rather than generally useful; pinning them in prime-rl's pyproject broke CI (`uv sync --all-extras --locked` can't resolve the absolute path source for forth-lang on a non-cluster checkout). Install responsibility now lives in `configs/private`: each experiment cell's `start.sh` does `uv pip install -e <env_paths>` on the launching machine and each `prod.toml`'s `[slurm].pre_run_command` does the same on the compute head node, after the sbatch's own `uv sync`. deepdive stays in prime-rl (already a workspace member upstream). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Prevents extraneous packages (those installed via \`uv pip install -e\` outside the lockfile) from being uninstalled on each new compute job's sbatch startup. The shared NFS venv was being briefly stripped of forth-lang/general-agent/wordle on every job startup, which crashed other running jobs' Python subprocesses that re-imported in that window. With --inexact, uv sync preserves packages not tracked by the workspace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolve conflicts so the diff over main is echo-only. v2 (#2639) is now on main, so take main for v2-drift (advantage/dispatcher/types, Kind→RolloutKind rename, SFT cache_salt #2687) and packaging. Align to main's config hard-break — drop the filters→post_batch_filters alias and the [orchestrator.buffer] FutureWarning shim (echo filter machinery untouched). verifiers stays at 67f958538 (ours) per request; uv.lock reflects that (no nemogym extra). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

#2684 (verifiers as an editable path dep, not a workspace member) is now on this branch, so verifiers' nemogym extra-vs-extra conflicts no longer break `uv sync --all-extras` — it only enables prime-rl's own extras. Verified --all-extras resolves cleanly with verifiers main. 67f958538 -> 7f5109e4 picks up: v1 endpoint port pre-probe race fix (#1513), SandboxConfig.labels via #1512 (supersedes local hotfix 76afcac), NeMo Gym integration (#1396), unbounded v1 max_turns (#1517). uv.lock restores verifiers' nemogym/tasksets extra metadata. renderers was already at newest main (942449c37 / v0.1.8.dev39); only synced the working tree, which had drifted to the older v0.1.8.dev37 tag. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Collapse the per-role / per-scenario _step_echo_alpha and apply_echo_filter tests into parametrized tests (echo tests 50 -> 16 functions, ~-320 lines) with identical coverage. Merge the 3 prepare_sample echo-overlay batch tests into one parametrized test (and fix an `is True` identity check). Drop 4 pydantic-default framework-glue tests (kwargs default/explicit, required import_path, filter nesting) — the real kwargs=None contract stays covered; keep the custom at-least-one-role validator. Drop the vestigial echo_tokens-mode framing in the loss gradient test. Could not run pytest locally (darwin venv can't import prime_rl); verified via py_compile + ruff. Run on the remote before relying on it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Cut the per-role / EchoConfig / EchoFilterConfig docstrings and the echo comment blocks in transport/types, batch, loss, and trajectories down to essentials (keeping loss.py's gradient-flow rationale). No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Roles default to None (disabled); when one is activated it should contribute gradient out of the box, so per-role alpha now defaults to 1.0 instead of 0.0 (which required also setting a positive alpha to do anything). The alpha=0 "kill RL on assistant tokens" case stays available explicitly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The echo PR carries no dependency changes — it uses main's pins for verifiers (e1d4f2593), renderers (35c2407e9), research-environments (c752781), configs/private (70c3503e1), plus main's pyproject/uv.lock. Reverts the earlier verifiers bump (7f5109e4). Private configs aren't used in this PR (experiments run elsewhere), so main's stuff is fine; issues to be reported separately (see prime-rl-research-env-bump-issue.md). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop a gratuitous reword introduced during the echo doc-trim pass — only the body's echo-filter wiring should differ from main, not the docstring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ho_config Bind the echo-filter kwargs into the resolved callable via functools.partial in TrainEnv setup, so process_rollout no longer threads kwargs back through env.config.echo. That removes the type-narrowing assert and the env_echo_config alias (echo_config is now inlined into the interleave_rollout call), and apply_echo_filter drops its filter_kwargs param. Removed test_apply_echo_filter_kwargs — kwargs binding is now functools.partial + the EchoFilterConfig.kwargs default, not apply_echo_filter's concern. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the in-place "null out where the mask is False" loop in _step_echo_alpha with an equivalent select-comprehension (keep the baseline alpha where the mask is True, None where False). No behavior change — a True still preserves a None baseline, so the filter only narrows. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

apply_echo_filter already validates the user filter's output (outer length vs trajectory, inner length vs each step's tokens, bool types), so the downstream re-checks were redundant: drop the outer-length check in interleave_rollout and the inner-length check in _step_echo_alpha. The narrowing zip uses strict=True so an internal misalignment still fails loud instead of silently truncating. Removed the two now-dead length-mismatch tests; the real validation stays covered by test_apply_echo_filter_invalid_raises. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The function builds the per-token echo_alpha array for one trajectory step; the verb-prefixed name reads as a builder (and keeps the echo_alpha field name) rather than looking like a scalar. Updates the def, the interleave_rollout call, the :func: ref, the transport/types comment, and the test import + test names + calls + docstrings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ases Cleanup pass over the per-role echo feature: simplifications, several correctness fixes, and restored test coverage. Loss: - Decouple echo cross-entropy from default_loss_fn via split_loss_masks (RL mask = loss_mask & ~echo_mask, echo mask = loss_mask & echo_mask). - Normalize RL and echo terms by separate global denominators (rl_loss_scale / echo_loss_scale) so echoing prompt/tool tokens no longer dilutes ordinary RL completion gradients. RL-only runs are numerically unchanged (rl_loss_scale == old loss_scale, echo term zero). - Reject echo with a custom RL loss and in opd/sft modes (explicit errors). Fixes: - Exclude token 0 from echo: its shifted logprob is a placeholder, not a valid CE target, so it must not be supervised. - Validate echo_alpha length in prepare_sample and reject misaligned samples in the packer. - Restore the train_sink echo-filter guard (skip empty-trajectory rollouts) so a configured filter can't crash on degenerate rollouts; keep running the filter when prompt_attribution is absent. - tool_names: list -> set with min_length>=1 (empty allow-list rejected; None still means all tools); alpha gets allow_inf_nan=False. - token_export: split RL/echo masks, export echo_mask + rl_loss_mask, and mask RL diagnostics to RL positions. Cleanup: - Flatten _build_step_echo_alpha; add _extend_optional_token_field helper; trim verbose echo docstrings/comments. - Exclude echo positions from RL metrics and entropy/mismatch_kl logging. Tests: - Consolidate echo unit tests; re-add per-role/content baseline cases, mixed-roles, filter outer-length validation, and filter exception / empty-trajectory coverage; add a token-export test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

merge in main

The cleanup commit reworded EchoConfig's role validator to "requires at least one of system, user, assistant, or tool" but left the test asserting match="at least one role", so test_echo_config_rejects_filter_without_role failed in CI (the rejection itself is correct). Match "at least one of". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…r-role-echo

…p run) Temporary experiment pin to launch the alpha=0.005 forth-lang run. Revert to the main-aligned configs/private pin before the echo PR merges. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Picks up the InterceptionServer port-restart race fix (#1526) plus current main: Scale-SWE taskset, v1 sandbox runtime lifecycle, unbounded max_turns, multi-SWE patch extraction, SWE taskset reorg. verifiers is an editable path dep, so this is the submodule pointer only — run `uv sync --all-extras` on the remote to pick up any shifted verifiers deps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…air_rl cadence align) Picks up 4e7aebf in research-configs: migrates [inference.kv_cache_offload] from flat cpu_bytes to the new tiered/discriminated shape (PR #2689) on the three forth-lang cells in active use (cmb_code_a0p005, rl_muon, rl), and aligns glm45air_rl ckpt+eval cadence with the muon cell. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Brings deps/research-environments up to current main (was 23 commits behind), including the forth-lang env (#358) used by the current runs, websearch query fixes, and the legacy RLM environment removal (#426). Submodule pointer only; run `uv sync --all-extras` on the remote to re-resolve workspace members. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…nments #426) research-environments main (b07ace376) deleted this env in #426, so its pyproject refs (envs extra + [tool.uv.sources]) must go after the bump. uv.lock still carries stale entries — regenerate with `uv lock` / `uv sync --all-extras` on the remote before any --locked use. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

research-environments main routes envs through `verifiers[packages]`, which pulls verifiers' internal editable path-packages `harnesses` and `tasksets`. uv requires transitively-pulled path deps to be direct requirements of the workspace root, so declare both (envs extra + [tool.uv.sources]). uv.lock regen on the remote; if tasksets' [openenv,openreward,ta] extras drag in further path deps, `uv lock` will name them next. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

snimu and others added 30 commits May 22, 2026 16:30

chore(configs): bump configs/private (rpx=16 for ECHO parity)

04d995f

chore(configs): bump configs/private (start.sh-style sweep wrapper)

1bd5ff3

chore(configs): bump configs/private (PATH fix for non-login SSH)

565ea6b

Revert "fix(sft): clear sft_mask without sft_alpha + missing-attr han…

9cde634

…dling" This reverts commit 6d831cd.

Revert "fix(loss): split SFT and RL loss paths to remove double norma…

acc3365

…lization" This reverts commit bc01f94.

chore(configs): bump configs/private (none-only sweep launcher)

8b7ab3d

chore(configs): bump configs/private (3 nodes/run, 2 infer replicas)

767df36

chore: remove accidentally committed .DS_Store

4a72045

chore(gitignore): ignore .DS_Store

a18d339

chore(configs): bump configs/private (launcher polls squeue)

76cd355

chore(configs): bump configs/private (deepdive drops zero_advantage f…

671657f

…ilter)

chore(configs): bump configs/private (eval=100, cmb-all variant)

87e8621

chore(configs): bump configs/private (forth-lang prod cells)

c7f3540

chore(configs): bump configs/private (dd sweep cells dd_a0p1 + dd_a1p0)

fb605d0

chore(deps): bump research-environments (merge main into sebastian/fo…

bb5c179

…rth-env) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(configs): bump configs/private for env install ownership

4514139

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(configs): bump configs/private for deepdive launcher env install

1f65ec8

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

snimu and others added 6 commits June 1, 2026 19:48

chore(configs): bump configs/private (glm45air muon ckpt.keep_last 1→2)

2cc912c

snimu force-pushed the feat/per-role-echo branch from 52bee52 to 418e264 Compare June 2, 2026 11:45

snimu and others added 23 commits June 2, 2026 11:50

uv lock

550c6bf

chore(train_sink): restore main's process_rollout docstring

a6612fa

Drop a gratuitous reword introduced during the echo doc-trim pass — only the body's echo-filter wiring should differ from main, not the docstring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into feat/per-role-echo

c6e9308

merge in main

cleanup

c732d9b

cleanup token export float conversion

f2f3ecf

refactor echo annotations

e00ede3

ruff

b8aa41e

rename echo nll metric

cd2f98d

Merge remote-tracking branch 'origin/feat/per-role-echo' into feat/pe…

83b1903

…r-role-echo

uv lock

77069ad

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Echo#2677

Echo#2677
snimu wants to merge 265 commits into
mainfrom
feat/per-role-echo

snimu commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

snimu commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants