feat(renderer): expose preserve_*_thinking flags in RendererConfig #2436

Draft
hallerite wants to merge 1 commit into main from feat/preserve-thinking-flags-main

@hallerite
Member

Summary

Re-applies #2433 onto main. The original PR landed against feat/unify-inference-generate and was merged there by mistake.

Adds two new fields to RendererConfig:

  • preserve_all_thinking: bool = False
  • preserve_thinking_between_tool_calls: bool = False

The orchestrator forwards them in two places so train and infer stay byte-consistent:

  1. create_renderer() — bound at construction on the training-side renderer used by build_trajectory_step / render_ids for trajectory tokenization.
  2. setup_inference_pool() → setup_clients() → vf.ClientConfig — the verifiers RendererClient picks them up and forwards to its create_renderer_pool, so every inference render carries the same thinking-preservation behaviour.

Both flags are off by default → zero behaviour change for existing configs. validate_renderer_args rejects either flag when orchestrator.use_renderer=False, matching the existing renderer knobs.
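
For reviewers, the defaults and the validation rule above can be sketched roughly as follows. This is an illustration only: `RendererConfigSketch`, `validate_renderer_args_sketch`, and `preserve_kwargs` are hypothetical stand-ins, the real RendererConfig has other fields, and the actual signature and error type of validate_renderer_args are not shown in this description (a `ValueError` is assumed here).

```python
from dataclasses import dataclass

PRESERVE_FLAGS = ("preserve_all_thinking", "preserve_thinking_between_tool_calls")


@dataclass(frozen=True)
class RendererConfigSketch:
    """Hypothetical stand-in for RendererConfig; only the two new flags are modelled."""

    preserve_all_thinking: bool = False
    preserve_thinking_between_tool_calls: bool = False


def validate_renderer_args_sketch(use_renderer: bool, cfg: RendererConfigSketch) -> None:
    """Reject either preserve flag when the renderer path is disabled.

    The error type is an assumption; the PR only says the flags are rejected.
    """
    enabled = [flag for flag in PRESERVE_FLAGS if getattr(cfg, flag)]
    if enabled and not use_renderer:
        raise ValueError(
            f"{', '.join(enabled)} requires orchestrator.use_renderer=True"
        )


def preserve_kwargs(cfg: RendererConfigSketch) -> dict:
    """Build the flag values once so both forwarding paths receive the same ones."""
    return {flag: getattr(cfg, flag) for flag in PRESERVE_FLAGS}
```

Deriving the keyword arguments from one place is one way to keep the training-side renderer and the inference pool from drifting apart; whether the PR does it this way is not stated.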

Per-flag semantics

  • preserve_all_thinking — emit reasoning_content for every past-assistant turn, even those before another user message.
  • preserve_thinking_between_tool_calls — emit reasoning_content only for the current tool cycle (post-last-user assistant→tool→…→assistant block). Older blocks fall back to template default. Strict subset of preserve_all_thinking.
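
To make the two semantics concrete, here is a small sketch that reports which assistant turns would keep reasoning_content under each flag. It is not the renderers implementation: the message shape, the function name `turns_keeping_reasoning`, and the `template_default` callback are assumptions, and the template default is modelled as dropping reasoning, which may not hold for every chat template.

```python
from typing import Callable, Sequence


def turns_keeping_reasoning(
    messages: Sequence[dict],
    preserve_all_thinking: bool = False,
    preserve_thinking_between_tool_calls: bool = False,
    template_default: Callable[[int], bool] = lambda index: False,
) -> list:
    """Return indices of assistant messages whose reasoning_content is emitted.

    template_default is a hypothetical hook standing in for whatever the chat
    template does when neither flag applies to a turn.
    """
    # Index of the last user message, or -1 if the conversation has none.
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"), default=-1
    )
    kept = []
    for i, message in enumerate(messages):
        if message["role"] != "assistant":
            continue
        if preserve_all_thinking:
            keep = True  # every assistant turn, including those before a later user message
        elif preserve_thinking_between_tool_calls and i > last_user:
            keep = True  # only the current tool cycle after the last user message
        else:
            keep = template_default(i)  # older blocks fall back to the template
        if keep:
            kept.append(i)
    return kept


# Two tool cycles separated by a second user message.
roles = ["user", "assistant", "tool", "assistant",
         "user", "assistant", "tool", "assistant"]
conversation = [{"role": role} for role in roles]

all_turns = turns_keeping_reasoning(conversation, preserve_all_thinking=True)
current_cycle = turns_keeping_reasoning(
    conversation, preserve_thinking_between_tool_calls=True
)
```

In this conversation `all_turns` covers both tool cycles while `current_cycle` covers only the turns after the second user message, which is the strict-subset relationship described above.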

Example config

[orchestrator.renderer]
name = "glm5"
preserve_all_thinking = true

Dependency bumps

  • verifiers 3b77145 → a7516a1 (PR #1298: ClientConfig propagation, plus 15 intervening commits including the renderers split out of the verifiers monorepo).
  • renderers is now a direct dep of prime-rl pinned to fe67f9f (PR #4: construction-time preserve flags). Previously pulled in transitively from the verifiers packages/renderers subdirectory.

File-path delta vs #2433

The original PR targeted feat/unify-inference-generate where configs live under src/prime_rl/configs/. On main they live under packages/prime-rl-configs/src/prime_rl/configs/ (workspace-package split landed in #2416). The diff is otherwise identical.

🤖 Generated with Claude Code

Adds two new fields to RendererConfig:
  - preserve_all_thinking
  - preserve_thinking_between_tool_calls

The orchestrator forwards them in two places so train and infer stay
consistent:

  1. create_renderer() — bound at construction on the training-side
     renderer used by build_trajectory_step / render_ids.
  2. setup_inference_pool() → setup_clients() → vf.ClientConfig — the
     verifiers RendererClient picks them up and forwards to its
     create_renderer_pool, so every inference render carries the same
     thinking-preservation behaviour.

Both flags are off by default → zero behaviour change for existing
configs. Setting either without orchestrator.use_renderer=True is
rejected by validate_renderer_args, matching the existing renderer
knobs.

Bumps source pins:
  - verifiers 3b77145 → a7516a1 (PR #1298: ClientConfig propagation)
  - renderers (now a separate repo): pinned to fe67f9f (PR #4:
    construction-time preserve flags)

Re-applies #2433, which was merged into feat/unify-inference-generate
by mistake.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>