feat(renderer): expose preserve_*_thinking flags in RendererConfig (#2436)
Adds two new fields to RendererConfig:
- preserve_all_thinking
- preserve_thinking_between_tool_calls
The orchestrator forwards them in two places so train and infer stay
consistent:
1. create_renderer() — bound at construction on the training-side
renderer used by build_trajectory_step / render_ids.
2. setup_inference_pool() → setup_clients() → vf.ClientConfig — the
verifiers RendererClient picks them up and forwards to its
create_renderer_pool, so every inference render carries the same
thinking-preservation behaviour.
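The two forwarding paths can be sketched with stand-in types. Everything below is a simplified assumption, not the actual prime-rl or verifiers API; only the two flag names come from the PR:

```python
from dataclasses import dataclass

# Stand-ins for the real RendererConfig and vf.ClientConfig; the two flag
# fields match the PR, everything else is simplified for illustration.
@dataclass
class RendererConfig:
    preserve_all_thinking: bool = False
    preserve_thinking_between_tool_calls: bool = False

@dataclass
class ClientConfig:
    preserve_all_thinking: bool
    preserve_thinking_between_tool_calls: bool

def setup_clients(cfg: RendererConfig) -> ClientConfig:
    # Path 2: the orchestrator copies the flags into the client config so the
    # inference-side renderer pool is built with the same thinking-preservation
    # behaviour as the training-side renderer bound in create_renderer().
    return ClientConfig(
        preserve_all_thinking=cfg.preserve_all_thinking,
        preserve_thinking_between_tool_calls=cfg.preserve_thinking_between_tool_calls,
    )

client = setup_clients(RendererConfig(preserve_thinking_between_tool_calls=True))
```

The point of binding at construction (rather than per-render) is that both the training renderer and every pooled inference renderer are guaranteed to carry identical flags.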
Both flags are off by default → zero behaviour change for existing
configs. Setting either without orchestrator.use_renderer=True is
rejected by validate_renderer_args, matching the existing renderer
knobs.
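The validation rule can be illustrated with a toy version of the guard. validate_renderer_args exists per the PR, but this body is a guess at its shape, not the real implementation:

```python
def validate_renderer_args(use_renderer: bool,
                           preserve_all_thinking: bool,
                           preserve_thinking_between_tool_calls: bool) -> None:
    # Illustrative only: either preserve flag requires the renderer path,
    # matching how the existing renderer knobs are validated.
    if (preserve_all_thinking or preserve_thinking_between_tool_calls) and not use_renderer:
        raise ValueError(
            "preserve_*_thinking flags require orchestrator.use_renderer=True"
        )

# Accepted: renderer enabled, flag set.
validate_renderer_args(True, True, False)

# Rejected: flag set without the renderer.
try:
    validate_renderer_args(False, False, True)
    rejected = False
except ValueError:
    rejected = True
```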
Bumps source pins:
- verifiers 3b77145 → a7516a1 (PR #1298: ClientConfig propagation)
- renderers (now a separate repo): pinned to fe67f9f (PR #4:
construction-time preserve flags)
Re-applies #2433, which was merged into feat/unify-inference-generate
by mistake.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Per-flag semantics
- preserve_all_thinking — emit reasoning_content for every past assistant turn, even those before another user message.
- preserve_thinking_between_tool_calls — emit reasoning_content only for the current tool cycle (the assistant→tool→…→assistant block after the last user message). Older blocks fall back to the template default. A strict subset of preserve_all_thinking.

Example config
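A minimal illustrative config, assuming prime-rl's TOML layout; the table names and key nesting here are assumptions, not the PR's actual schema:

```toml
# Hypothetical orchestrator config fragment — key paths are an assumption.
[orchestrator]
use_renderer = true                           # required by validate_renderer_args

[orchestrator.renderer]
preserve_thinking_between_tool_calls = true   # keep thinking in the current tool cycle
# preserve_all_thinking = true                # superset: keep thinking for all past turns
```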
Dependency bumps
- verifiers 3b77145 → a7516a1 (PR #1298: ClientConfig propagation, plus 15 intervening commits, including the renderers split out of the verifiers monorepo).
- renderers is now a direct dep of prime-rl, pinned to fe67f9f (PR #4: construction-time preserve flags). Previously pulled in transitively from the verifiers packages/renderers subdirectory.

File-path delta vs #2433
The original PR targeted feat/unify-inference-generate, where configs live under src/prime_rl/configs/. On main they live under packages/prime-rl-configs/src/prime_rl/configs/ (the workspace-package split landed in #2416). The diff is otherwise identical.

🤖 Generated with Claude Code