feat(renderer): expose preserve_*_thinking flags in RendererConfig by hallerite · Pull Request #2436 · PrimeIntellect-ai/prime-rl

hallerite · 2026-05-07T18:01:55Z

Summary

Re-applies #2433 onto main. The original PR landed against feat/unify-inference-generate and was merged there by mistake.

Adds two new fields to RendererConfig:

preserve_all_thinking: bool = False
preserve_thinking_between_tool_calls: bool = False

The orchestrator forwards them in two places so train and infer stay byte-consistent:

create_renderer() — bound at construction on the training-side renderer used by build_trajectory_step / render_ids for trajectory tokenization.
setup_inference_pool() → setup_clients() → vf.ClientConfig — the verifiers RendererClient picks them up and forwards to its create_renderer_pool, so every inference render carries the same thinking-preservation behaviour.

Both flags are off by default → zero behaviour change for existing configs. validate_renderer_args rejects either flag when orchestrator.use_renderer=False, matching the existing renderer knobs.

Per-flag semantics

preserve_all_thinking — emit reasoning_content for every past-assistant turn, even those before another user message.
preserve_thinking_between_tool_calls — emit reasoning_content only for the current tool cycle (post-last-user assistant→tool→…→assistant block). Older blocks fall back to template default. Strict subset of preserve_all_thinking.

Example config

[orchestrator.renderer]
name = "glm5"
preserve_all_thinking = true

Dependency bumps

verifiers 3b77145 → a7516a1 (PR Run dir? #1298: ClientConfig propagation, plus 15 intervening commits including the renderers split out of the verifiers monorepo).
renderers is now a direct dep of prime-rl pinned to fe67f9f (PR Data streaming #4: construction-time preserve flags). Previously pulled in transitively from the verifiers packages/renderers subdirectory.

File-path delta vs #2433

The original PR targeted feat/unify-inference-generate where configs live under src/prime_rl/configs/. On main they live under packages/prime-rl-configs/src/prime_rl/configs/ (workspace-package split landed in #2416). The diff is otherwise identical.

🤖 Generated with Claude Code

Adds two new fields to RendererConfig: - preserve_all_thinking - preserve_thinking_between_tool_calls The orchestrator forwards them in two places so train and infer stay consistent: 1. create_renderer() — bound at construction on the training-side renderer used by build_trajectory_step / render_ids. 2. setup_inference_pool() → setup_clients() → vf.ClientConfig — the verifiers RendererClient picks them up and forwards to its create_renderer_pool, so every inference render carries the same thinking-preservation behaviour. Both flags are off by default → zero behaviour change for existing configs. Setting either without orchestrator.use_renderer=True is rejected by validate_renderer_args, matching the existing renderer knobs. Bumps source pins: - verifiers 3b77145 → a7516a1 (PR #1298: ClientConfig propagation) - renderers (now a separate repo): pinned to fe67f9f (PR #4: construction-time preserve flags) Re-applies #2433, which was merged into feat/unify-inference-generate by mistake. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(renderer): expose preserve_*_thinking flags in RendererConfig#2436

feat(renderer): expose preserve_*_thinking flags in RendererConfig#2436
hallerite wants to merge 1 commit intomainfrom
feat/preserve-thinking-flags-main

hallerite commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hallerite commented May 7, 2026

Summary

Per-flag semantics

Example config

Dependency bumps

File-path delta vs #2433

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant