Add apply_chat_template renderer + tito_calc demo#742

Open
kashif wants to merge 7 commits into
thinking-machines-lab:mainfrom
kashif:tito-calc-recipe

@kashif kashif commented May 28, 2026

Summary

Adds a model-agnostic TitoRenderer to tinker_cookbook/renderers/, registered under the name apply_chat_template, plus a tiny demo recipe (recipes/tito_calc/) that runs math_rl arithmetic twice — once with the cookbook's hand-coded Llama3Renderer, once with apply_chat_template — and prints a per-step parity diff.

The new renderer delegates build_generation_prompt to tokenizer.apply_chat_template. It's ~30 lines of generic code that works for any chat-tuned model whose template is prefix-preserving for tool messages (most modern open-weights families). No per-family Python; the family-specific bits live in the model's Jinja chat template, which is what every inference engine renders at deploy time.
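The delegation and the prefix-preservation property it relies on can be sketched as follows. This is not the PR's code: `render_prompt_tokens`, `is_prefix_preserving`, and `ToyTokenizer` are hypothetical names, and the toy tokenizer only mimics the call shape of Hugging Face's `apply_chat_template(messages, tokenize=True, add_generation_prompt=...)` so the example runs without downloading a model. It also assumes the call returns a plain list of token ids.

```python
from __future__ import annotations

from typing import Any

Message = dict[str, str]


class ToyTokenizer:
    """Hypothetical stand-in for a Hugging Face tokenizer (one token per character)."""

    def apply_chat_template(
        self,
        messages: list[Message],
        tokenize: bool = True,
        add_generation_prompt: bool = False,
        **kwargs: Any,
    ) -> list[int] | str:
        text = "".join(f"<{m['role']}>{m['content']}</{m['role']}>" for m in messages)
        if add_generation_prompt:
            text += "<assistant>"
        return [ord(c) for c in text] if tokenize else text


def render_prompt_tokens(tokenizer: Any, messages: list[Message]) -> list[int]:
    """Delegate prompt rendering to the tokenizer's own chat template."""
    return list(
        tokenizer.apply_chat_template(
            messages, tokenize=True, add_generation_prompt=True
        )
    )


def is_prefix_preserving(tokenizer: Any, messages: list[Message]) -> bool:
    """Check that appending messages never rewrites tokens rendered earlier."""
    full = list(
        tokenizer.apply_chat_template(
            messages, tokenize=True, add_generation_prompt=False
        )
    )
    for n in range(1, len(messages)):
        prefix = list(
            tokenizer.apply_chat_template(
                messages[:n], tokenize=True, add_generation_prompt=False
            )
        )
        # The rendering of the first n messages must be a prefix of the full one.
        if full[: len(prefix)] != prefix:
            return False
    return True
```

With a real tokenizer, the same check could be run over a multi-turn conversation that includes tool messages before trusting the renderer for that model family.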

Result on Tinker

meta-llama/Llama-3.1-8B-Instruct, math_rl arithmetic, 5 steps:

| step | renderer | env/all/correct | reward/total | kl_sample_train_v1 | entropy |
|---|---|---|---|---|---|
| 0 | llama3 | 0.738 | 0.733 | 0.001541 | 0.364 |
| 0 | apply_chat_template | 0.238 | 0.211 | -0.003513 | 0.241 |
| 1 | llama3 | 0.988 | 0.988 | -0.000319 | 0.042 |
| 1 | apply_chat_template | 1.000 | 1.000 | 0.000102 | 0.002 |
| 2–4 | both | 1.000 | 1.000 | ~0 | ~0 |

Both arms converge to 100% reward by step 2. kl_sample_train_v1 stays small in both. The step-0 gap reflects the byte difference between the cookbook's simplified rendering and apply_chat_template (which includes Llama 3's Cutting Knowledge Date: preamble); the policy adapts in one step.
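One way to localize a rendering difference like the one described above is to find the first position where the two token sequences disagree. The helper below is a hypothetical illustration (`first_divergence` is not part of the PR or the cookbook), and the token ids in the usage note are made up rather than taken from a real tokenizer.

```python
from __future__ import annotations

from itertools import zip_longest


def first_divergence(a: list[int], b: list[int]) -> int | None:
    """Return the first index where two token sequences differ, or None if equal."""
    # zip_longest pads the shorter sequence with None, which never equals an int,
    # so a pure length mismatch is reported at the end of the shorter sequence.
    for i, (x, y) in enumerate(zip_longest(a, b, fillvalue=None)):
        if x != y:
            return i
    return None
```

For example, `first_divergence([1, 2, 3, 7, 8], [1, 2, 3, 4, 5, 7, 8])` returns `3`, the point where the second rendering inserts extra tokens. Decoding a window of tokens around that index from each arm would show what one rendering includes and the other omits.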

What's in this PR

  • tinker_cookbook/renderers/apply_chat_template.py — new TitoRenderer class (one file, ~50 LOC)
  • tinker_cookbook/renderers/__init__.py — one new elif branch + docstring entry
  • tinker_cookbook/recipes/tito_calc/ — demo recipe (driver + README)

Net diff: +220 / −711.

Test plan

  • get_renderer("apply_chat_template", tok) resolves to TitoRenderer
  • TitoRenderer.build_generation_prompt(...) returns a ModelInput
  • End-to-end Tinker training run on meta-llama/Llama-3.1-8B-Instruct completes for both renderer_name=llama3 and renderer_name=apply_chat_template, with both arms converging to 100% reward by step 2

kashif added 3 commits May 28, 2026 12:19
…hat_template

A tiny tool-calling recipe that runs the cookbook's Renderer.build_supervised_example
and tok.apply_chat_template side-by-side over the same canonical multi-turn rollout.

Empirical finding: across every supported family the cookbook ships a renderer
for (Llama 3, Qwen3 variants, DeepSeek-V3, GPT-OSS), the renderer produces
different tokens than the model's HF chat template. The cookbook renderers are
simplified Python ports of the templates and shed decorations the HF templates
inject (Cutting Knowledge preamble, empty <think> blocks, Harmony channel
headers, etc.). For unsupported models (SmolLM3, Laguna XS.2) get_renderer
raises; apply_chat_template + return_assistant_tokens_mask=True just works.

README explains why this matters (the HF template is the in-distribution
rendering — what the model was trained on and what production renders),
ties to the TITO blog (§6 property test, §7 renderer-as-fork, §8.2 train/
deploy divergence, §9 right primitive), and notes the cookbook's own
AGENTS.md 5x+ KL warning about off-template prompts.

Adds a template_kwargs: dict | None parameter to build_via_tito and
prefix_preserved. The dict's contents are forwarded as kwargs to
apply_chat_template and become Jinja-context variables — so callers can
toggle template-level knobs like Laguna's render_assistant_messages_raw,
Qwen3's enable_thinking, GPT-OSS's reasoning_effort, etc.

Docstring documents which divergences these kwargs can and cannot close
(e.g. Qwen3's last-turn empty <think> block has no kwarg gate).
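The forwarding described in this commit can be sketched roughly as below. The actual signatures of build_via_tito and prefix_preserved are not shown on this page, so `render_with_template_kwargs` and `ToyThinkingTokenizer` are hypothetical. The stub's handling of `enable_thinking` is illustrative only and does not reproduce any real model's template.

```python
from __future__ import annotations

from typing import Any


class ToyThinkingTokenizer:
    """Hypothetical stub: extra kwargs behave like Jinja-context variables."""

    def apply_chat_template(
        self,
        messages: list[dict[str, str]],
        tokenize: bool = True,
        add_generation_prompt: bool = False,
        **kwargs: Any,
    ) -> list[int] | str:
        text = "".join(f"[{m['role']}]{m['content']}" for m in messages)
        if add_generation_prompt:
            text += "[assistant]"
            # Illustrative knob: emit an empty think block when thinking is off.
            if not kwargs.get("enable_thinking", True):
                text += "<think></think>"
        return [ord(c) for c in text] if tokenize else text


def render_with_template_kwargs(
    tokenizer: Any,
    messages: list[dict[str, str]],
    template_kwargs: dict[str, Any] | None = None,
    tokenize: bool = True,
) -> list[int] | str:
    """Forward template_kwargs unchanged so the template can read them."""
    return tokenizer.apply_chat_template(
        messages,
        tokenize=tokenize,
        add_generation_prompt=True,
        **(template_kwargs or {}),
    )
```

Passing `None` leaves the template's defaults in effect. A key the template never reads has no effect on the output, which is why some divergences cannot be closed this way.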

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 585532ac0d

Comment on lines +45 to +48
def render_message(self, message: Message, ctx: RenderContext) -> RenderedMessage:
    # ABC requirement. The cookbook's RL loop never calls this for us —
    # ``build_generation_prompt`` is the only rendering path we expose.
    raise NotImplementedError("TitoRenderer renders via apply_chat_template only")

[P2] Implement supervised rendering for apply_chat_template

When renderer_name="apply_chat_template" is used outside the RL rollout path, supervised data creation fails: conversation_to_datum() calls renderer.build_supervised_example(...), the base implementation iterates through messages and calls render_message, and this implementation always raises. Since the new renderer is registered through get_renderer as a normal renderer, SFT/data-viz/VLM-classifier paths that accept renderer names will crash rather than producing datums; either implement build_supervised_example from apply_chat_template output or keep this renderer out of supervised-capable paths.
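A rough sketch of the first option, deriving supervised tokens and loss weights from apply_chat_template output, might look like this. It assumes the Hugging Face behaviour where `return_dict=True` with `return_assistant_tokens_mask=True` yields `input_ids` and `assistant_masks`, which in turn requires a chat template that marks assistant spans with `{% generation %}`. The names `supervised_tokens_and_weights` and `ToyMaskTokenizer` are hypothetical, and the return shape is not the cookbook's build_supervised_example signature.

```python
from __future__ import annotations

from typing import Any


class ToyMaskTokenizer:
    """Hypothetical stub mimicking the dict returned by a real tokenizer."""

    def apply_chat_template(
        self,
        messages: list[dict[str, str]],
        tokenize: bool = True,
        return_dict: bool = False,
        return_assistant_tokens_mask: bool = False,
        **kwargs: Any,
    ) -> dict[str, list[int]]:
        ids: list[int] = []
        mask: list[int] = []
        for m in messages:
            header = f"<{m['role']}>"
            ids += [ord(c) for c in header]
            mask += [0] * len(header)  # role headers are never trained on
            ids += [ord(c) for c in m["content"]]
            is_assistant = 1 if m["role"] == "assistant" else 0
            mask += [is_assistant] * len(m["content"])
        out = {"input_ids": ids}
        if return_assistant_tokens_mask:
            out["assistant_masks"] = mask
        return out


def supervised_tokens_and_weights(
    tokenizer: Any, messages: list[dict[str, str]]
) -> tuple[list[int], list[float]]:
    """Return token ids plus per-token loss weights (1.0 on assistant tokens)."""
    enc = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        return_dict=True,
        return_assistant_tokens_mask=True,
    )
    tokens = list(enc["input_ids"])
    weights = [float(m) for m in enc["assistant_masks"]]
    if len(tokens) != len(weights):
        raise ValueError("assistant mask length does not match token count")
    return tokens, weights
```

If a model's template lacks generation markers, the mask cannot distinguish assistant tokens, so this path would need a fallback or an explicit error for those models.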
