Add apply_chat_template renderer + tito_calc demo by kashif · Pull Request #742 · thinking-machines-lab/tinker-cookbook

kashif · 2026-05-28T12:48:01Z

Summary

Adds a model-agnostic TitoRenderer to tinker_cookbook/renderers/, registered under the name apply_chat_template, plus a tiny demo recipe (recipes/tito_calc/) that runs math_rl arithmetic twice — once with the cookbook's hand-coded Llama3Renderer, once with apply_chat_template — and prints a per-step parity diff.

The new renderer delegates build_generation_prompt to tokenizer.apply_chat_template. It's ~30 lines of generic code that works for any chat-tuned model whose template is prefix-preserving for tool messages (most modern open-weights families). No per-family Python; the family-specific bits live in the model's Jinja chat template, which is what every inference engine renders at deploy time.

Result on Tinker

meta-llama/Llama-3.1-8B-Instruct, math_rl arithmetic, 5 steps:

step	renderer	env/all/correct	reward/total	kl_sample_train_v1	entropy
0	`llama3`	0.738	0.733	0.001541	0.364
0	`apply_chat_template`	0.238	0.211	-0.003513	0.241
1	`llama3`	0.988	0.988	-0.000319	0.042
1	`apply_chat_template`	1.000	1.000	0.000102	0.002
2–4	both	1.000	1.000	~0	~0

Both arms converge to 100% reward by step 2. kl_sample_train_v1 stays small in both. The step-0 gap reflects the byte difference between the cookbook's simplified rendering and apply_chat_template (which includes Llama 3's Cutting Knowledge Date: preamble); the policy adapts in one step.

What's in this PR

tinker_cookbook/renderers/apply_chat_template.py — new TitoRenderer class (one file, ~50 LOC)
tinker_cookbook/renderers/__init__.py — one new elif branch + docstring entry
tinker_cookbook/recipes/tito_calc/ — demo recipe (driver + README)

Net diff: +220 / −711.

Test plan

get_renderer("apply_chat_template", tok) resolves to TitoRenderer
TitoRenderer.build_generation_prompt(...) returns a ModelInput
End-to-end Tinker training run on meta-llama/Llama-3.1-8B-Instruct completes for both renderer_name=llama3 and renderer_name=apply_chat_template, with both arms converging to 100% reward by step 2

…hat_template A tiny tool-calling recipe that runs the cookbook's Renderer.build_supervised_example and tok.apply_chat_template side-by-side over the same canonical multi-turn rollout. Empirical finding: across every supported family the cookbook ships a renderer for (Llama 3, Qwen3 variants, DeepSeek-V3, GPT-OSS), the renderer produces different tokens than the model's HF chat template. The cookbook renderers are simplified Python ports of the templates and shed decorations the HF templates inject (Cutting Knowledge preamble, empty <think> blocks, Harmony channel headers, etc.). For unsupported models (SmolLM3, Laguna XS.2) get_renderer raises; apply_chat_template + return_assistant_tokens_mask=True just works. README explains why this matters (the HF template is the in-distribution rendering — what the model was trained on and what production renders), ties to the TITO blog (§6 property test, §7 renderer-as-fork, §8.2 train/ deploy divergence, §9 right primitive), and notes the cookbook's own AGENTS.md 5x+ KL warning about off-template prompts.

Adds a template_kwargs: dict | None parameter to build_via_tito and prefix_preserved. The dict's contents are forwarded as kwargs to apply_chat_template and become Jinja-context variables — so callers can toggle template-level knobs like Laguna's render_assistant_messages_raw, Qwen3's enable_thinking, GPT-OSS's reasoning_effort, etc. Docstring documents which divergences these kwargs can and cannot close (e.g. Qwen3's last-turn empty <think> block has no kwarg gate).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 585532ac0d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-28T12:51:31Z

+    def render_message(self, message: Message, ctx: RenderContext) -> RenderedMessage:
+        # ABC requirement. The cookbook's RL loop never calls this for us —
+        # ``build_generation_prompt`` is the only rendering path we expose.
+        raise NotImplementedError("TitoRenderer renders via apply_chat_template only")


Implement supervised rendering for apply_chat_template

When renderer_name="apply_chat_template" is used outside the RL rollout path, supervised data creation fails: conversation_to_datum() calls renderer.build_supervised_example(...), the base implementation iterates through messages and calls render_message, and this implementation always raises. Since the new renderer is registered through get_renderer as a normal renderer, SFT/data-viz/VLM-classifier paths that accept renderer names will crash rather than producing datums; either implement build_supervised_example from apply_chat_template output or keep this renderer out of supervised-capable paths.

Useful? React with 👍 / 👎.

kashif added 3 commits May 28, 2026 12:19

Add apply_chat_template renderer + tito_calc demo

585532a

chatgpt-codex-connector Bot reviewed May 28, 2026

View reviewed changes

kashif added 4 commits May 28, 2026 15:12

Implement build_supervised_example; align style with llama3.py

0d01bc0

Fix pyright errors in apply_chat_template renderer

b140d15

Trust tokenizer.eos_token_id, drop defensive check

1a15215

Retrigger CI

85215a9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add apply_chat_template renderer + tito_calc demo#742

Add apply_chat_template renderer + tito_calc demo#742
kashif wants to merge 7 commits into
thinking-machines-lab:mainfrom
kashif:tito-calc-recipe

kashif commented May 28, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kashif commented May 28, 2026

Summary

Result on Tinker

What's in this PR

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant