Add apply_chat_template renderer + tito_calc demo#742
Conversation
…hat_template A tiny tool-calling recipe that runs the cookbook's Renderer.build_supervised_example and tok.apply_chat_template side-by-side over the same canonical multi-turn rollout. Empirical finding: across every supported family the cookbook ships a renderer for (Llama 3, Qwen3 variants, DeepSeek-V3, GPT-OSS), the renderer produces different tokens than the model's HF chat template. The cookbook renderers are simplified Python ports of the templates and shed decorations the HF templates inject (Cutting Knowledge preamble, empty <think> blocks, Harmony channel headers, etc.). For unsupported models (SmolLM3, Laguna XS.2) get_renderer raises; apply_chat_template + return_assistant_tokens_mask=True just works. README explains why this matters (the HF template is the in-distribution rendering — what the model was trained on and what production renders), ties to the TITO blog (§6 property test, §7 renderer-as-fork, §8.2 train/ deploy divergence, §9 right primitive), and notes the cookbook's own AGENTS.md 5x+ KL warning about off-template prompts.
Adds a template_kwargs: dict | None parameter to build_via_tito and prefix_preserved. The dict's contents are forwarded as kwargs to apply_chat_template and become Jinja-context variables — so callers can toggle template-level knobs like Laguna's render_assistant_messages_raw, Qwen3's enable_thinking, GPT-OSS's reasoning_effort, etc. Docstring documents which divergences these kwargs can and cannot close (e.g. Qwen3's last-turn empty <think> block has no kwarg gate).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 585532ac0d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def render_message(self, message: Message, ctx: RenderContext) -> RenderedMessage: | ||
| # ABC requirement. The cookbook's RL loop never calls this for us — | ||
| # ``build_generation_prompt`` is the only rendering path we expose. | ||
| raise NotImplementedError("TitoRenderer renders via apply_chat_template only") |
There was a problem hiding this comment.
Implement supervised rendering for apply_chat_template
When renderer_name="apply_chat_template" is used outside the RL rollout path, supervised data creation fails: conversation_to_datum() calls renderer.build_supervised_example(...), the base implementation iterates through messages and calls render_message, and this implementation always raises. Since the new renderer is registered through get_renderer as a normal renderer, SFT/data-viz/VLM-classifier paths that accept renderer names will crash rather than producing datums; either implement build_supervised_example from apply_chat_template output or keep this renderer out of supervised-capable paths.
Useful? React with 👍 / 👎.
Summary
Adds a model-agnostic
TitoRenderertotinker_cookbook/renderers/, registered under the nameapply_chat_template, plus a tiny demo recipe (recipes/tito_calc/) that runsmath_rlarithmetic twice — once with the cookbook's hand-codedLlama3Renderer, once withapply_chat_template— and prints a per-step parity diff.The new renderer delegates
build_generation_prompttotokenizer.apply_chat_template. It's ~30 lines of generic code that works for any chat-tuned model whose template is prefix-preserving for tool messages (most modern open-weights families). No per-family Python; the family-specific bits live in the model's Jinja chat template, which is what every inference engine renders at deploy time.Result on Tinker
meta-llama/Llama-3.1-8B-Instruct,math_rlarithmetic, 5 steps:llama3apply_chat_templatellama3apply_chat_templateBoth arms converge to 100% reward by step 2.
kl_sample_train_v1stays small in both. The step-0 gap reflects the byte difference between the cookbook's simplified rendering andapply_chat_template(which includes Llama 3'sCutting Knowledge Date:preamble); the policy adapts in one step.What's in this PR
tinker_cookbook/renderers/apply_chat_template.py— newTitoRendererclass (one file, ~50 LOC)tinker_cookbook/renderers/__init__.py— one newelifbranch + docstring entrytinker_cookbook/recipes/tito_calc/— demo recipe (driver + README)Net diff: +220 / −711.
Test plan
get_renderer("apply_chat_template", tok)resolves toTitoRendererTitoRenderer.build_generation_prompt(...)returns aModelInputmeta-llama/Llama-3.1-8B-Instructcompletes for bothrenderer_name=llama3andrenderer_name=apply_chat_template, with both arms converging to 100% reward by step 2