refactor(v1): replace custom template system with apply_chat_template by frozenleaves · Pull Request #10598 · hiyouga/LlamaFactory

frozenleaves · 2026-06-22T09:23:08Z

Remove the hand-written v1 template/rendering stack (core/utils/rendering.py, plugins/model_plugins/rendering.py and the qwen3/qwen3_nothink template modules) and introduce a new core/rendering package that drives tokenization through the model's own apply_chat_template:

rendering.py -- Renderer orchestration + render_messages + process_samples
format.py -- v1<->HF message conversion + token subsequence search
label.py -- assistant-region labeling + structural verification
escape.py -- special-token escaping (prompt-injection hardening)
markers.py -- per-model assistant role markers (explicit whitelist)

ModelArguments gains custom_chat_template (replacing the removed template field); ModelEngine syncs/injects the chat template via _sync_chat_template. Import paths for the moved Renderer are updated across base_sampler, base_trainer, model_engine, inference_engine, cli_sampler and core/utils/batching. Example configs and tests drop the obsolete template: key.

Also drop the unused regex-based parse_message: the CLI sampler now appends the assistant turn as raw text to the conversation history.

gemini-code-assist

Code Review

This pull request refactors the rendering system in LlamaFactory, replacing the previous custom rendering plugins with a unified, secure Renderer class that handles message formatting, special-token escaping, and assistant-region labeling. Feedback on the changes highlights several critical improvements for robustness and correctness. Key recommendations include explicitly generating and resetting position_ids for pairwise/DPO sequences to prevent positional embedding offsets, adding defensive checks and try-except blocks to handle malformed JSON in tool calls and tool definitions, and implementing safety checks to avoid potential AttributeError crashes when accessing tokenizer attributes or when get_tokenizer returns None.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-22T09:26:24Z

+    def _sync_chat_template(self) -> None:
+        """Sync chat_template and inject custom_chat_template."""
+        tokenizer = get_tokenizer(self.processor)
+        if not is_tokenizer(self.processor) and not getattr(self.processor, "chat_template", None):
+            if getattr(tokenizer, "chat_template", None):
+                self.processor.chat_template = tokenizer.chat_template
+
+        if self.args.custom_chat_template:
+            if not is_tokenizer(self.processor):
+                self.processor.chat_template = self.args.custom_chat_template
+            else:
+                tokenizer.chat_template = self.args.custom_chat_template


If get_tokenizer(self.processor) returns None (e.g., if the processor does not have an associated tokenizer), accessing tokenizer.chat_template will raise an AttributeError. Adding a safety check ensures robust execution.

Suggested change

def _sync_chat_template(self) -> None:

"""Sync chat_template and inject custom_chat_template."""

tokenizer = get_tokenizer(self.processor)

if not is_tokenizer(self.processor) and not getattr(self.processor, "chat_template", None):

if getattr(tokenizer, "chat_template", None):

self.processor.chat_template = tokenizer.chat_template

if self.args.custom_chat_template:

if not is_tokenizer(self.processor):

self.processor.chat_template = self.args.custom_chat_template

else:

tokenizer.chat_template = self.args.custom_chat_template

def _sync_chat_template(self) -> None:

"""Sync chat_template and inject custom_chat_template."""

tokenizer = get_tokenizer(self.processor)

if not is_tokenizer(self.processor) and not getattr(self.processor, "chat_template", None):

if tokenizer is not None and getattr(tokenizer, "chat_template", None):

self.processor.chat_template = tokenizer.chat_template

if self.args.custom_chat_template:

if not is_tokenizer(self.processor):

self.processor.chat_template = self.args.custom_chat_template

elif tokenizer is not None:

tokenizer.chat_template = self.args.custom_chat_template

Remove the hand-written v1 template/rendering stack (core/utils/rendering.py, plugins/model_plugins/rendering.py and the qwen3/qwen3_nothink template modules) and drive tokenization through the model's own apply_chat_template: - rendering.py -- Renderer orchestration + render_messages + process_samples - format.py -- v1<->HF message conversion - escape.py -- special-token escaping (prompt-injection hardening) Assistant supervision is located WITHOUT a per-model marker table. A training sample is rendered so its last message is the supervised assistant turn, and that turn's token span is recovered by a single prompt/full diff: encode the prompt (everything up to and including the assistant role header via add_generation_prompt=True) and the full sequence; the tail the prompt does not cover is exactly this turn. process_samples splits multi-turn conversations into one sample per supervised turn so the diff always sits on the last-turn boundary, which is prefix-stable across chat templates -- appending the final assistant turn never restrips earlier turns -- so reasoning-history stripping (e.g. Qwen3 <think>) is handled correctly. A supervised turn carrying reasoning is rendered with thinking enabled (otherwise the generation prompt's empty <think></think> would break the prefix), and a fail-loud check rejects any template that is not prefix-stable for the turn. input_ids are the model's own canonical encoding. ModelArguments gains `custom_chat_template` (replacing the removed `template` field); ModelEngine syncs/injects the chat template via `_sync_chat_template`. Import paths for the moved Renderer are updated across base_sampler, base_trainer, model_engine, inference_engine, cli_sampler and core/utils/batching. Example configs and tests drop the obsolete `template:` key. The unused regex-based parse_message is dropped; the CLI sampler appends the assistant turn as raw text. This is the text-only refactor; multimodal data support builds on top of it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…all) - escape: guard against a tool_call whose JSON is a non-dict (list/str/int) -- previously tc.get(...) raised an uncaught AttributeError; now passed through. - rendering: reset position_ids per sequence for pairwise/DPO samples (chosen and rejected each restart at 1). A single continuous range offset the rejected sequence's positional embeddings. - format: validate tool_call JSON in _to_hf_messages -- malformed JSON or missing name/arguments now raise a descriptive ValueError instead of a raw exception. Tests: DPO position_ids reset, tool_call fail-loud validation, escape non-dict tool_call passthrough. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Prefix-expand multi-turn SFT conversations in DataEngine instead of in the renderer's collate_fn, so len(dataset) reflects the true post-split sample count (fixes num_training_steps / LR-schedule undercounting). For `u1 a1 u2 a2 u3 a3` the dataset yields `u1a1`, `u1a1u2a2`, `u1a1u2a2u3a3`, each trained on its last assistant turn. - data_engine.py: data_index entries become (name, idx, cut); _build_data_index converts each row once and emits one entry per supervised assistant turn (_prefix_cuts); __getitem__ truncates messages[:cut]. DPO / streaming / no supervised turn -> kept whole (cut=None). - rendering.py: drop Renderer._split_supervised_turns; process_samples renders each messages sample once (the diff-renderer already supervises only the last turn). DPO branch unchanged. - tests: replace the render-side split test with a render-once test + a DataEngine._prefix_cuts unit test. Verified end-to-end: text multi-turn SFT (Qwen3-0.6B) trains and saves; len(DataEngine) over a 120-conversation multi-turn demo == total supervised turns. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Prefix-split (data-layer multi-turn split) puts the supervised tokens at the tail, while truncation keeps [:cutoff_len]. A sample longer than cutoff_len thus loses all supervision and would contribute a zero-loss, zero-gradient (wasted) step. BatchGenerator now drops such samples when filling the buffer (_drop_unsupervised), which preserves the configured batch size (vs. dropping inside pad_and_truncate, which would shrink micro-batches or empty them). A one-time warning suggests raising cutoff_len. Applied to both the NORMAL path (_fill_buffer) and the dynamic/packing path (_next_samples). Verified end-to-end: with cutoff_len=512 on a long multi-turn dataset, the warning fires once and every optimizer step has non-zero loss (no truncation-induced zero-loss steps). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

frozenleaves temporarily deployed to docker June 22, 2026 09:23 — with GitHub Actions Inactive

gemini-code-assist Bot reviewed Jun 22, 2026

View reviewed changes

frozenleaves force-pushed the v1-rendering-apply-chat-template branch from 7d6374a to 7315f31 Compare June 26, 2026 02:23

frozenleaves temporarily deployed to docker June 26, 2026 02:23 — with GitHub Actions Inactive

frozenleaves temporarily deployed to docker June 26, 2026 06:47 — with GitHub Actions Inactive

frozenleaves temporarily deployed to docker June 29, 2026 03:23 — with GitHub Actions Inactive

frozenleaves temporarily deployed to docker June 29, 2026 06:06 — with GitHub Actions Inactive

frozenleaves force-pushed the v1-rendering-apply-chat-template branch from a1f3f85 to fc44f9e Compare June 29, 2026 09:00

frozenleaves temporarily deployed to docker June 29, 2026 09:00 — with GitHub Actions Inactive

frozenleaves and others added 2 commits July 3, 2026 14:31

Update src/llamafactory/v1/core/rendering/rendering.py

6ff6c1b

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Merge branch 'main' into v1-rendering-apply-chat-template

e8968bb

frozenleaves temporarily deployed to docker July 3, 2026 06:36 — with GitHub Actions Inactive

frozenleaves deployed to docker July 3, 2026 06:36 — with GitHub Actions View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(v1): replace custom template system with apply_chat_template#10598

refactor(v1): replace custom template system with apply_chat_template#10598
frozenleaves wants to merge 6 commits into
hiyouga:mainfrom
frozenleaves:v1-rendering-apply-chat-template

frozenleaves commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

frozenleaves commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant