refactor(v1): replace custom template system with apply_chat_template#10598
refactor(v1): replace custom template system with apply_chat_template#10598frozenleaves wants to merge 6 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the rendering system in LlamaFactory, replacing the previous custom rendering plugins with a unified, secure Renderer class that handles message formatting, special-token escaping, and assistant-region labeling. Feedback on the changes highlights several critical improvements for robustness and correctness. Key recommendations include explicitly generating and resetting position_ids for pairwise/DPO sequences to prevent positional embedding offsets, adding defensive checks and try-except blocks to handle malformed JSON in tool calls and tool definitions, and implementing safety checks to avoid potential AttributeError crashes when accessing tokenizer attributes or when get_tokenizer returns None.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| def _sync_chat_template(self) -> None: | ||
| """Sync chat_template and inject custom_chat_template.""" | ||
| tokenizer = get_tokenizer(self.processor) | ||
| if not is_tokenizer(self.processor) and not getattr(self.processor, "chat_template", None): | ||
| if getattr(tokenizer, "chat_template", None): | ||
| self.processor.chat_template = tokenizer.chat_template | ||
|
|
||
| if self.args.custom_chat_template: | ||
| if not is_tokenizer(self.processor): | ||
| self.processor.chat_template = self.args.custom_chat_template | ||
| else: | ||
| tokenizer.chat_template = self.args.custom_chat_template |
There was a problem hiding this comment.
If get_tokenizer(self.processor) returns None (e.g., if the processor does not have an associated tokenizer), accessing tokenizer.chat_template will raise an AttributeError. Adding a safety check ensures robust execution.
| def _sync_chat_template(self) -> None: | |
| """Sync chat_template and inject custom_chat_template.""" | |
| tokenizer = get_tokenizer(self.processor) | |
| if not is_tokenizer(self.processor) and not getattr(self.processor, "chat_template", None): | |
| if getattr(tokenizer, "chat_template", None): | |
| self.processor.chat_template = tokenizer.chat_template | |
| if self.args.custom_chat_template: | |
| if not is_tokenizer(self.processor): | |
| self.processor.chat_template = self.args.custom_chat_template | |
| else: | |
| tokenizer.chat_template = self.args.custom_chat_template | |
| def _sync_chat_template(self) -> None: | |
| """Sync chat_template and inject custom_chat_template.""" | |
| tokenizer = get_tokenizer(self.processor) | |
| if not is_tokenizer(self.processor) and not getattr(self.processor, "chat_template", None): | |
| if tokenizer is not None and getattr(tokenizer, "chat_template", None): | |
| self.processor.chat_template = tokenizer.chat_template | |
| if self.args.custom_chat_template: | |
| if not is_tokenizer(self.processor): | |
| self.processor.chat_template = self.args.custom_chat_template | |
| elif tokenizer is not None: | |
| tokenizer.chat_template = self.args.custom_chat_template |
Remove the hand-written v1 template/rendering stack (core/utils/rendering.py, plugins/model_plugins/rendering.py and the qwen3/qwen3_nothink template modules) and drive tokenization through the model's own apply_chat_template: - rendering.py -- Renderer orchestration + render_messages + process_samples - format.py -- v1<->HF message conversion - escape.py -- special-token escaping (prompt-injection hardening) Assistant supervision is located WITHOUT a per-model marker table. A training sample is rendered so its last message is the supervised assistant turn, and that turn's token span is recovered by a single prompt/full diff: encode the prompt (everything up to and including the assistant role header via add_generation_prompt=True) and the full sequence; the tail the prompt does not cover is exactly this turn. process_samples splits multi-turn conversations into one sample per supervised turn so the diff always sits on the last-turn boundary, which is prefix-stable across chat templates -- appending the final assistant turn never restrips earlier turns -- so reasoning-history stripping (e.g. Qwen3 <think>) is handled correctly. A supervised turn carrying reasoning is rendered with thinking enabled (otherwise the generation prompt's empty <think></think> would break the prefix), and a fail-loud check rejects any template that is not prefix-stable for the turn. input_ids are the model's own canonical encoding. ModelArguments gains `custom_chat_template` (replacing the removed `template` field); ModelEngine syncs/injects the chat template via `_sync_chat_template`. Import paths for the moved Renderer are updated across base_sampler, base_trainer, model_engine, inference_engine, cli_sampler and core/utils/batching. Example configs and tests drop the obsolete `template:` key. The unused regex-based parse_message is dropped; the CLI sampler appends the assistant turn as raw text. This is the text-only refactor; multimodal data support builds on top of it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
7d6374a to
7315f31
Compare
…all) - escape: guard against a tool_call whose JSON is a non-dict (list/str/int) -- previously tc.get(...) raised an uncaught AttributeError; now passed through. - rendering: reset position_ids per sequence for pairwise/DPO samples (chosen and rejected each restart at 1). A single continuous range offset the rejected sequence's positional embeddings. - format: validate tool_call JSON in _to_hf_messages -- malformed JSON or missing name/arguments now raise a descriptive ValueError instead of a raw exception. Tests: DPO position_ids reset, tool_call fail-loud validation, escape non-dict tool_call passthrough. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Prefix-expand multi-turn SFT conversations in DataEngine instead of in the
renderer's collate_fn, so len(dataset) reflects the true post-split sample count
(fixes num_training_steps / LR-schedule undercounting). For `u1 a1 u2 a2 u3 a3`
the dataset yields `u1a1`, `u1a1u2a2`, `u1a1u2a2u3a3`, each trained on its last
assistant turn.
- data_engine.py: data_index entries become (name, idx, cut); _build_data_index
converts each row once and emits one entry per supervised assistant turn
(_prefix_cuts); __getitem__ truncates messages[:cut]. DPO / streaming / no
supervised turn -> kept whole (cut=None).
- rendering.py: drop Renderer._split_supervised_turns; process_samples renders
each messages sample once (the diff-renderer already supervises only the last
turn). DPO branch unchanged.
- tests: replace the render-side split test with a render-once test + a
DataEngine._prefix_cuts unit test.
Verified end-to-end: text multi-turn SFT (Qwen3-0.6B) trains and saves;
len(DataEngine) over a 120-conversation multi-turn demo == total supervised turns.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Prefix-split (data-layer multi-turn split) puts the supervised tokens at the tail, while truncation keeps [:cutoff_len]. A sample longer than cutoff_len thus loses all supervision and would contribute a zero-loss, zero-gradient (wasted) step. BatchGenerator now drops such samples when filling the buffer (_drop_unsupervised), which preserves the configured batch size (vs. dropping inside pad_and_truncate, which would shrink micro-batches or empty them). A one-time warning suggests raising cutoff_len. Applied to both the NORMAL path (_fill_buffer) and the dynamic/packing path (_next_samples). Verified end-to-end: with cutoff_len=512 on a long multi-turn dataset, the warning fires once and every optimizer step has non-zero loss (no truncation-induced zero-loss steps). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
a1f3f85 to
fc44f9e
Compare
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Remove the hand-written v1 template/rendering stack (core/utils/rendering.py, plugins/model_plugins/rendering.py and the qwen3/qwen3_nothink template modules) and introduce a new core/rendering package that drives tokenization through the model's own apply_chat_template:
ModelArguments gains
custom_chat_template(replacing the removedtemplatefield); ModelEngine syncs/injects the chat template via_sync_chat_template. Import paths for the moved Renderer are updated across base_sampler, base_trainer, model_engine, inference_engine, cli_sampler and core/utils/batching. Example configs and tests drop the obsoletetemplate:key.Also drop the unused regex-based
parse_message: the CLI sampler now appends the assistant turn as raw text to the conversation history.