SFT: run evaluators on post-final-optim weights before final checkpoint#736
Open
Butanium wants to merge 1 commit into
a035833 to 62979f6
Butanium
commented
May 26, 2026
Pretty useful if you want to ensure the final checkpoint is evaluated.
The in-loop eval cadence in the supervised training loop stops at step
total_steps-1 (loop `range` upper bound). For asymmetric eval cadences
where the user wants an eval at the very end of training (e.g. eval
fractions {1/6, 4/6, 1.0} of total steps), the 1.0 case would silently
never fire — evaluators would never see the genuinely-final weights that
get saved in the final checkpoint.
Add an `evals_final` block inside `if did_train` that re-runs the
evaluators once at `total_steps` before saving the final checkpoint,
matching the checkpoint's label semantics.
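
The gap described in the commit message can be illustrated with a toy loop. This is a minimal sketch, not the code in `tinker_cookbook/supervised/train.py`: `run_toy_loop`, `eval_steps`, and the integer counter standing in for model weights are all hypothetical, and `did_train` is assumed here to mean "at least one step ran".

```python
from __future__ import annotations


def run_toy_loop(
    total_steps: int, eval_steps: set[int], final_eval: bool
) -> list[tuple[int, int]]:
    """Return (step_label, optim_steps_applied) for every eval that fires."""
    optim_steps_applied = 0  # hypothetical stand-in for the model weights
    evals: list[tuple[int, int]] = []

    for step in range(total_steps):  # step runs from 0 to total_steps - 1
        if step in eval_steps:
            # In-loop eval runs before this iteration's optimizer step,
            # so it sees pre-step weights.
            evals.append((step, optim_steps_applied))
        optim_steps_applied += 1  # stand-in for forward/backward + optim step

    did_train = total_steps > 0  # assumption: true when any step ran
    if did_train and final_eval and eval_steps:
        # Extra eval at step == total_steps, on the weights that would be
        # written to the final checkpoint.
        evals.append((total_steps, optim_steps_applied))
    return evals


# Eval fractions {1/6, 4/6, 1.0} with total_steps = 6 map to steps {1, 4, 6}.
without_final = run_toy_loop(6, {1, 4, 6}, final_eval=False)
with_final = run_toy_loop(6, {1, 4, 6}, final_eval=True)
print(without_final)  # [(1, 1), (4, 4)]
print(with_final)     # [(1, 1), (4, 4), (6, 6)]
```

Without the final block, step 6 is never reached by `range(6)`, so the eval requested at fraction 1.0 does not fire; with it, the last eval is labeled with `total_steps` and sees all six optimizer steps.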
62979f6 to
7490b8e
Compare
Author: the check failed because of an HF API error in an unrelated test; reopening to retrigger the tests.
Summary
- In `tinker_cookbook/supervised/train.py`, add an `evals_final` block inside the `if did_train:` branch that re-runs `evaluators` on the post-final-optim weights at `step = total_steps`, just before `checkpoint_mgr.save_final_async(...)`.
- The block runs only when `evaluators` is non-empty and `config.eval_every > 0` (i.e. when eval is enabled at all), so default behavior for runs without evaluators is unchanged.

Motivation
The in-loop eval check in the SFT loop runs before that iteration's `forward_backward_async` / `optim_step_async`, so it snapshots pre-step weights. Two consequences:
- The last in-loop eval fires at `step = floor((total_steps - 1) / eval_every) * eval_every`, on weights that have undergone that many optim steps.
- The final checkpoint, saved by `checkpoint_mgr.save_final_async` immediately after the loop, captures weights that have undergone `total_steps` optim steps.

So even in the vanilla case (`eval_every=10`, `total_steps=100`), the last in-loop eval fires at `step=90` and reports metrics for weights 10 optim steps stale relative to the final checkpoint. The gap is largest, a full `eval_every` steps, when `total_steps` is an exact multiple of `eval_every`, as in this example; when `total_steps` is one short of a multiple, the last in-loop eval is still `eval_every - 1` steps stale. For users who treat the final-checkpoint metrics as the model's "final" evaluation, this is a silent mismatch between what got logged and what got saved.

This PR closes that gap by running the configured evaluators once more at `step = total_steps`, immediately before the final checkpoint save, matching the checkpoint's step label.

Notes / alternatives considered
- No `final_eval: bool = True` config flag. Happy to add one if reviewers prefer; the current design treats "if you have evals configured, you almost certainly want one at the genuinely-final weights" as the obvious default.

🤖 Drafted with Claude Code. Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
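
The staleness arithmetic in the Motivation section can be checked with a small helper. This is a sketch under stated assumptions: the function names are hypothetical, and it assumes the in-loop eval fires on every step that is a multiple of `eval_every`, with steps running from 0 to `total_steps - 1`.

```python
def last_in_loop_eval_step(total_steps: int, eval_every: int) -> int:
    """Largest multiple of eval_every that is <= total_steps - 1."""
    if total_steps < 1 or eval_every < 1:
        raise ValueError("total_steps and eval_every must both be >= 1")
    return ((total_steps - 1) // eval_every) * eval_every


def staleness(total_steps: int, eval_every: int) -> int:
    """Optimizer steps between the last in-loop eval and the final checkpoint.

    The in-loop eval at step s sees weights after s optimizer steps, while
    the final checkpoint holds weights after total_steps optimizer steps.
    """
    return total_steps - last_in_loop_eval_step(total_steps, eval_every)


print(last_in_loop_eval_step(100, 10))  # 90
print(staleness(100, 10))               # 10
print(staleness(99, 10))                # 9
print(staleness(101, 10))               # 1
```

Under these assumptions the gap always falls between 1 and `eval_every` optimizer steps, which is the mismatch the extra eval at `step = total_steps` removes.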