What I’m trying to do
Resume pre-training in Megatron-SWIFT with `streaming=true` and `packing=true`.
Observation
- On resume, the checkpoint correctly restores `iteration` and `consumed_samples`.
- It is unclear whether the streaming dataset picks up from the same offset or restarts from the beginning (which would duplicate samples).
My reading of the code
`IterablePackingDataset` doesn't appear to persist its internal index, so I suspect the offset isn't saved on the SWIFT side.
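For illustration only, here is a minimal sketch of the kind of state persistence I have in mind; `StatefulIterablePacking`, its methods, and the fast-forward logic are hypothetical and not SWIFT's actual implementation:

```python
# Hypothetical sketch of an iterable packing wrapper that tracks and restores
# its offset; this is NOT SWIFT's IterablePackingDataset, just an illustration.
from typing import Any, Dict, Iterable, Iterator


class StatefulIterablePacking:
    """Wraps a streaming dataset and remembers how many raw samples were consumed."""

    def __init__(self, streaming_dataset: Iterable[Any]) -> None:
        self.streaming_dataset = streaming_dataset
        self.consumed_samples = 0  # would need to be saved alongside the checkpoint

    def __iter__(self) -> Iterator[Any]:
        resume_offset = self.consumed_samples
        for idx, sample in enumerate(self.streaming_dataset):
            if idx < resume_offset:
                continue  # fast-forward past samples consumed before the checkpoint
            self.consumed_samples += 1
            yield sample  # real packing logic would buffer/concatenate here

    def state_dict(self) -> Dict[str, int]:
        return {"consumed_samples": self.consumed_samples}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        self.consumed_samples = state["consumed_samples"]
```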
Question
- Should already-consumed samples be automatically skipped when resuming with `streaming=true`?
- Does Megatron-LM handle this internally, or should users manually skip `consumed_samples` when resuming? (A rough sketch of what a manual skip might look like is below.)
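For reference, this is roughly what I imagine a manual skip would look like, assuming the streaming source is a Hugging Face `IterableDataset` (which exposes `.skip()`); the dataset name and the offset value are placeholders:

```python
# Illustrative workaround sketch, not a confirmed recipe: drop the samples that
# were already consumed before resuming. Dataset path and offset are placeholders.
from datasets import load_dataset

consumed_samples = 123_456  # value restored from the Megatron checkpoint

ds = load_dataset("my/pretrain-corpus", split="train", streaming=True)
# Only lines up with the pre-resume order if any shuffling uses the same seed/buffer.
ds = ds.skip(consumed_samples)
```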
Thanks in advance for any clarification!