@CloseChoice CloseChoice commented Oct 17, 2025

closes #7567

Adds a shift_rngs method to ExamplesIterable that is called directly after sharding. If a generator is available (which is not the case for all subclasses), the method updates the generator's seed by shifting it by the worker_id.
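
To illustrate the idea of shifting a seed by the worker id, here is a minimal sketch. It is not the code from this PR: `ToyShuffledIterable` is a hypothetical stand-in for a shuffling iterable, it uses the standard library's `random.Random` rather than whatever generator the library holds, and `seed + value` is only one assumed way to apply the shift.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyShuffledIterable:
    """Hypothetical stand-in for a shuffling iterable (not the library class)."""

    items: tuple
    seed: int

    def shift_rngs(self, value: int) -> "ToyShuffledIterable":
        # Assumption: the shift is a plain integer offset on the seed.
        # Returns a copy so the original iterable keeps the user's seed.
        return replace(self, seed=self.seed + value)

    def __iter__(self):
        order = list(self.items)
        # A fresh RNG per iteration makes the order depend only on the seed.
        random.Random(self.seed).shuffle(order)
        return iter(order)


base = ToyShuffledIterable(items=tuple(range(10)), seed=42)

# Simulate three workers, each shifting by its own worker id after sharding.
per_worker = [list(base.shift_rngs(worker_id)) for worker_id in range(3)]
worker_seeds = [base.shift_rngs(worker_id).seed for worker_id in range(3)]
```

In this sketch worker 0 shifts by zero, so it reproduces exactly the order the unshifted iterable would give, while the other workers shuffle with their own seeds.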

This is just the fix for shuffle. The corresponding issue also mentions interleave_datasets, which this approach won't fix.

EDIT: This is now a fix for both shuffle and interleave_datasets. Making shift_rngs recursive solved interleave_datasets as well. I'm not sure whether this is completely safe or whether it could break something. I don't think so, but I could be wrong and would appreciate some guidance from the maintainers. I also checked that with a single worker we always pass index=0, so that case preserves the seed the user specified.
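
The recursive variant can be sketched roughly as follows. This is an assumption-laden toy, not the PR's implementation: `ToyNode` and `collect_seeds` are hypothetical names, a node with `seed=None` models a subclass that has no generator, and `children` models the wrapped sub-iterables (for example, the inputs to an interleave).

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical node in a tree of iterables (not the library class)."""

    seed: Optional[int] = None            # None: this node has no generator
    children: Tuple["ToyNode", ...] = ()  # wrapped sub-iterables, if any

    def shift_rngs(self, value: int) -> "ToyNode":
        # Shift this node's seed only if it has one.
        new_seed = None if self.seed is None else self.seed + value
        # Recurse so that seeded nodes nested at any depth are shifted too.
        new_children = tuple(child.shift_rngs(value) for child in self.children)
        return replace(self, seed=new_seed, children=new_children)


def collect_seeds(node: ToyNode) -> List[Optional[int]]:
    """Return all seeds in depth-first order, for inspection."""
    seeds = [node.seed]
    for child in node.children:
        seeds.extend(collect_seeds(child))
    return seeds


# A seeded root wrapping one seeded child and one unseeded wrapper
# that itself wraps a seeded child.
tree = ToyNode(
    seed=7,
    children=(ToyNode(seed=1), ToyNode(children=(ToyNode(seed=3),))),
)

shifted = tree.shift_rngs(2)      # e.g. worker_id == 2
unshifted = tree.shift_rngs(0)    # single-worker case: index 0
```

With a shift of 0 the tree comes back with identical seeds, which mirrors the single-worker observation above; with a nonzero shift every seeded node moves by the same offset and unseeded nodes are left alone.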

@CloseChoice changed the title from "Fix shuffle seed 7567" to "Fix random seed when shuffling" on Oct 17, 2025
@CloseChoice changed the title from "Fix random seed when shuffling" to "Fix random seed when shuffling and interleaving_datasets" on Oct 17, 2025
@CloseChoice changed the title from "Fix random seed when shuffling and interleaving_datasets" to "Fix random seed on shuffle and interleave_datasets" on Oct 17, 2025
Development

Successfully merging this pull request may close these issues.

interleave_datasets seed with multiple workers

1 participant