Conversation

tushar00jain
Contributor

@tushar00jain tushar00jain commented Aug 28, 2025

Summary:

  • add a script to launch replicas using titan on Slurm
  • add a script to randomly kill replicas to test fault tolerance
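
For readers unfamiliar with this kind of fault-injection test, here is a minimal sketch of the idea behind the second bullet. It is not the script added in this PR. It assumes each replica runs as its own Slurm job with a known job ID, and it cancels one at random with the standard `scancel` command. The function names (`pick_victim`, `kill_command`, `kill_random_replica`) and the injectable `run` parameter are hypothetical, chosen for illustration only.

```python
import random
import subprocess
from typing import Callable, List, Sequence


def pick_victim(job_ids: Sequence[str], rng: random.Random) -> str:
    """Choose one replica job ID uniformly at random."""
    if not job_ids:
        raise ValueError("no replica jobs to kill")
    return rng.choice(list(job_ids))


def kill_command(job_id: str) -> List[str]:
    """Build the Slurm command that cancels a single job."""
    return ["scancel", job_id]


def kill_random_replica(
    job_ids: Sequence[str],
    rng: random.Random,
    run: Callable[..., object] = subprocess.run,
) -> str:
    """Cancel one randomly chosen replica and return its job ID.

    `run` defaults to subprocess.run; it is injectable (an assumption of
    this sketch) so the logic can be exercised without a Slurm cluster.
    """
    victim = pick_victim(job_ids, rng)
    run(kill_command(victim), check=True)
    return victim
```

On a real cluster this would typically be called in a loop with a sleep between kills, so that the surviving replicas have time to detect the failure and recover before the next one is injected.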

@meta-cla meta-cla bot added the CLA Signed label Aug 28, 2025
@tushar00jain tushar00jain requested a review from d4l3k August 28, 2025 21:27
@tushar00jain tushar00jain force-pushed the pr263 branch 7 times, most recently from 88b9ba7 to fd19fee Compare August 29, 2025 19:05
Member

@d4l3k d4l3k left a comment

LGTM

@d4l3k d4l3k merged commit eebdf3a into pytorch:main Aug 29, 2025
13 checks passed
@tushar00jain tushar00jain deleted the pr263 branch August 30, 2025 00:04