Open
Labels
- difficulty/easy: Easy issue (less than one day)
- kind/bug: Bug
- kind/stability: Impact the reliability/stability of the code
- priority/medium: Issue to solve but not immediately
Description
We previously made an explicit choice not to retry on error for NN uploads. Errors do happen, though, and it's a shame to lose a full training run because of a temporary upload error.
We should therefore retry, but with settings specific to this large upload:
- longer global timeout
- don't retry 10 times, maybe 3 is enough
- fewer retries, so a longer wait between them (maybe 10s after the first failure, then 30s?)
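
To make the proposal concrete, here is a rough sketch of what such a retry wrapper could look like. It is not the existing upload code: `upload_with_retry`, the constants, and the `upload(timeout_s)` callable are all hypothetical names, the 600s timeout is a placeholder for "longer", and "3" is read here as three attempts in total (waiting 10s, then 30s). Which exceptions count as temporary is also an assumption.

```python
import time
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")

# Hypothetical values based on the suggestions above; none are settled.
NN_UPLOAD_TIMEOUT_S = 600.0               # placeholder for the "longer" timeout
NN_UPLOAD_RETRY_DELAYS_S = (10.0, 30.0)   # 3 attempts: wait 10s, then 30s


def upload_with_retry(
    upload: Callable[[float], T],
    timeout_s: float = NN_UPLOAD_TIMEOUT_S,
    retry_delays_s: Sequence[float] = NN_UPLOAD_RETRY_DELAYS_S,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call upload(timeout_s), waiting and retrying once per configured delay."""
    for delay in retry_delays_s:
        try:
            return upload(timeout_s)
        except retry_on:
            # Assumed to be a temporary error: wait, then try again.
            sleep(delay)
    # Final attempt: any error now propagates to the caller.
    return upload(timeout_s)
```

The `sleep` parameter is only there so the waits can be replaced in tests. Errors outside `retry_on` are not retried, so a permanent failure (for example a rejected file) still fails fast.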