Add some sensible retry for NN uploads #91

@thomas-riccardi

Description

We previously chose, explicitly, not to retry on error for NN uploads. Errors do happen, and it's a shame to lose a full training run to a temporary upload error.

However, we should use specific settings for such a large upload:

  • a longer global timeout
  • fewer attempts: not 10 retries, maybe 3 is enough
  • since there are fewer retries, a longer wait between them (maybe 10s after the first failure, then 30s?)
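
The settings above can be sketched as follows. This is only an illustration, not the project's actual upload code: `upload_with_retry`, the `upload` callable, the 600s timeout and the choice of retryable exceptions are all assumptions; only the 3 attempts and the 10s/30s waits come from the proposal.

```python
import time
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def upload_with_retry(
    upload: Callable[[float], T],
    timeout_s: float = 600.0,  # assumed "longer global timeout", value is a guess
    delays_s: Sequence[float] = (10.0, 30.0),  # wait after 1st and 2nd failure
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Call upload(timeout_s), retrying on temporary errors.

    The number of attempts is len(delays_s) + 1, so the default is 3 attempts.
    """
    attempts = len(delays_s) + 1
    for attempt in range(attempts):
        try:
            return upload(timeout_s)
        except retry_on:
            if attempt == attempts - 1:
                # Last attempt failed: give up and let the caller see the error.
                raise
            sleep(delays_s[attempt])
    raise AssertionError("unreachable")
```

The hypothetical `upload` callable receives the timeout so the caller can pass it to whatever HTTP client performs the transfer. Errors outside `retry_on` are not retried, which keeps permanent failures (for example a rejected request) from wasting the remaining attempts.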

Metadata

Assignees

No one assigned

    Labels

    difficulty/easy: Easy issue (less than one day)
    kind/bug: Bug
    kind/stability: Impact the reliability/stability of the code
    priority/medium: Issue to solve but not immediately

