Retry list/download operation depending on the HTTP response #300

orbitz · 2024-12-18T09:25:02Z

Is your feature request related to a problem? Please describe.
We are hosting our own LIST and DOWNLOAD proxies and sometimes an operation might fail due to a temporary problem, for example a 502 or a connection failure in the case of a transient network hiccup. As far as I have seen, tenv does not support this, or at least it does not expose environment variables to configure the behaviour.

Describe the solution you'd like
For our own software we support a few retries with exponential backoff. We have found this works for us, depending on the failure case. I think the ability to do this and configure it via env variables would be useful.

Describe alternatives you've considered
We considered executing the install ourselves and retrying if it fails. However, the interface we provide is meant to hide from users that they are using tenv underneath; they just call tofu. So it is not necessarily feasible to ensure an install is done prior to execution, because we do not know when a customer will issue the command.

Additional context


kvendingoldo · 2024-12-18T10:17:29Z

retries with exponential backoff sound like a good idea!

dvaumoron · 2024-12-23T21:58:18Z

I am not sure that this feature should be embedded in the tenv code base; however, you can use the tenvlib package to manage this kind of use case.

orbitz · 2024-12-26T07:51:00Z

Any argument as to why? I'm guessing 90%+ usage of tenv is via the tenv CLI. And currently that means there is no good retry mechanism for that usage.

dvaumoron · 2024-12-26T12:52:53Z

A good retry mechanism for CLI usage? I don't get your point: the user can launch the command again in case of a temporary network/server failure.

Why is it worth the cost of increased maintenance effort?

orbitz · 2024-12-27T08:57:32Z

1. It simplifies the interface for the user. Generally these failures are ephemeral and just retrying will resolve them (depending on the failure).
2. If you use tenv indirectly via tofu or terraform, where it fetches and installs as part of the tofu or terraform command, then the user has to distinguish between tenv failing and tofu or terraform failing, so it's harder to determine if you even should retry, whereas tenv has the knowledge already so it can make that decision. tenv is already doing magic behind the scenes to make using tofu and terraform feel like a seamless experience, but doing network calls, I would argue, requires including retries in them to maintain that seamless experience, since networks are fickle beasts.
3. I guess I don't know Go well enough to say, but I would imagine adding retry for these operations is just wrapping it in a retry function? Is there more to it than that?

dvaumoron · 2025-01-02T17:19:19Z

1. I disagree on that point: a user who gets no response will wonder why their command hangs. This is the reason behind the choice to display tenv output when there is an installation: we don't want the user to kill a "slow" process, so in that case tenv is not the true proxy it tries to be in other cases.
2. This point seems interesting; however, you seem to forget that tofu (or terraform) is designed to be idempotent, so there should be no issue with re-running a command. Moreover, tenv uses colorized output in proxy mode to show where an error comes from (proxied output stays unchanged), although I guess some improvements could be made there (better error messages and an option to change the color, to avoid missing the difference when the default display color is the same).
3. It depends on what you call "wrapping it in a retry function", because "tenv has the knowledge already so it can make that decision" means you want some logic, but what kind? Network failure? Server failure depending on the HTTP code? Is it useless to launch a signature check again? Not really with Cosign: some failures are linked to server stability, so there are decisions to make to do that properly.

Again, I think that could become a lot of work for a small benefit.

orbitz · 2025-01-02T19:30:23Z

Points 1 and 2 are perhaps easy for a human to interpret, but for automation it is harder. Did tofu init fail because of an ephemeral network error or because of a configuration problem?

For point 3, if you believe it is too much complexity for tenv, I'm in no position to disagree.

I think there is a lot of value here. I believe you are mostly considering the use case of humans directly interacting with tenv; however, if you're using tenv in automation, you're using it hundreds of times per hour and those ephemeral failures add up. Additionally, tenv performs multiple network operations, and it's not a great experience to have to re-run the whole operation if just one of those fails (listing, downloading the main binary, downloading signatures).

That being said, how does this ticket end? Is there further discussion to be had or is the decision made, in which case we can close the ticket?

dvaumoron · 2025-01-29T16:13:49Z

If the decision was made, the ticket would already have been closed. However, I am still trying to figure out why it could be interesting...

If you experience a lot of failures within your network and want to avoid relaunching a whole pipeline, you can script the retry mechanism there.

You said you want your users to have a better experience with their calls; however, depending on my answers, you talk about either CLI usage or automation. Your particular use case is unclear, and I still don't get why you need that feature to be done within tenv.

orbitz · 2025-01-29T19:40:00Z

Thank you, @dvaumoron. The reasons I believe this is a useful feature of tenv are as follows:

tenv provides a transparent experience to users where you can run tofu or terraform and it will install the appropriate version before executing the command.
Running anything that touches the network at scale will result in the occasional ephemeral failure.
When using tenv in an automated environment, the human user is not in a position to make the decision as to whether or not to run the operation again.

Given that tenv allows a transparent experience, when it comes to automation in the face of transient failures, I think there are two options:

1. The tool can take care of retries itself.
2. The tool can communicate up to the caller such that it knows if it is a scenario where a retry is viable.

I am advocating for (1); however, I think (2) could be viable as well. For (2) to be viable, tenv needs to communicate to the automation, somehow, that the error was a transient error. As far as I know, tenv does not do this. There is some logging, but I don't think evaluating logging is a great solution. Perhaps tenv could set a specific exit code, one not used by tofu or terraform, to indicate the failure type?

dvaumoron · 2025-01-30T18:55:41Z

I will look into having a specific return code when a failure comes from the tenv side.

kvendingoldo added the "enhancement" (New feature or request) label Dec 18, 2024

dvaumoron self-assigned this Jan 30, 2025
