Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: improve concurrency limit retry message #9011

Open
humitos opened this issue Mar 14, 2022 · 5 comments
Open

Build: improve concurrency limit retry message #9011

humitos opened this issue Mar 14, 2022 · 5 comments
Assignees
Labels
Accepted Accepted issue on our roadmap Improvement Minor improvement to code

Comments

@humitos
Copy link
Member

humitos commented Mar 14, 2022

When a build is concurrency limited, we show the maximum number of concurrent builds. Besides that limit, it would be good to show the number of times the build has retried already and the number of remaining retries.

After that, if the build reaches the maximum retried times, we should update the message accordingly to communicate the build has reached this number.

  • "Concurrency limit reached (1), retrying in 5 minutes. Retries (7/25)"
  • "Concurrency limit reached (1), retrying in 5 minutes. Retries (25/25)." or "This build reached the max retries allowed and won't be executed."

We need a message like that. Otherwise, users believe the build is going to be retried again when it's not gonna be retried because the retry limit was hit (note that having a high number of retries could interfere with #8269 and it could be killing the builds)

Places in the code:

Note there is a MaxRetriesExceededError exception that we can handle to update the build's error message when we reached the limit: https://docs.celeryproject.org/en/stable/userguide/tasks.html#retrying

Front logo Front conversations

@humitos humitos added Improvement Minor improvement to code Accepted Accepted issue on our roadmap labels Mar 14, 2022
@humitos humitos self-assigned this Mar 14, 2022
@humitos humitos moved this to Planned in 📍Roadmap Mar 29, 2022
@agjohnson
Copy link
Contributor

agjohnson commented Apr 4, 2022

Are the logs indicating we are retrying builds over and over until failing the build? That is, are these legitimate concurrency retries?

Some of the reports we've collected on this bug make it seem like the error is in our retry/concurrency logic, and the issue doesn't seem related to how we are communicating build retries.

For example, this build and the prior builds for this project all seem suspect https://readthedocs.org/projects/overte-docs/builds/16548600/

image

The last passing build took 94 seconds and was several days before the first failed concurrency limited builds. This project does have translations that would cause concurrency limit conflict, but builds never being retried only seems to be an issue after last release.

@humitos
Copy link
Member Author

humitos commented Apr 6, 2022

Some of the reports we've collected on this bug

I'm not talking about a bug on concurrency limits here, just making the message shown to the user better.

@agjohnson
Copy link
Contributor

I understand, but I think focusing on the error message is focusing on a symptom of the error instead of the error itself. Users are more confused about the builds never retrying than they are how many times the build has been retried. In the case above, the error message wouldn't help the user, the builds still ultimately never retry.

If we address the underlying issue of builds not retrying, the user likely won't see the count on the error message anyways. I've asked users about retries and they don't notice the retry mechanism, they just wait for the build to finish.

@humitos
Copy link
Member Author

humitos commented Apr 7, 2022

Yeah, I know. I'm not focusing on the error message instead of the error. They are just two different problems. One is the retry not working and a different issue (labeled as "Improvement") is the error message. It could happen the user has reached the concurrency limit during 25 * 5m, and that particular build won't retry again, communicating a wrong message. That case is this issue about.

@stsewd
Copy link
Member

stsewd commented Apr 27, 2022

Note there is a MaxRetriesExceededError exception that we can handle to update the build's error message when we reached the limit: https://docs.celeryproject.org/en/stable/userguide/tasks.html#retrying

Just noting from #9148 that the actual exception would be BuildMaxConcurrencyError as we are using self.retry(exc=BuildMaxConcurrencyError), it will also be BuildMaxConcurrencyError if we raise the exception and rely on autoretry_for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Accepted issue on our roadmap Improvement Minor improvement to code
Projects
None yet
Development

No branches or pull requests

3 participants