Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task to finish build due inactivity has an edge case #4386

Open
humitos opened this issue Jul 17, 2018 · 10 comments
Open

Task to finish build due inactivity has an edge case #4386

humitos opened this issue Jul 17, 2018 · 10 comments
Assignees
Labels
Improvement Minor improvement to code Needed: design decision A core team decision is required Priority: low Low priority

Comments

@humitos
Copy link
Member

humitos commented Jul 17, 2018

Yesterday while deploying the community site we stop the celery queues for a while and we accumulate many tasks on it. I triggered a couple of builds and they weren't executed by any worker in a period of 1080 seconds (900s + 20%). Because of this reason, these builds were taken by finish_inactive_builds periodic task:

https://github.com/rtfd/readthedocs.org/blob/1f8351b593394efc9f20533776f4e48a018bbb56/readthedocs/projects/tasks.py#L1195-L1206

This task, marked them as FAILED and added the error message to it. After a while I saw my build like this:

captura de pantalla_2018-07-17_12-46-08

Then, when the workers recovered and started executing pending builds from the queue, these builds were performed and now the same build id shows a completely different message: COMPLETED

captura de pantalla_2018-07-17_12-58-19

One possible way to solve this is to save the task_id into the Build object and before marking the build as FAILED we could check through celery if that task_id is still alive or pending for a worker to take it. As far as I know, at this point we have no way to link a Build object with a Celery task.

@humitos humitos added the Improvement Minor improvement to code label Jul 17, 2018
@stale
Copy link

stale bot commented Jan 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Jan 10, 2019
@stsewd stsewd added Needed: replication Bug replication is required and removed Status: stale Issue will be considered inactive soon labels Jan 10, 2019
@stsewd
Copy link
Member

stsewd commented Jan 10, 2019

Is there a way to replicate this locally?

@humitos
Copy link
Member Author

humitos commented Jan 10, 2019

@stsewd yes,

  1. you have to start your Django instance with CELERY_ALWAYS_EAGER=False
  2. do not run the celery process
  3. trigger a build
  4. wait for DOCKER_TIME_LIMIT * 1.2

and you will hit the edge case.

@humitos humitos removed the Needed: replication Bug replication is required label Jan 10, 2019
@stale
Copy link

stale bot commented Feb 24, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Feb 24, 2019
@stsewd
Copy link
Member

stsewd commented Feb 25, 2019

Still valid, but very low priority

@stale stale bot removed the Status: stale Issue will be considered inactive soon label Feb 25, 2019
@stsewd stsewd added the Priority: low Low priority label Feb 25, 2019
@stale
Copy link

stale bot commented Apr 11, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Apr 11, 2019
@dojutsu-user
Copy link
Member

Still valid bot.

@stale stale bot removed the Status: stale Issue will be considered inactive soon label Apr 11, 2019
@stale
Copy link

stale bot commented May 26, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label May 26, 2019
@dojutsu-user
Copy link
Member

Still valid.

@stale stale bot removed the Status: stale Issue will be considered inactive soon label May 26, 2019
@humitos humitos added the Needed: design decision A core team decision is required label May 28, 2019
@humitos
Copy link
Member Author

humitos commented May 27, 2021

This function should check if we have a celery task queued and mark the build as finished if not. Otherwise, we may be set as finished builds that are being re-tried because of concurrency. Also, using the queue we will be 100% accurate in our decision and it does not require to be based on "time since the build was triggered" or similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Improvement Minor improvement to code Needed: design decision A core team decision is required Priority: low Low priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants