Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlocks by making CurrentThreadExecutor reentrant and reusing it #493

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

andersk
Copy link
Contributor

@andersk andersk commented Feb 5, 2025

When there are multiple CurrentThreadExecutors on the same stack, it seems there’s no good way to decide in advance which one new jobs should be routed to. If we send a job to an executor too low in the stack, something in a higher executor’s queue might deadlock waiting for it to finish (#348?). If we send a job to an executor too high in the stack, that job might deadlock waiting for something in a lower executor’s queue to finish (#492), or the executor might have already terminated and been marked broken (#491).

So instead, make a single CurrentThreadExecutor for the loop, and reuse it on the same stack as many times as necessary, thereby allowing any pending job to run no matter where in the stack we currently are.

It is still possible for a job higher in the stack to deadlock waiting for a job that’s currently executing lower in the stack, but in that case, there’s truly no solution as far as I can see (the user needs to migrate more of their sync code to async, I guess).

@carltongibson
Copy link
Member

carltongibson commented Feb 5, 2025

Hi @andersk — thanks for this. The other maybe related issue I had on my radar was #475.

It's been on my list to go back to pre-3.8.0 and refollow the logic behind each change, but haven't got to any of that 🤹. If we can pin down what's actually happening here, and resolve the issues it would be amazing 🤩

#367 is on the list too. (That was the first one I think.)

@andersk
Copy link
Contributor Author

andersk commented Feb 5, 2025

There’s no question those are related, in that #475 was supposedly fixed by #478 which caused #491 (as verified by running the test case in #491 before and after #478), and #367 caused #492 (as verified similarly).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants