Multiple semaphores render scheduler unresponsive and crash entire cluster #8999
Labels
bug
Something is broken
good expert issue
Clearly described but requires someone extremely familiar with the project to implement successfully
help wanted
Describe the issue:
When multiple semaphores are passed as argument to the tasks, the CPU utilization of scheduler process may increase without limit due to unknown "leakage" even if none of the semaphores are used inside the tasks. The scheduler becomes unresponsive eventually and the entire cluster failed as a consequence.
Minimal Complete Verifiable Example:
Anything else we need to know?:
In my real-world example, I'm destined to hit the following error:
or
And, py-spy flamegraph may show that it takes longer and longer for the scheduler process to run
create_task
andadd_future
:Found some possibly related issues or discussions around but I'm not sure if they use semaphore / lock.
Environment:
The text was updated successfully, but these errors were encountered: