[KEP-5710]: Workload-aware preemption KEP #5711
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wojtek-t
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from 672aa68 to ce04eca
Force-pushed from ce04eca to 0ff3958
1. Identify the list of potential victims:
   - all running workloads with (preemption) priority lower than the new workload W
   - all individual pods (not being part of workloads) with priority lower than the new workload W
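For readers skimming the thread, a minimal sketch of that victim-identification step might look like the following (Go, with hypothetical types and field names that are not part of the KEP API):

```go
package preemption

// Hypothetical shapes used only to illustrate the quoted
// victim-identification step; none of these are KEP API types.
type Workload struct {
	Name     string
	Priority int
}

type Pod struct {
	Name     string
	Priority int
	// OwnerWorkload is empty for individual pods that are not
	// part of any workload.
	OwnerWorkload string
}

// potentialVictims returns the running workloads and the individual
// (workload-less) pods whose priority is lower than that of the new
// workload w.
func potentialVictims(w Workload, running []Workload, pods []Pod) ([]Workload, []Pod) {
	var victimWorkloads []Workload
	for _, r := range running {
		if r.Priority < w.Priority {
			victimWorkloads = append(victimWorkloads, r)
		}
	}
	var victimPods []Pod
	for _, p := range pods {
		if p.OwnerWorkload == "" && p.Priority < w.Priority {
			victimPods = append(victimPods, p)
		}
	}
	return victimWorkloads, victimPods
}
```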
Having two independent priorities for a workload (one for scheduling and one for preemption), or a single preemption priority that can be dynamically updated, can potentially lead to a preemption cycle.
Let's assume that an existing workload A with high scheduling priority and low preemption priority is running in a cluster.
Now let's assume that we want to schedule a workload B which has medium scheduling priority and medium preemption priority.
Workload B will preempt workload A and start running, because its scheduling priority is higher than the preemption priority of workload A.
However, when workload A restarts and is rescheduled, it will preempt workload B and start running, because its scheduling priority is higher than the preemption priority of workload B.
The same issue can happen if we have only one priority but that priority is reduced while the workload is running. After preemption, when the workload reappears with its original, higher priority, it can preempt the workload that preempted it.
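To make the cycle concrete, here is a minimal sketch of the decision rule this comment assumes (Go, with hypothetical SchedulingPriority/PreemptionPriority fields that are not defined in the KEP):

```go
package main

import "fmt"

// Hypothetical workload shape, only for illustrating the cycle;
// the field names are not taken from the KEP API.
type Workload struct {
	Name               string
	SchedulingPriority int
	PreemptionPriority int
}

// canPreempt models the rule assumed above: a pending workload may
// preempt a running one if its scheduling priority is higher than
// the victim's preemption priority.
func canPreempt(pending, running Workload) bool {
	return pending.SchedulingPriority > running.PreemptionPriority
}

func main() {
	a := Workload{Name: "A", SchedulingPriority: 100, PreemptionPriority: 10}
	b := Workload{Name: "B", SchedulingPriority: 50, PreemptionPriority: 50}

	// B arrives while A is running: B preempts A.
	fmt.Println("B preempts A:", canPreempt(b, a)) // true (50 > 10)

	// A restarts while B is running: A preempts B, closing the cycle.
	fmt.Println("A preempts B:", canPreempt(a, b)) // true (100 > 50)
}
```

With these values the two workloads keep displacing each other indefinitely.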
One potential solution / mitigation to the described problem could be requiring that preemption priority >= scheduling priority. This way, after restarting, the preempted workload would not be able to preempt the preemptor workload.
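Reusing the hypothetical Workload type from the sketch above, enforcing that invariant at admission time could look roughly like this (illustrative only, not the KEP API):

```go
// validatePriorities enforces the proposed invariant
// preemptionPriority >= schedulingPriority, so that a restarted
// victim can never out-preempt the workload that displaced it.
func validatePriorities(w Workload) error {
	if w.PreemptionPriority < w.SchedulingPriority {
		return fmt.Errorf("workload %s: preemption priority (%d) must be >= scheduling priority (%d)",
			w.Name, w.PreemptionPriority, w.SchedulingPriority)
	}
	return nil
}
```

With the invariant in place, workload A from the example above could not exist as described: a scheduling priority of 100 would force its preemption priority to be at least 100, so B (scheduling priority 50) could never have preempted it in the first place.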
Thanks for pointing that out!
Yeah - "preemption priority >= scheduling priority" is definitely desired. I don't think we have any use cases that would benefit from the reverse.
That said, I need to think a bit more about whether that is enough. I think it prevents the cycles if we assume static priorities, but it can still potentially trigger cycles if the priorities change. OTOH, if the priorities are changing, this is probably desired.
Let me think about it a bit more and I will update the KEP to reflect these thoughts later this week.
/assign
One-line PR description: First draft of Workload-aware preemption KEP
Issue link: Workload-aware preemption #5710