DRA API: implement ResourceClaim strategy for DRADeviceTaints #132927

Open
wants to merge 1 commit into master

Conversation

pohly
Contributor

@pohly pohly commented Jul 14, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

The ResourceClaim API strategy did not drop the "Tolerations" field when the DRADeviceTaints feature was disabled.

A complete implementation wasn't possible when the Device Taints API was implemented, because it depended on the prioritized list feature being merged first: the drop logic also has to cover the "FirstAvailable" field, which was introduced together with that feature.

That the device taints PR got merged despite this gap was an oversight. The confusing TODO probably didn't help: the entire implementation was missing (or got lost in a bad merge conflict resolution, I'm not sure anymore) and the TODO referenced the wrong feature (partitionable devices don't affect ResourceClaim).

Which issue(s) this PR is related to:

KEP: kubernetes/enhancements#5055

Special notes for your reviewer:

Because of this gap, clients could set the field while the feature was disabled, when it should have been dropped. The field simply had no effect. Clients can keep updating such objects because of the "feature in use" check, as long as they leave the spec unchanged, since it is immutable.
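
To illustrate the "feature in use" check mentioned above, here is a minimal sketch of the drop-disabled-fields pattern. It is not the code from this PR: the types `Toleration`, `ExactRequest`, `SubRequest`, `Request` and `Claim`, the helper names, and the boolean `gateEnabled` parameter are simplified, hypothetical stand-ins for the internal resource API types and the feature gate lookup.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the internal resource API types.
// Only the fields relevant to this discussion are modeled.
type Toleration struct{ Key string }

type ExactRequest struct{ Tolerations []Toleration }

type SubRequest struct {
	Name        string
	Tolerations []Toleration
}

type Request struct {
	Name           string
	Exactly        *ExactRequest
	FirstAvailable []SubRequest
}

type Claim struct{ Requests []Request }

// tolerationsInUse reports whether any exact or sub request sets tolerations.
// A nil claim (the create case) never has the feature in use.
func tolerationsInUse(c *Claim) bool {
	if c == nil {
		return false
	}
	for _, r := range c.Requests {
		if r.Exactly != nil && len(r.Exactly.Tolerations) > 0 {
			return true
		}
		for _, s := range r.FirstAvailable {
			if len(s.Tolerations) > 0 {
				return true
			}
		}
	}
	return false
}

// dropTolerations clears tolerations in the new object unless the gate is
// enabled or the old object already uses the field.
func dropTolerations(gateEnabled bool, newClaim, oldClaim *Claim) {
	if gateEnabled || tolerationsInUse(oldClaim) {
		return
	}
	for i := range newClaim.Requests {
		r := &newClaim.Requests[i]
		if r.Exactly != nil {
			r.Exactly.Tolerations = nil
		}
		for j := range r.FirstAvailable {
			r.FirstAvailable[j].Tolerations = nil
		}
	}
}

// newTolerantClaim builds a claim with tolerations in an exact request and in
// a sub request. The key is a made-up example value.
func newTolerantClaim() *Claim {
	tol := []Toleration{{Key: "example.com/degraded"}}
	return &Claim{Requests: []Request{
		{Name: "gpu", Exactly: &ExactRequest{Tolerations: tol}},
		{Name: "nic", FirstAvailable: []SubRequest{{Name: "fast", Tolerations: tol}}},
	}}
}

func main() {
	// Create with the gate disabled: tolerations get dropped.
	created := newTolerantClaim()
	dropTolerations(false, created, nil)
	fmt.Println("in use after create:", tolerationsInUse(created))

	// Update of an object that already uses the field: tolerations are kept.
	updated, old := newTolerantClaim(), newTolerantClaim()
	dropTolerations(false, updated, old)
	fmt.Println("in use after update:", tolerationsInUse(updated))
}
```

In this sketch the create case loses its tolerations while the update case keeps them, which mirrors why objects written during the gap remain updatable.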

Does this PR introduce a user-facing change?

DRA API: the "tolerations" field in exact and sub requests now gets dropped properly when the DRADeviceTaints API is disabled.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 14, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 14, 2025
@k8s-ci-robot k8s-ci-robot requested review from bart0sh and klueska July 14, 2025 13:12
@k8s-ci-robot k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label Jul 14, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 14, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -212,6 +212,7 @@ func toSelectableFields(claim *resource.ResourceClaim) fields.Set {
// dropDisabledFields removes fields which are covered by a feature gate.
func dropDisabledFields(newClaim, oldClaim *resource.ResourceClaim) {
	dropDisabledDRAPrioritizedListFields(newClaim, oldClaim)
	dropDisabledDRADeviceTaintsFields(newClaim, oldClaim) // Intentionally after dropDisabledDRAPrioritizedListFields to avoid iterating over FirstAvailable slice which needs to be dropped.

Member

Don't we need the same functionality in ResourceClaimTemplate?

Contributor Author

Yes. Will add it there, too. Good catch!

Contributor Author

For some reason, ResourceClaimTemplate update testing was less complete than the update testing of ResourceClaim. Fixed by copying the entire TestStrategyUpdate over and switching it to testing ResourceClaimTemplates.

I kept the existing test, to ensure that I am not removing coverage.

@@ -372,6 +419,26 @@ func TestStrategyCreate(t *testing.T) {
}
},
},
"drop-fields-device-taints": {

Member

Should we have tests here that cover the interaction with the PrioritizedList feature?

Contributor Author

Yes.

@@ -587,6 +656,54 @@ func TestStrategyUpdate(t *testing.T) {
}
},
},
"drop-fields-device-taints-in-prioritized-list": {

Member

Should we have some tests that don't include PrioritizedList?

Contributor Author

Yes. Now I am questioning my own sanity. I could have sworn that I had added them.
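
The coverage discussed in this thread (with and without PrioritizedList, gate on and off) can be sketched as a small table of cases. This is only an illustration under simplifying assumptions: the `request` type, `dropDisabled`, and the two boolean gate parameters are hypothetical, the "feature in use" check is left out (create case only), and a plain loop stands in for the `testing` package and the feature gate test helpers that the real tests would use.

```go
package main

import "fmt"

// Hypothetical, simplified request shape.
type request struct {
	exactTolerations []string   // tolerations of an exact request
	subTolerations   [][]string // tolerations per "FirstAvailable" sub request
}

// dropDisabled mimics the ordering from the diff: the prioritized list fields
// are dropped first, then tolerations in whatever requests remain.
func dropDisabled(r request, prioritizedList, deviceTaints bool) request {
	if !prioritizedList {
		r.subTolerations = nil // sub requests are gone entirely
	}
	if !deviceTaints {
		r.exactTolerations = nil
		if len(r.subTolerations) > 0 {
			// Keep the sub requests, but without their tolerations.
			r.subTolerations = make([][]string, len(r.subTolerations))
		}
	}
	return r
}

// countTolerations sums tolerations across exact and sub requests.
func countTolerations(r request) int {
	n := len(r.exactTolerations)
	for _, s := range r.subTolerations {
		n += len(s)
	}
	return n
}

type testCase struct {
	name                          string
	in                            request
	prioritizedList, deviceTaints bool
	wantSubRequests               int
	wantTolerations               int
}

func cases() []testCase {
	exact := request{exactTolerations: []string{"t"}}
	subs := request{subTolerations: [][]string{{"t"}, {"t"}}}
	return []testCase{
		{"exact request, taints disabled", exact, true, false, 0, 0},
		{"exact request, taints enabled", exact, true, true, 0, 1},
		{"sub requests, taints disabled", subs, true, false, 2, 0},
		{"sub requests, taints enabled", subs, true, true, 2, 2},
		{"sub requests, both disabled", subs, false, false, 0, 0},
	}
}

// failedCases returns the names of all cases whose outcome differs from the
// expectation.
func failedCases() []string {
	var failed []string
	for _, tc := range cases() {
		got := dropDisabled(tc.in, tc.prioritizedList, tc.deviceTaints)
		if len(got.subTolerations) != tc.wantSubRequests || countTolerations(got) != tc.wantTolerations {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	if failed := failedCases(); len(failed) > 0 {
		fmt.Println("failed:", failed)
		return
	}
	fmt.Println("all cases pass")
}
```

Keeping exact-request cases separate from sub-request cases makes it visible when one of the two code paths is missing a drop.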

@pohly pohly force-pushed the dra-api-strategy-todo branch from edeec84 to fd7c1c2 Compare July 15, 2025 09:14
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 15, 2025
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation Jul 15, 2025
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.
Projects
Status: 👀 In review
3 participants