Skip to content

Conversation

@ritazh
Copy link
Member

@ritazh ritazh commented Jul 21, 2025

Fixes: #97

  • Add support for the DRAAdminAccess feature
  • Add resource.k8s.io/v1beta2=true to kind config
  • Update cdi handler to inject host hardware information

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 21, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ritazh
Once this PR has been reviewed and has the lgtm label, please assign klueska for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from elezar and klueska July 21, 2025 20:20
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 21, 2025
@ritazh ritazh force-pushed the DRAAdminAccess-example branch from f6cbf90 to b399cba Compare July 21, 2025 21:28
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation Jul 24, 2025
info := &HostHardwareInfo{}

// Get CPU information
if cpuInfo, err := readFile("/proc/cpuinfo"); err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all the rest of the data coming out of the driver is entirely OS- and hardware-agnostic AFAIK, could we follow that here? Even if admin access is implemented here as simply as adding only DRA_ADMIN_ACCESS=true, I think that's enough to demonstrate the end-to-end flow. And it doesn't assume any details about the underlying OS or hardware, which I think is an important part of making the example driver usable in as many places as possible.

e.g. @lauralorenz just brought up the question of whether the example driver works on Windows. Without this change the answer is "maybe?" but reading /proc would make that a more conclusive "no." I know the driver wouldn't fail and we'd only get the "unavailable" value in that case, but I'd rather avoid any evidence of assumptions like that if possible.

func (d *driver) PrepareResourceClaims(ctx context.Context, claims []*resourceapi.ResourceClaim) (map[types.UID]kubeletplugin.PrepareResult, error) {
klog.Infof("PrepareResourceClaims is called: number of claims: %d", len(claims))
for i, claim := range claims {
klog.Infof("Claim %d: UID=%s, Namespace=%s, Name=%s", i, claim.UID, claim.Namespace, claim.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add this log to prepareResourceClaim so we don't need this extra loop?

checkpoint := newCheckpoint()
if err := s.checkpointManager.GetCheckpoint(DriverPluginCheckpointFile, checkpoint); err != nil {
return nil, fmt.Errorf("unable to sync from checkpoint: %v", err)
// Create a new checkpoint if the old one is corrupted or in a different format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary here because AdminAccess is now part of the checkpoint?

return nil, fmt.Errorf("unable to sync from checkpoint: %v", err)
// Create a new checkpoint if the old one is corrupted or in a different format
checkpoint = newCheckpoint()
if err := s.checkpointManager.CreateCheckpoint(DriverPluginCheckpointFile, checkpoint); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @bg-chun this might conflict with #90.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we integrate this with the e2e job?
https://github.com/kubernetes-sigs/dra-example-driver/blob/b399cbaa49b9d52d8266cf604e5f3f873bdd9fdd/.github/workflows/e2e.yaml

That will help make sure once we update for 1.34 that everything here is working properly.

metadata:
name: gpu-test7
labels:
resource.k8s.io/admin-access: "true"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you suggest that new users set both of these labels in anticipation of upgrading to 1.34? If so, I think we should do that here too.

echo ""
echo "=== Container will sleep for 5 minutes to allow inspection ==="
trap 'exit 0' TERM
sleep 300 & wait
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason these only sleep for 5 minutes? The other examples sleep for 9999 seconds which I think I slightly prefer so I don't feel like I have to race against the clock and look at this right away.

@nojnhuh nojnhuh mentioned this pull request Aug 4, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 28, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for DRAAdminAccess

3 participants