
@dharmjit

This PR adds a mechanism for passing device attributes so that local test setups can emulate real devices.

  • Add --device-attributes CLI flag and DEVICE_ATTRIBUTES env var
  • Implement automatic type detection for int, bool, version, string values
  • Add Helm template support for device attributes configuration

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 29, 2025
@k8s-ci-robot
Contributor

Welcome @dharmjit!

It looks like this is your first PR to kubernetes-sigs/dra-example-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/dra-example-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dharmjit
Once this PR has been reviewed and has the lgtm label, please assign elezar for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 29, 2025
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation Sep 30, 2025
},
&cli.StringFlag{
Name: "device-attributes",
Usage: "Additional device attributes to be added to resource slices in key=value format, separated by commas. Example: productName=NVIDIA GeForce RTX 5090,architecture=Blackwell",
Contributor


Is there a way we could encode the type here too, to eliminate the need to guess? Maybe something like key=type=value? Or encode these as JSON or some other format that already distinguishes types? Or take inspiration from Helm's --set and --set-string? With the latter two options, telling a string apart from a version might still be tricky unless we make the user specify the type explicitly, as in the first option.

Something like that would also allow setting true and false as strings instead of having them always interpreted as booleans, and likewise allow strings that look like numbers or versions.
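To picture the key=type=value idea, here is a hypothetical parser for a single entry. It is only a sketch: `parseTypedAttribute` and the `attribute` struct are illustrative stand-ins, not the driver's code or the real `resourceapi.DeviceAttribute` type, and version validation is deliberately left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// attribute is a hypothetical stand-in for resourceapi.DeviceAttribute:
// exactly one field is set, mirroring the API's one-of shape.
type attribute struct {
	IntValue     *int64
	BoolValue    *bool
	StringValue  *string
	VersionValue *string
}

// parseTypedAttribute parses one "key=type=value" entry. Because the type is
// explicit, "flag=string=true" stays a string instead of becoming a bool.
// strings.Cut splits on the first '=' only, so the value may itself contain '='.
func parseTypedAttribute(entry string) (string, attribute, error) {
	key, rest, ok := strings.Cut(entry, "=")
	if !ok || key == "" {
		return "", attribute{}, fmt.Errorf("entry %q: expected key=type=value", entry)
	}
	typ, value, ok := strings.Cut(rest, "=")
	if !ok {
		return "", attribute{}, fmt.Errorf("entry %q: missing type", entry)
	}
	var attr attribute
	switch typ {
	case "int":
		n, err := strconv.ParseInt(value, 10, 64)
		if err != nil {
			return "", attribute{}, fmt.Errorf("entry %q: %w", entry, err)
		}
		attr.IntValue = &n
	case "bool":
		b, err := strconv.ParseBool(value)
		if err != nil {
			return "", attribute{}, fmt.Errorf("entry %q: %w", entry, err)
		}
		attr.BoolValue = &b
	case "string":
		attr.StringValue = &value
	case "version":
		// Semver validation omitted in this sketch.
		attr.VersionValue = &value
	default:
		return "", attribute{}, fmt.Errorf("entry %q: unknown type %q", entry, typ)
	}
	return key, attr, nil
}
```

A side benefit of an explicit type keyword: a new upstream type (for example lists) could later get its own keyword without changing how existing entries are interpreted.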

Contributor


@dharmjit I'm still curious whether addressing this issue is worthwhile. I wouldn't block the PR on this, but I think we should at least note the caveats for anyone else who might be looking to implement this in their own drivers.

One other potential issue I could see is that if new types are implemented upstream (like lists), then implementing those here will be challenging without affecting existing users.


kubeletPlugin:
numDevices: 8
deviceAttributes: ""
Contributor


Since this isn't obviously tied to the command line flag only from looking at this file, could we include the format and one example here? That might save others from digging around too much to see how to use this.

README.md (outdated)
name: gpu-0
```
TBD
Contributor


Maybe @klueska can clarify the original intent, but I envisioned this "Anatomy of a DRA resource driver" section to describe how drivers work in general, not the specific settings of this particular driver that might not apply to at least most other drivers.

If I'm interpreting that right, could we make ### Configuration an H2 sibling right above "Anatomy of a DRA resource driver"? Or maybe we can take it out entirely if we can port all of this content elsewhere like I suggested might be possible in my comment toward the top of this section.

Author


Sure @nojnhuh, we can add configuration docs in a follow-up PR if required. I will remove these changes from this PR.

README.md (outdated)

### Configuration

The DRA example driver supports several configuration options that can be set via command-line flags, environment variables, or Helm values:
Contributor


IMO this kind of documentation is best kept as close to where these are defined as possible. For command line flags, I'd prefer good descriptions/examples that show up in the generated --help (like you've already added in main.go) and for Helm values I'd like to see comments in values.yaml.

Those places are sufficient for basic usage docs like this and are easier to maintain as we add more config options. They are also where I tend to look first for usage docs (--help and helm show values).

Author


Makes sense, I will remove these changes.

// - bool: boolean values (e.g., "enabled=true", "disabled=false")
// - version: semantic version values (e.g., "driver_version=1.2.3")
// - string: any other value (e.g., "productName=NVIDIA GeForce RTX 5090", "architecture=Blackwell")
func parseDeviceAttributes(deviceAttributes string) (map[resourceapi.QualifiedName]resourceapi.DeviceAttribute, error) {
Contributor


This function seems like a good candidate for some unit tests.

Author


I've added unit tests for this function.
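A minimal sketch of what a table-driven test for the type detection could look like, assuming the precedence listed in the PR description (int, then bool, then version, then string). `detectType`, `detectCases`, and `failures` are hypothetical names, only the literals true/false are treated as booleans, and the version check accepts only plain MAJOR.MINOR.PATCH for brevity.

```go
package main

import (
	"regexp"
	"strconv"
)

// coreVersion matches MAJOR.MINOR.PATCH with no leading zeros. Pre-release
// and build suffixes (e.g. 1.0.0-beta) are not handled in this sketch.
var coreVersion = regexp.MustCompile(`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$`)

// detectType returns the attribute type inferred for value, checking int,
// then bool, then version, and falling back to string.
func detectType(value string) string {
	if _, err := strconv.ParseInt(value, 10, 64); err == nil {
		return "int"
	}
	// Only the literals true/false count as booleans here.
	if value == "true" || value == "false" {
		return "bool"
	}
	if coreVersion.MatchString(value) {
		return "version"
	}
	return "string"
}

// detectCases is the kind of table a Test function could range over.
var detectCases = []struct {
	name, value, want string
}{
	{"integer", "16", "int"},
	{"boolean", "true", "bool"},
	{"three-part version", "1.2.3", "version"},
	{"four parts is not semver", "1.2.3.4", "string"},
	{"two parts is not semver", "1.1", "string"},
	{"free text", "NVIDIA GeForce RTX 5090", "string"},
}

// failures runs every case and describes each mismatch. In a _test.go file
// this loop would sit inside func TestDetectType(t *testing.T), calling
// t.Run(tc.name, ...) and t.Errorf instead of collecting strings.
func failures() []string {
	var out []string
	for _, tc := range detectCases {
		if got := detectType(tc.value); got != tc.want {
			out = append(out, tc.name+": got "+got+", want "+tc.want)
		}
	}
	return out
}
```

Note that under any inference scheme like this one, the string `true` can never be stored as a string attribute, which is the caveat raised earlier in the thread.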

return attributes, nil
}

pairs := strings.Split(deviceAttributes, ",")
Contributor


Is it possible to specify a custom string attribute that contains a comma?

Author


CLI: with a string slice flag, each --device-attributes value is treated as a complete key=value pair and is no longer split on commas, so values containing commas can be passed as below:

--device-attributes 'productName=NVIDIA GeForce RTX 5090' --device-attributes 'notes=foo,bar'

Env vars: DEVICE_ATTRIBUTES is parsed as a comma-separated string (e.g., k=v,k=v), so commas inside values would be misinterpreted as separators.

So if someone needs commas, they shouldn't use the env var. Use repeated CLI flags instead, or configure the deployment to pass args rather than env vars.
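To make the env var limitation concrete, here is a hypothetical sketch of splitting a comma-separated DEVICE_ATTRIBUTES value. The function name and error handling are assumptions for illustration, not the driver's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitEnvAttributes is a hypothetical sketch of parsing a comma-separated
// DEVICE_ATTRIBUTES value: every comma is a separator, so there is no way
// to keep a comma inside a value.
func splitEnvAttributes(env string) (map[string]string, error) {
	attrs := map[string]string{}
	for _, entry := range strings.Split(env, ",") {
		key, value, ok := strings.Cut(entry, "=")
		if !ok {
			return nil, fmt.Errorf("entry %q is not in key=value form", entry)
		}
		attrs[key] = value
	}
	return attrs, nil
}
```

With `notes=foo,bar` the trailing `bar` has no `=` and is rejected. A value like `notes=foo,x=bar` is worse: it parses without error but silently yields `notes=foo` plus an extra `x` attribute.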

// Detect value type and create appropriate DeviceAttribute
attr, err := createDeviceAttribute(value)
if err != nil {
return nil, fmt.Errorf("invalid value for attribute %s: %v", key, err)
Contributor


Suggested change
return nil, fmt.Errorf("invalid value for attribute %s: %v", key, err)
return nil, fmt.Errorf("invalid value for attribute %s: %w", key, err)
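The difference matters to callers: with `%w` the underlying parse error stays inspectable via `errors.Is`/`errors.As`, while `%v` flattens it into plain text with the same message. A small standalone sketch, using `strconv.Atoi` as a stand-in for the attribute parser (`wrapBoth` and `isSyntaxError` are hypothetical helpers):

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// wrapBoth parses value and, on failure, returns the error both wrapped
// with %w and formatted with %v so the two can be compared.
func wrapBoth(key, value string) (wrapped, flattened error) {
	_, err := strconv.Atoi(value)
	if err == nil {
		return nil, nil
	}
	wrapped = fmt.Errorf("invalid value for attribute %s: %w", key, err)
	flattened = fmt.Errorf("invalid value for attribute %s: %v", key, err)
	return wrapped, flattened
}

// isSyntaxError reports whether err, or anything it wraps, is strconv.ErrSyntax.
func isSyntaxError(err error) bool {
	return errors.Is(err, strconv.ErrSyntax)
}
```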

Destination: &flags.numDevices,
EnvVars: []string{"NUM_DEVICES"},
},
&cli.StringFlag{
Contributor


Would making this a StringSliceFlag let us offload splitting a set of multiple attributes to the arg parser so we don't need a hand-rolled CSV-like thing?

Comment on lines 176 to 181
// Check if first three parts are numeric
for i := 0; i < 3; i++ {
if _, err := strconv.Atoi(parts[i]); err != nil {
return false
}
}
Contributor


At least according to the regex from https://semver.org, a value like 1.2.3.4 is not valid but would be interpreted as a version here.

And does this handle values like 1.0.0-beta where the -beta shouldn't be expected to necessarily be a number?

Overall would it be easier to use a library like github.com/Masterminds/semver/v3 to see if a string is a valid semver?
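For comparison, the regular expression suggested on semver.org can be used with only the standard library, since it is RE2-compatible. This is a sketch rather than what the PR adopted (the follow-up commit switched to Masterminds/semver), and `isSemver` is a hypothetical helper.

```go
package main

import "regexp"

// semverPattern is the SemVer 2.0.0 regular expression from semver.org:
// core MAJOR.MINOR.PATCH, optional -pre.release, optional +build.metadata.
var semverPattern = regexp.MustCompile(`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)` +
	`(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?` +
	`(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$`)

// isSemver reports whether s is a valid semantic version string.
func isSemver(s string) bool {
	return semverPattern.MatchString(s)
}
```

This rejects `1.2.3.4` and `01.2.3` and accepts `1.0.0-beta`. If the library route is taken instead, note that Masterminds' `semver.NewVersion` coerces semver-ish input such as `1.2` into a full version, whereas `semver.StrictNewVersion` rejects it, so the strict variant is the closer match to this regex.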

Author


Makes sense to use a library. I will add this in a subsequent commit.

- Switch --device-attributes to StringSliceFlag
- Validate versions with Masterminds/semver
- Wrap errors with %w in parsing paths
- Add table-driven tests for parseDeviceAttributes and semver
- Trim README; point to --help and values.yaml
- Add comments/examples to values.yaml
@dharmjit
Author

Thanks @nojnhuh for taking a look at the PR, I have resolved your review comments in the subsequent commit. Thanks!

Comment on lines +49 to +54
# Additional device attributes to be added to resource slices.
# When setting via env var (DEVICE_ATTRIBUTES), provide a comma-separated list of key=value entries:
# DEVICE_ATTRIBUTES: "productName=NVIDIA GeForce RTX 5090,architecture=Blackwell"
# Values containing commas are not supported via env var. Prefer repeated CLI flags, e.g.:
# --device-attributes 'productName=NVIDIA GeForce RTX 5090' \
# --device-attributes architecture=Blackwell
Contributor


Let's avoid mentioning exactly how this value is used within the chart since that's an implementation detail. The important details are:

  1. What the value does (it sets attributes on the devices)
  2. How it's formatted (comma-separated key=value pairs)

Leaving out the other details will make it easier to tweak this in the chart if we find a need to do that.

EnvVars: []string{"NUM_DEVICES"},
},
&cli.StringSliceFlag{
Name: "device-attributes",
Contributor


If each instance of the flag defines one attribute, could we rename this flag?

Suggested change
Name: "device-attributes",
Name: "device-attribute",

Comment on lines +380 to +382
For usage and configuration options, prefer:
- CLI help: run `./dra-example-kubeletplugin --help` for flags and examples
- Helm values: consult `deployments/helm/dra-example-driver/values.yaml` for configurable settings and inline docs
Contributor


Let's leave this section unchanged for now. There's a lot more we would want to put here and "TBD" is a good enough reminder of that.

"k8s.io/apimachinery/pkg/api/resource"
"k8s.io/utils/ptr"

semver "github.com/Masterminds/semver/v3"
Contributor


I see now that the exact library used for CEL expressions in Kubernetes is https://pkg.go.dev/github.com/blang/semver/v4. I don't imagine we'd see any practical differences with this one, but it'd be nice not to have to worry about that.

Comment on lines +63 to +65
for key, value := range additionalAttributes {
attributes[key] = value
}
Contributor


My editor offers this simplification:

Suggested change
for key, value := range additionalAttributes {
attributes[key] = value
}
maps.Copy(attributes, additionalAttributes)
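`maps.Copy` (standard library, Go 1.21+) has the same semantics as the loop: every key from the source is written into the destination, overwriting existing entries. A small sketch with plain string maps standing in for `map[resourceapi.QualifiedName]resourceapi.DeviceAttribute`; `mergeAttributes` is a hypothetical name.

```go
package main

import "maps"

// mergeAttributes overlays extra onto base in place; on a key collision the
// value from extra wins, exactly as the original range loop behaves.
func mergeAttributes(base, extra map[string]string) map[string]string {
	maps.Copy(base, extra)
	return base
}
```

Like the loop, `maps.Copy` panics if the destination is nil and the source is non-empty, so the destination map still needs to be initialized first.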


# Values containing commas are not supported via env var. Prefer repeated CLI flags, e.g.:
# --device-attributes 'productName=NVIDIA GeForce RTX 5090' \
# --device-attributes architecture=Blackwell
deviceAttributes: ""
Contributor


One issue I ran into trying this out is that I'm not sure it's possible to set multiple attributes with Helm's --set flag.

e.g. with --set kubeletPlugin.deviceAttributes="jon=was,here=9,jonagain=1.1,jon3=1.1.1,JON=true,jOn=false":

% helm get values -n dra-example-driver dra-example-driver
USER-SUPPLIED VALUES:
JON: "true"
here: "9"
jOn: "false"
jon3: 1.1.1
jonagain: "1.1"
kubeletPlugin:
  deviceAttributes: jon=was

Maybe we can make the Helm value an array and then build up the comma-separated string in the template where we define DEVICE_ATTRIBUTES? That would enable the following (ugly, but serviceable):

  --set "kubeletPlugin.deviceAttributes[0]=jon=was" \
  --set "kubeletPlugin.deviceAttributes[1]=here=9" \
  --set "kubeletPlugin.deviceAttributes[2]=jonagain=1.1" \
  --set "kubeletPlugin.deviceAttributes[3]=jon3=1.1.1" \
  --set "kubeletPlugin.deviceAttributes[4]=JON=true" \
  --set "kubeletPlugin.deviceAttributes[5]=jOn=false"

This template snippet seems to work:

        # Additional device attributes to be added to resource slices.
        - name: DEVICE_ATTRIBUTES
          value: {{ join "," .Values.kubeletPlugin.deviceAttributes | quote }}


3 participants