-
Notifications
You must be signed in to change notification settings - Fork 75
Add mechanism to pass device attributes #125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add mechanism to pass device attributes #125
Conversation
- Add --device-attributes CLI flag and DEVICE_ATTRIBUTES env var - Implement automatic type detection for int, bool, version, string values - Add Helm template support for device attributes configuration
|
Welcome @dharmjit! |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dharmjit The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| }, | ||
| &cli.StringFlag{ | ||
| Name: "device-attributes", | ||
| Usage: "Additional device attributes to be added to resource slices in key=value format, separated by commas. Example: productName=NVIDIA GeForce RTX 5090,architecture=Blackwell", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we could encode the type in here too to eliminate the need to guess? Maybe like key=type=value? Or encoding these as JSON or some other format that already knows how to distinguish types? Or maybe take inspiration from Helm's --set and --set-string? Telling the difference between a string and a version with the latter two options might still be tricky though unless we make the user specify that explicitly like in the first option.
Something like that would also allow setting true and false as strings instead of always being interpreted as booleans. And likewise allowing strings that look like numbers or versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dharmjit I'm still curious if addressing this issue is worthwhile? I wouldn't block the PR on this, but I think we should at least note the caveats for anyone else who might be looking to implement this in their own drivers.
One other potential issue I could see is that if new types are implemented upstream (like lists), then implementing those here will be challenging without affecting existing users.
|
|
||
| kubeletPlugin: | ||
| numDevices: 8 | ||
| deviceAttributes: "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this isn't obviously tied to the command line flag only from looking at this file, could we include the format and one example here? That might save others from digging around too much to see how to use this.
README.md
Outdated
| name: gpu-0 | ||
| ``` | ||
| TBD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @klueska can clarify the original intent, but I envisioned this "Anatomy of a DRA resource driver" section to describe how drivers work in general, not the specific settings of this particular driver that might not apply to at least most other drivers.
If I'm interpreting that right, could we make ### Configuration an H2 sibling right above "Anatomy of a DRA resource driver"? Or maybe we can take it out entirely if we can port all of this content elsewhere like I suggested might be possible in my comment toward the top of this section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure @nojnhuh, we can add configuration docs in a followup PR if required. I will remove these changes from this PR.
README.md
Outdated
|
|
||
| ### Configuration | ||
|
|
||
| The DRA example driver supports several configuration options that can be set via command-line flags, environment variables, or Helm values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this kind of documentation is best kept as close to where these are defined as possible. For command line flags, I'd prefer good descriptions/examples that show up in the generated --help (like you've already added in main.go) and for Helm values I'd like to see comments in values.yaml.
Those places are sufficient for basic usage docs like this, easier to maintain as we continue to add more config options. And those places drive where I tend to first look for usage docs (--help and helm show values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense, I will remove these changes.
| // - bool: boolean values (e.g., "enabled=true", "disabled=false") | ||
| // - version: semantic version values (e.g., "driver_version=1.2.3") | ||
| // - string: any other value (e.g., "productName=NVIDIA GeForce RTX 5090", "architecture=Blackwell") | ||
| func parseDeviceAttributes(deviceAttributes string) (map[resourceapi.QualifiedName]resourceapi.DeviceAttribute, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function seems like a good candidate for some unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added units tests for this function.
| return attributes, nil | ||
| } | ||
|
|
||
| pairs := strings.Split(deviceAttributes, ",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to specify a custom string attribute that contains a comma?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CLI: with string slice flag, each --device-attributes is treated as complete key=value pair and we do not split by commas now, so its okay to pass values with comma as below
--device-attributes 'productName=NVIDIA GeForce RTX 5090' --device-attributes 'notes=foo,bar'
Env Vars: is parsed as a comma-separated string (e.g., k=v,k=v), so commas inside values would be misinterpreted as separators
So If someone need commas, don’t use the env var. Use repeated CLI flags instead, or configure your deployment to pass args rather than env vars.
| // Detect value type and create appropriate DeviceAttribute | ||
| attr, err := createDeviceAttribute(value) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("invalid value for attribute %s: %v", key, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return nil, fmt.Errorf("invalid value for attribute %s: %v", key, err) | |
| return nil, fmt.Errorf("invalid value for attribute %s: %w", key, err) |
| Destination: &flags.numDevices, | ||
| EnvVars: []string{"NUM_DEVICES"}, | ||
| }, | ||
| &cli.StringFlag{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would making this a StringSliceFlag let us offload splitting a set of multiple attributes to the arg parser so we don't need a hand-rolled CSV-like thing?
| // Check if first three parts are numeric | ||
| for i := 0; i < 3; i++ { | ||
| if _, err := strconv.Atoi(parts[i]); err != nil { | ||
| return false | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least according to the regex from https://semver.org, a value like 1.2.3.4 is not valid but would be interpreted as a version here.
And does this handle values like 1.0.0-beta where the -beta shouldn't be expected to necessarily be a number?
Overall would it be easier to use a library like github.com/Masterminds/semver/v3 to see if a string is a valid semver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to use a library. I will add this in the subsequent commit.
- Switch --device-attributes to StringSliceFlag - Validate versions with Masterminds/semver - Wrap errors with %w in parsing paths - Add table-driven tests for parseDeviceAttributes and semver - Trim README; point to --help and values.yaml - Add comments/examples to values.yaml
|
Thanks @nojnhuh for taking a look at the PR, I have resolved your review comments in the subsequent commit. Thanks! |
| # Additional device attributes to be added to resource slices. | ||
| # When setting via env var (DEVICE_ATTRIBUTES), provide a comma-separated list of key=value entries: | ||
| # DEVICE_ATTRIBUTES: "productName=NVIDIA GeForce RTX 5090,architecture=Blackwell" | ||
| # Values containing commas are not supported via env var. Prefer repeated CLI flags, e.g.: | ||
| # --device-attributes productName=NVIDIA GeForce RTX 5090 \ | ||
| # --device-attributes architecture=Blackwell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid mentioning exactly how this value is used within the chart since that's an implementation detail. The important details are:
- What the value does (it sets attributes on the devices)
- How it's formatted (comma-separated key=value pairs)
Leaving out the other details will make it easier to tweak this in the chart if we find a need to do that.
| EnvVars: []string{"NUM_DEVICES"}, | ||
| }, | ||
| &cli.StringSliceFlag{ | ||
| Name: "device-attributes", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If each instance of the flag defines one attribute, could we rename this flag?
| Name: "device-attributes", | |
| Name: "device-attribute", |
| For usage and configuration options, prefer: | ||
| - CLI help: run `./dra-example-kubeletplugin --help` for flags and examples | ||
| - Helm values: consult `deployments/helm/dra-example-driver/values.yaml` for configurable settings and inline docs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave this section unchanged for now. There's a lot more we would want to put here and "TBD" is a good enough reminder of that.
| "k8s.io/apimachinery/pkg/api/resource" | ||
| "k8s.io/utils/ptr" | ||
|
|
||
| semver "github.com/Masterminds/semver/v3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now that the exact library used for CEL expressions in Kubernetes is https://pkg.go.dev/github.com/blang/semver/[email protected]. I don't imagine we'd see any practical differences with this one, but it'd be nice not to have to worry about that.
| for key, value := range additionalAttributes { | ||
| attributes[key] = value | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My editor offers this simplification:
| for key, value := range additionalAttributes { | |
| attributes[key] = value | |
| } | |
| maps.Copy(attributes, additionalAttributes) |
| }, | ||
| &cli.StringFlag{ | ||
| Name: "device-attributes", | ||
| Usage: "Additional device attributes to be added to resource slices in key=value format, separated by commas. Example: productName=NVIDIA GeForce RTX 5090,architecture=Blackwell", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dharmjit I'm still curious if addressing this issue is worthwhile? I wouldn't block the PR on this, but I think we should at least note the caveats for anyone else who might be looking to implement this in their own drivers.
One other potential issue I could see is that if new types are implemented upstream (like lists), then implementing those here will be challenging without affecting existing users.
| # Values containing commas are not supported via env var. Prefer repeated CLI flags, e.g.: | ||
| # --device-attributes productName=NVIDIA GeForce RTX 5090 \ | ||
| # --device-attributes architecture=Blackwell | ||
| deviceAttributes: "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue I ran into trying this out is that I'm not sure it's possible to set multiple attributes with Helm's --set flag.
e.g. with --set kubeletPlugin.deviceAttributes="jon=was,here=9,jonagain=1.1,jon3=1.1.1,JON=true,jOn=false":
% helm get values -n dra-example-driver dra-example-driver
USER-SUPPLIED VALUES:
JON: "true"
here: "9"
jOn: "false"
jon3: 1.1.1
jonagain: "1.1"
kubeletPlugin:
deviceAttributes: jon=was
Maybe we can make the Helm value an array and then build up the comma-separated string in the template where we define DEVICE_ATTRIBUTES? That would enable the following (ugly, but serviceable):
--set "kubeletPlugin.deviceAttributes[0]=jon=was" \
--set "kubeletPlugin.deviceAttributes[1]=here=9" \
--set "kubeletPlugin.deviceAttributes[2]=jonagain=1.1" \
--set "kubeletPlugin.deviceAttributes[3]=jon3=1.1.1" \
--set "kubeletPlugin.deviceAttributes[4]=JON=true" \
--set "kubeletPlugin.deviceAttributes[5]=jOn=false"
This template snippet seems to work:
# Additional device attributes to be added to resource slices.
- name: DEVICE_ATTRIBUTES
value: {{ join "," .Values.kubeletPlugin.deviceAttributes | quote }}
This PR adds a mechanism to pass device attributes to emulate the real devices in local test setups.