-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDI support #4056
base: master
Are you sure you want to change the base?
CDI support #4056
Conversation
4a22da3
to
a5c3952
Compare
I was also looking at moby/moby#45134 to register CDI device drivers. I think we need to add a new attribute in our TOML config to set the CDI specification directories otherwise CDI request will fail. |
@crazy-max I have not yet had a look at these changes in detail. From my understanding Note that from the NVIDIA side, this may have unintended consequences of producing container images that require a specific driver version to work correctly. The intent of the CDI device specifications that are used when running a container is to ensure that drivers that match the kernel mode driver on the host are injected dynamically, allowing applications that are built against the driver API to function as expected. |
You are correct that BuildKit is used to build containers, and with these changes, the goal is to make devices available at build time. This would allow users to mount specific devices when building container images. Regarding the unintended consequences of producing container images that require a specific driver version, it sound expected and we might put in place some kind of capabilities. Intention is to provide more flexibility and convenience to users during the container building process. However, I understand the importance of keeping portability across environments. If you have any further suggestions or concerns, please don't hesitate to let us know. We aim to create a seamless experience for users while ensuring compatibility and reliability across different setups. |
any update on this? @crazy-max @elezar so far any place I see I need to disable buildkit to make use of building NVIDIA related components. I would rather allow a very limited experience with tons of caveats than removing buildkit to build. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left some comments.
It might be good to call out in the PR description how this option is used and protected by a feature flag / capabilities. Documentation changes for the option that call out the caveats would also be appreciated.
executor/oci/spec_unix.go
Outdated
registry := cdi.GetRegistry(cdi.WithAutoRefresh(false)) | ||
if err := registry.Refresh(); err != nil { | ||
// We don't consider registry refresh failure a fatal error. | ||
// For instance, a dynamically generated invalid CDI Spec file for | ||
// any particular vendor shouldn't prevent injection of devices of | ||
// different vendors. CDI itself knows better, and it will fail the | ||
// injection if necessary. | ||
bklog.G(ctx).Warnf("CDI registry refresh failed: %v", err) | ||
} | ||
return nil | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the v0.7.1 release you could use:
registry := cdi.GetRegistry(cdi.WithAutoRefresh(false)) | |
if err := registry.Refresh(); err != nil { | |
// We don't consider registry refresh failure a fatal error. | |
// For instance, a dynamically generated invalid CDI Spec file for | |
// any particular vendor shouldn't prevent injection of devices of | |
// different vendors. CDI itself knows better, and it will fail the | |
// injection if necessary. | |
bklog.G(ctx).Warnf("CDI registry refresh failed: %v", err) | |
} | |
return nil | |
} | |
} | |
_ := cdi.Configure(cdi.WithAutoRefresh(false)) | |
if err := cdi.Refresh(); err != nil { | |
// We don't consider registry refresh failure a fatal error. | |
// For instance, a dynamically generated invalid CDI Spec file for | |
// any particular vendor shouldn't prevent injection of devices of | |
// different vendors. CDI itself knows better, and it will fail the | |
// injection if necessary. | |
bklog.G(ctx).Warnf("CDI registry refresh failed: %v", err) | |
} | |
return nil | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://github.com/containerd/containerd/pull/10041/files
If this is require as external functionality, we could also consider exporting WithStaticCDIRegistry
from the oci
package like we do with WithCDIDevices
.
Hi @georgettica, thanks for your response. I'm a product manager at Docker, and this is something we are actively exploring at the moment, with the aim of getting a timeframe for the roadmap. Do you have details on what would be an acceptable limited experience for you? I would be interested to chat about your use case if you prefer? https://calendly.com/colin-hemmings-dock/buildkit-chat |
Signed-off-by: CrazyMax <[email protected]>
f7f20ed
to
a492e7f
Compare
disabled = true | ||
# List of directories to scan for CDI spec files. For more details about CDI | ||
# specification, please refer to https://github.com/cncf-tags/container-device-interface/blob/main/SPEC.md#cdi-json-specification | ||
specDirs = ["/etc/cdi", "/var/run/cdi", "/etc/buildkit/cdi"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think auto-allow can be in two places. We can have a list here for auto-allowed devices and additionally there can be annotation in the CDI yaml for specific device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there can be annotation in the CDI yaml for specific device.
I guess this would be under devices.annotations
like:
cdiVersion: 0.3.0
kind: nvidia.com/gpu
devices:
- name: gpu0
annotations:
org.mobyproject.buildkit.device.allow: true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/s/allow/autoallow
as all devices are allowed.
client/llb/exec.go
Outdated
@@ -624,6 +640,12 @@ func AddUlimit(name UlimitName, soft int64, hard int64) RunOption { | |||
}) | |||
} | |||
|
|||
func AddCDIDevice(name string) RunOption { | |||
return runOptionFunc(func(ei *ExecInfo) { | |||
ei.State = ei.State.AddCDIDevice(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to take at least the required bool. The string value should be kind | kind=device | classifier
. I'm not sure if it makes sense to break these apart in the LLB level or keep as a single string variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string value should be
kind | kind=device | classifier
. I'm not sure if it makes sense to break these apart in the LLB level or keep as a single string variable.
I think better to keep name as a string as we are already validating it when generating the spec: https://github.com/moby/buildkit/pull/4056/files#diff-478a239cda41c7aff441eef90d0eae4fc531f3fcc30db204e9dfbe64b9d863aaR182-R184
b183106
to
98ae798
Compare
Signed-off-by: CrazyMax <[email protected]>
Signed-off-by: CrazyMax <[email protected]>
Signed-off-by: CrazyMax <[email protected]>
Signed-off-by: CrazyMax <[email protected]>
Signed-off-by: CrazyMax <[email protected]>
Signed-off-by: Tonis Tiigi <[email protected]>
Signed-off-by: Tonis Tiigi <[email protected]>
Access should be managed by entitlements checks Signed-off-by: Tonis Tiigi <[email protected]>
Signed-off-by: CrazyMax <[email protected]>
Signed-off-by: CrazyMax <[email protected]>
Signed-off-by: CrazyMax <[email protected]>
var out []llb.RunOption | ||
devices := instructions.GetDevices(c) | ||
for _, device := range devices { | ||
out = append(out, llb.AddCDIDevice(llb.CDIDeviceName(device), llb.CDIDeviceOptional)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should a CDI device set in a dockerfile always marked as optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I think instead we should have a required
field like we do for secrets and is optional by default:
RUN --device=name=vendor1.com/device=foo,required
Signed-off-by: CrazyMax <[email protected]>
FrontendAttrs: map[string]string{ | ||
"device:vendor1.com/device=foo": "required", | ||
"device:vendor2.com/device=bar": "required=false", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems there is an issue with frontend attributes: #4715 (comment)
sandbox.go:205: panic: failed to parse devices: invalid CDI device name "vendor2.com/device": unqualified device "vendor2.com/device", missing vendor
I changed the device attribute to a map with device:
prefix so we can pass multiple of them but seems the key with device name is already parsed somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the format to use an index instead:
FrontendAttrs: map[string]string{
"device:0": "vendor1.com/device=foo,required",
"device:1": "vendor2.com/device=bar",
},
Don't really like it but not sure what's best 🙈
Signed-off-by: CrazyMax <[email protected]>
related to #1436
Adds initial support for specifying fully-qualified CDI device names.