-
Notifications
You must be signed in to change notification settings - Fork 417
Add CRI plugin config from source containerd config to drop-in file #1311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CRI plugin config from source containerd config to drop-in file #1311
Conversation
53fd2be
to
af457bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the nvidia-ctk-installer to include existing runtime configurations from the source containerd config when creating drop-in files. Instead of starting with an empty config, the drop-in file now preserves all runtime configurations already present in the source's 'plugins."io.containerd.grpc.v1.cri".containerd.runtimes' section.
- Modified the destination config to start with existing runtime configurations from source
- Added a new function
getBaseDropInConfigTree
to extract and copy runtime configurations - Updated all test expectations to include existing runtime configurations in drop-in files
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
File | Description |
---|---|
pkg/config/engine/containerd/containerd.go | Implements the core logic to copy existing runtime configs to drop-in files |
pkg/config/engine/containerd/config_test.go | Updates test expectations to include existing runtime configurations |
cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go | Updates integration test expectations to verify runtime config preservation |
// The destinationConfig is a minimal config with the same options | ||
// as the source config. The starting content of the destinationConfig | ||
// is the entire plugins."io.containerd.grpc.v1.cri".containerd.runtimes | ||
// section of the source config. This is needed due to how containerd | ||
// merges configuration from multiple files. | ||
// Reference: https://github.com/containerd/containerd/issues/5837 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that newer containerd versions (v2.1) do seem to do what we want, did we want to make this optional somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a local change that I can create a PR from as a follow-up. Let's not overcomplicate this now though.
af457bc
to
2c06db4
Compare
2c06db4
to
e1c213e
Compare
Pull Request Test Coverage Report for Build 17971376989Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.
Initial testing of these changes looks good. Using the GPU Operator (built from NVIDIA/gpu-operator#1710) I have verified several of the tests captured in https://github.com/NVIDIA/nvidia-container-toolkit/blob/b3ce3a4f3855764208b4b14004620d27a947110a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go. |
if cfg.UseDropInConfig() { | ||
return cfg.CleanupDropInConfig() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elezar note, I remove the nvidia runtimes first before a call to CleanupDropInConfig()
since the ConfigWithDropIn
implementation of RemoveRuntime
removes runtimes from both the top-level config and the drop-in file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. The other option would be to remove the runtimes in the top-level config when ADDING them to the drop in, but that may be more complex than it needs to be.
e1c213e
to
dafe333
Compare
} | ||
|
||
func (c *Config) UseDropInConfig() bool { | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why is this always false
? for containerd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is false for the Config
implementation, and true for the ConfigWithDropIn
implemenation.
Thinking about it now, this method is not very useful. I believe we can just call CleanupDropInConfig
directly and this can be a no-op for all cases except for ConfigWithDropIn
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think the check was only needed if we called this BEFORE we removed the runtimes. Since we're doing that first now, we can just call it unconditionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and yes. I'm sorry, it's late).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left this as-is for now. Feel free to clean this up further as you see fit.
I spent some time trying to remove the UseDropInConfig
method entirely (and unconditionally calling CleanupDropInConfig
), but this did not end up being as straightforward as I would have liked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could something like #1311 (comment) work? If I recall correctly this is what you proposed initially. This is definitely simpler than trying to "clear" the config after the fact.
} | ||
|
||
func (c *ConfigWithDropIn) CleanupDropInConfig() { | ||
c.Interface = &Config{Tree: toml.NewEmpty()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this not just be:
c.Interface = &Config{Tree: toml.NewEmpty()} | |
c.Tree = toml.NewEmpty() |
What about the configOptions
which are used to update the config later? (I understand that we only ever call this AFTER we're done updating the config, but this may not always be the case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't able to do this without some refactoring. Currently, an engine.Interface
is a member of ConfigWithDropIn
and not a Config
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I think I'm missing some important details here after a very long day. I will have a look again after some rest.
pkg/config/engine/crio/crio.go
Outdated
} | ||
|
||
func (c *Config) CleanupDropInConfig() { | ||
c.Tree = toml.NewEmpty() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this works for crio
then i think we should be able to use it for containerd
too.
pkg/config/engine/api.go
Outdated
EnableCDI() | ||
GetRuntimeConfig(string) (RuntimeConfig, error) | ||
RemoveRuntime(string) error | ||
UseDropInConfig() bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a note to self. We should probably clean up this interface / compose the other two interfaces that we have introduced.
dafe333
to
6e7220f
Compare
if o.DropInConfig == "" { | ||
return nil | ||
} | ||
// When a drop-in config is used, we remove the drop-in file explicitly. | ||
// This is require for cases where we may have to include other contents | ||
// in the drop-in file and as such it may not be empty when we flush it. | ||
err = os.Remove(o.DropInConfig) | ||
if err != nil && !errors.Is(err, os.ErrNotExist) { | ||
return fmt.Errorf("failed to remove drop-in config file: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cdesiniotis given the discussions around the additional functions to the interface, I thought it would actually be cleaner to explicitly remove the file here instead of a more complex implementation at a lower level.
I'm going to push out an rc including these changes be we can always revisit if there's something that I missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the file is definitely cleaner and simplifies implementation. I am onboard as long as we are sure the conditional immediately preceding this is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For use cases where we are modifying the config directly -- or drop ins are not supported -- a user SHOULD be setting DropInConfig
to ""
explicitly. This would be the case for docker
, or for v1 containerd configs.
The check for DropInConfig == ""
also mirrors what we do when deciding whether to flush to the drop-in config or the "regular" config.
Are the situations that you're concerned about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see that this is aligned with the code that flushes the config, so I believe this addresses my question.
For use cases where we are modifying the config directly -- or drop ins are not supported -- a user SHOULD be setting DropInConfig to "" explicitly. This would be the case for docker, or for v1 containerd configs.
Ah, I wasn't aware that the intention was for users to set --drop-in-config=""
explicitly when using v1 containerd configs... As long as we cover this in our documentation, this should be fine. Most should be using v2 or v3 configs at this point anyways.
6e7220f
to
887b7f6
Compare
This change updates the drop-in file contents that the nvidia-ctk-installer creates for containerd. In addition to adding nvidia runtimes, the nvidia-ctk-installer also includes all existing configuration present in the source containerd configuration for the CRI plugin. That is, all configuration already present in the 'plugins."io.containerd.grpc.v1.cri"' section will be added to our drop-in file. This is needed due to how containerd merges configuration from multiple files. See containerd/containerd#5837 Signed-off-by: Christopher Desiniotis <[email protected]>
887b7f6
to
598c632
Compare
if o.DropInConfig == "" { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question -- if o.DropInConfig != ""
is it always the case that we are using a drop-in file? Even if this option is specified, my understanding is that a drop-in file would not be used for 1) v1 containerd configs, or 2) cri-o when the hook
config mode is configured.
This change updates the drop-in file contents that the nvidia-ctk-installer creates for containerd. In addition to adding nvidia runtimes, the nvidia-ctk-installer also includes all existing configuration present in the source containerd configuration for the CRI plugin. That is, all configuration already present in the 'plugins."io.containerd.grpc.v1.cri"' section will be added to our drop-in file.
This is needed due to how containerd merges configuration from multiple files. See containerd/containerd#5837