Some images require the building and installation of kernel drivers using the Dynamic Kernel Module Support (DKMS) tool. This allows kernel modules to be built for the specific kernel used in the image. The DKMS tool requires access to the running kernel that is not usually allowed by the Image Management Service (IMS). In order to safely allow the expanded access, the IMS configuration must be modified to enable the feature.
Many DKMS build and install scripts require access to the system `/proc`, `/dev`, and `/sys` directories, which expose running processes and system services. IMS jobs run as an administrator user because preparing images requires root access to work properly. Giving root that level of access to the running system would introduce an unacceptable security vulnerability on the Kubernetes worker node the job is running on.
To address the security concerns, but also allow the DKMS tool to install kernel modules in image customization, a Kata Virtual Machine (VM) is used. When DKMS is enabled in IMS, the jobs are modified to run inside a Kata VM. The DKMS tool then has enhanced access to the running Kata VM kernel, but is unable to interact directly with the Kubernetes worker the job is running on.
It is required that Kubernetes be configured with Kata. That should be part of the standard NCN worker configuration, so documentation on how to do that is outside the scope of the IMS documentation.
**NOTE:** Because the IMS job runs inside a VM, the job will take longer to run, but this is required to provide a secure environment.
The following steps enable DKMS operation for all IMS jobs, including those controlled by the Configuration Management Service (CFS). IMS remains in this configuration until DKMS operation is manually disabled again.
- (`ncn-mw#`) Check which Kata runtime class is installed.

      kubectl get runtimeclass

  Expected output is something like:

      NAME        HANDLER     AGE
      kata-qemu   kata-qemu   64d

  Make note of the Kata runtime class to use for the IMS jobs.

  **NOTE:** If no Kata runtime classes are returned by the above step, then Kata must be configured on the system. Instructions for that are beyond the scope of the IMS documentation.
- (`ncn-mw#`) Edit the `ims-config` Kubernetes configuration map to enable DKMS.

      kubectl -n services edit cm ims-config

  Look for the lines:

      JOB_ENABLE_DKMS: "False"
      JOB_KATA_RUNTIME: kata-qemu

  Change the value for `JOB_ENABLE_DKMS` to `True`. If the Kata runtime class on the system is not `kata-qemu`, then change `JOB_KATA_RUNTIME` to the desired runtime class:

      JOB_ENABLE_DKMS: "True"
      JOB_KATA_RUNTIME: kata-qemu

  Exit editing the configmap, saving the new values.
- (`ncn-mw#`) Restart the IMS pod to pick up the new ConfigMap values.

  Find the current `cray-ims` pod:

      kubectl -n services get pods | grep ims

  Expected output will look something like:

      cray-ims-bc875d949-fffk6   2/2   Running     0   4h29m
      ims-post-upgrade-gkf4t     0/2   Completed   0   2d3h

  Delete the running pod:

      kubectl -n services delete pod cray-ims-bc875d949-fffk6

  Then wait until the new pod is in the `2/2 Running` status. New IMS jobs will be created in Kata VMs with enhanced kernel access.
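The interactive edit above can also be expressed as a non-interactive patch. The following is a minimal sketch, not a supported procedure: the function names `ims_dkms_patch` and `enable_ims_dkms` are hypothetical, and it assumes that the two keys sit directly under `data` in the `ims-config` ConfigMap and that the `cray-ims` pod is managed by a Deployment named `cray-ims`. Verify both assumptions on the system before using it.

```sh
# Hypothetical helper: build a JSON merge patch for the ims-config ConfigMap.
# Assumes JOB_ENABLE_DKMS and JOB_KATA_RUNTIME are keys directly under .data.
# Usage: ims_dkms_patch ENABLE [RUNTIME]   (ENABLE is True or False)
ims_dkms_patch() (
    enable=$1
    runtime=${2:-}
    case "$enable" in
        True|False) ;;
        *) echo "ENABLE must be True or False" >&2; exit 1 ;;
    esac
    if [ -n "$runtime" ]; then
        printf '{"data":{"JOB_ENABLE_DKMS":"%s","JOB_KATA_RUNTIME":"%s"}}\n' "$enable" "$runtime"
    else
        printf '{"data":{"JOB_ENABLE_DKMS":"%s"}}\n' "$enable"
    fi
)

# Hypothetical wrapper: apply the patch, then restart IMS and wait for it.
# Nothing runs until this function is called on a node with kubectl access.
# Assumes a Deployment named cray-ims in the services namespace.
enable_ims_dkms() {
    kubectl -n services patch cm ims-config --type merge \
        -p "$(ims_dkms_patch True "${1:-kata-qemu}")" &&
    kubectl -n services rollout restart deployment cray-ims &&
    kubectl -n services rollout status deployment cray-ims
}
```

Calling `enable_ims_dkms kata-qemu` would apply the change; calling `ims_dkms_patch True kata-qemu` on its own only prints the patch so it can be reviewed first.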
To revert the settings so that IMS jobs no longer run inside a Kata VM with the enhanced kernel access, change the `JOB_ENABLE_DKMS` setting in `ims-config` back to `False` and restart the `cray-ims` pod again.
- (`ncn-mw#`) Edit the `ims-config` Kubernetes configuration map to disable DKMS.

      kubectl -n services edit cm ims-config

  Look for the lines:

      JOB_ENABLE_DKMS: "True"
      JOB_KATA_RUNTIME: kata-qemu

  Change the value for `JOB_ENABLE_DKMS` to `False`. The variable `JOB_KATA_RUNTIME` is not used when DKMS is disabled, so its value does not matter.

      JOB_ENABLE_DKMS: "False"
      JOB_KATA_RUNTIME: kata-qemu

  Exit editing the configmap, saving the new values.
- (`ncn-mw#`) Restart the IMS pod to pick up the new ConfigMap values.

  Find the current `cray-ims` pod:

      kubectl -n services get pods | grep ims

  Expected output will look something like:

      cray-ims-bc875d949-64fc1   2/2   Running     0   4h29m
      ims-post-upgrade-gkf4t     0/2   Completed   0   2d3h

  Delete the running pod:

      kubectl -n services delete pod cray-ims-bc875d949-64fc1

  Then wait until the new pod is in the `2/2 Running` status. New IMS jobs will now run directly on the Kubernetes node, without the enhanced kernel access.
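Because `grep ims` also matches completed pods such as `ims-post-upgrade-*`, it can help to filter the pod listing more narrowly when restarting IMS. The sketch below is illustrative only: `running_ims_pods` is a hypothetical helper, and it assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, ...) shown in the expected output above.

```sh
# Hypothetical helper: read `kubectl -n services get pods` output on stdin and
# print the names of pods whose name starts with "cray-ims-" and whose STATUS
# column (third field) is Running.
running_ims_pods() {
    awk '$1 ~ /^cray-ims-/ && $3 == "Running" { print $1 }'
}
```

For example, `kubectl -n services get pods | running_ims_pods` prints only the running `cray-ims` pod name, which can then be passed to `kubectl -n services delete pod`.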
Each recipe stored in IMS has a data field that specifies whether that particular recipe requires DKMS to be enabled in order to build successfully. If this field is set to `true`, it overrides the global DKMS setting described above.
To set the `require_dkms` field for a particular recipe:
- (`ncn-mw#`) Set a variable with the IMS Recipe ID in the environment:

      IMS_RECIPE_ID=2233c82a-5081-4f67-bec4-4b59a60017a6
- (`ncn-mw#`) Look at the current recipe record:

      cray ims recipes describe $IMS_RECIPE_ID

  Expected output:

      {
        "arch": "x86_64",
        "created": "2023-06-20T08:01:22.819146+00:00",
        "id": "c66f130c-c7c6-46b4-bb58-3fc17f08929f",
        "link": {
          "etag": "",
          "path": "s3://ims/recipes/c66f130c-c7c6-46b4-bb58-3fc17f08929f/myrecipe20June2023.tgz",
          "type": "s3"
        },
        "linux_distribution": "sles15",
        "name": "myrecipe20June2023",
        "recipe_type": "kiwi-ng",
        "require_dkms": false,
        "template_dictionary": []
      }
- (`ncn-mw#`) Change the value of `require_dkms` for the recipe:

      cray ims recipes update --require-dkms true $IMS_RECIPE_ID

  Expected output:

      {
        "arch": "x86_64",
        "created": "2023-06-20T08:01:22.819146+00:00",
        "id": "c66f130c-c7c6-46b4-bb58-3fc17f08929f",
        "link": {
          "etag": "",
          "path": "s3://ims/recipes/c66f130c-c7c6-46b4-bb58-3fc17f08929f/myrecipe20June2023.tgz",
          "type": "s3"
        },
        "linux_distribution": "sles15",
        "name": "myrecipe20June2023",
        "recipe_type": "kiwi-ng",
        "require_dkms": true,
        "template_dictionary": []
      }
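To confirm the change without reading the whole record, the `require_dkms` value can be pulled out of the recipe output. This is a rough sketch that assumes the CLI prints JSON as in the expected output above; `recipe_require_dkms` is a hypothetical helper that uses plain text matching, so a real JSON parser would be more robust if one is available on the node.

```sh
# Hypothetical helper: read a recipe record (JSON text) on stdin and print the
# value of its require_dkms field ("true" or "false").
# Uses simple pattern matching rather than a JSON parser.
recipe_require_dkms() {
    sed -n 's/.*"require_dkms"[[:space:]]*:[[:space:]]*\([a-z]*\).*/\1/p'
}
```

For example, `cray ims recipes describe $IMS_RECIPE_ID | recipe_require_dkms` should print `true` after the update, provided the output format matches the one shown.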
The call to create a new job in IMS has a `require-dkms` field that overrides both the global and the recipe settings. If a value is passed in directly, it always takes precedence when the job is created.
- (`ncn-mw#`) Use the `require-dkms` option when creating a recipe build job:

      cray ims jobs create \
          --job-type create \
          --image-root-archive-name cray-sles15-barebones \
          --artifact-id $IMS_RECIPE_ID \
          --public-key-id $IMS_PUBLIC_KEY_ID \
          --enable-debug False \
          --require-dkms True

  Example output:

      status = "creating"
      enable_debug = false
      kernel_file_name = "vmlinuz"
      artifact_id = "2233c82a-5081-4f67-bec4-4b59a60017a6"
      build_env_size = 10
      job_type = "create"
      kubernetes_service = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-service"
      kubernetes_job = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-create"
      id = "ad5163d2-398d-4e93-94f0-2f439f114fe7"
      image_root_archive_name = "cray-sles15-barebones"
      initrd_file_name = "initrd"
      arch = "x86_64"
      require_dkms = true
      created = "2018-11-21T18:22:53.409405+00:00"
      public_key_id = "a252ff6f-c087-4093-a305-122b41824a3e"
      kubernetes_configmap = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-configmap"
- (`ncn-mw#`) Use the `require-dkms` option when creating an image customization job:

      cray ims jobs create \
          --job-type customize \
          --image-root-archive-name cray-sles15-barebones \
          --artifact-id $IMS_IMAGE_ID \
          --public-key-id $IMS_PUBLIC_KEY_ID \
          --enable-debug False \
          --require-dkms True

  Example output:

      status = "creating"
      enable_debug = false
      kernel_file_name = "vmlinuz"
      artifact_id = "2233c82a-5081-4f67-bec4-4b59a60017a6"
      build_env_size = 10
      job_type = "customize"
      kubernetes_service = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-service"
      kubernetes_job = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-create"
      id = "ad5163d2-398d-4e93-94f0-2f439f114fe7"
      image_root_archive_name = "cray-sles15-barebones"
      initrd_file_name = "initrd"
      arch = "x86_64"
      require_dkms = true
      created = "2018-11-21T18:22:53.409405+00:00"
      public_key_id = "a252ff6f-c087-4093-a305-122b41824a3e"
      kubernetes_configmap = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-configmap"
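The three DKMS settings (global, per recipe, per job) interact as described in this document: a job value always wins, and otherwise a recipe set to true overrides the global setting. The sketch below only illustrates that reading of the precedence; `resolve_dkms` is a hypothetical function, not part of IMS, and the actual service logic may differ in details.

```sh
# Hypothetical illustration of the DKMS precedence described in this document.
# Usage: resolve_dkms GLOBAL RECIPE [JOB]
#   GLOBAL: JOB_ENABLE_DKMS from ims-config (True/False, any case)
#   RECIPE: the recipe's require_dkms value (true/false)
#   JOB:    value passed with --require-dkms, or omitted if not given
resolve_dkms() (
    global=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    recipe=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    job=$(printf '%s' "${3:-}" | tr '[:upper:]' '[:lower:]')
    if [ -n "$job" ]; then
        # A value passed at job creation always takes precedence.
        printf '%s\n' "$job"
    elif [ "$recipe" = "true" ]; then
        # A recipe that requires DKMS overrides the global setting.
        printf 'true\n'
    else
        # Otherwise the global ims-config setting applies.
        printf '%s\n' "$global"
    fi
)
```

Under this reading, `resolve_dkms False true` yields `true` (the recipe overrides the global setting), while `resolve_dkms True true False` yields `false` (the job value overrides both).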