
Conversation

@ngopalak-redhat (Contributor) commented Nov 12, 2025

What I did

This PR enables system-reserved-compressible enforcement by default for all OpenShift 4.21+ clusters, giving system-reserved processes better CPU allocation through cgroup-based enforcement.

Template Changes:

  • Added systemReservedCgroup: /system.slice to default kubelet configuration for all node types (master, worker, arbiter)
  • Added system-reserved-compressible to enforceNodeAllocatable alongside pods in kubelet template files

Performance Profile Compatibility:
The kubelet cannot simultaneously enforce both systemReservedCgroup and --reserved-cpus (used by Performance Profiles in the Node Tuning Operator). To resolve this conflict, I added logic in the Kubelet Config Controller (pkg/controller/kubelet-config/helpers.go) to:

  • Detect when reservedSystemCPUs (--reserved-cpus) is set
  • Automatically clear systemReservedCgroup when reservedSystemCPUs is detected
  • Set enforceNodeAllocatable to ["pods"] only in this scenario
  • Preserve existing Performance Profile behavior without requiring any operator changes

This approach leverages the fact that --reserved-cpus already supersedes system-reserved, making systemReservedCgroup enforcement redundant in PerformanceProfile scenarios.
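
A minimal sketch of this reconciliation, assuming the config is decoded into the upstream kubeletconfigv1beta1 types (the helper name is hypothetical, not necessarily the PR's actual function):

    package kubeletconfig

    import (
        "k8s.io/klog/v2"
        kubeletconfigv1beta1 "k8s.io/kubelet/config/v1beta1"
    )

    // reconcileSystemReservedCgroup clears the cgroup-based settings whenever
    // --reserved-cpus (ReservedSystemCPUs) is in use, because the kubelet
    // cannot enforce both mechanisms at once.
    func reconcileSystemReservedCgroup(kc *kubeletconfigv1beta1.KubeletConfiguration) {
        if kc.ReservedSystemCPUs == "" {
            return // no reserved-cpus: keep systemReservedCgroup enforcement
        }
        klog.Infof("reservedSystemCPUs is set to %s, disabling systemReservedCgroup enforcement", kc.ReservedSystemCPUs)
        kc.SystemReservedCgroup = ""
        kc.EnforceNodeAllocatable = []string{"pods"}
    }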

Validation:

  • Added validation to ensure systemReservedCgroup matches systemCgroups when both are user-specified (a minimal sketch follows)
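
A sketch of that check, again assuming the upstream kubeletconfigv1beta1 types; the function name and error text are illustrative (the actual hunk is discussed in the review thread below):

    package kubeletconfig

    import (
        "fmt"

        kubeletconfigv1beta1 "k8s.io/kubelet/config/v1beta1"
    )

    // validateSystemReservedCgroup rejects configs where enforcement would be
    // applied to one cgroup (systemReservedCgroup) while reservations are
    // accounted against another (systemCgroups).
    func validateSystemReservedCgroup(kc *kubeletconfigv1beta1.KubeletConfiguration) error {
        if kc.SystemReservedCgroup != "" && kc.SystemCgroups != "" &&
            kc.SystemReservedCgroup != kc.SystemCgroups {
            return fmt.Errorf("systemReservedCgroup %q must match systemCgroups %q",
                kc.SystemReservedCgroup, kc.SystemCgroups)
        }
        return nil
    }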

How to verify it

For New OCP 4.21+ Clusters:

  1. Deploy a new OCP 4.21+ cluster
  2. SSH into a node and verify the kubelet configuration (an oc debug alternative follows this list):
    grep -A2 systemReservedCgroup /etc/kubernetes/kubelet.conf
    grep -A3 enforceNodeAllocatable /etc/kubernetes/kubelet.conf
  3. Verify the output shows:
    systemReservedCgroup: /system.slice
    enforceNodeAllocatable:
    - pods
    - system-reserved-compressible
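
The same check can be run without SSH via oc debug (the node name is a placeholder):

    oc debug node/<node-name> -- chroot /host \
      grep -A3 enforceNodeAllocatable /etc/kubernetes/kubelet.conf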

For Clusters with Performance Profiles:

  1. Create a Performance Profile with reservedSystemCPUs set via the Node Tuning Operator (see the example manifest after this list)
  2. Wait for the MachineConfig to be applied and nodes to reboot
  3. SSH into the affected node and check kubelet configuration:
    grep systemReservedCgroup /etc/kubernetes/kubelet.conf
    grep enforceNodeAllocatable /etc/kubernetes/kubelet.conf
  4. Verify that:
    • systemReservedCgroup is NOT present (empty/cleared)
    • enforceNodeAllocatable only contains ["pods"]
    • Kubelet starts successfully without errors
  5. Check kubelet logs to confirm no conflicts:
    journalctl -u kubelet | grep -iE "system-reserved|reserved-cpus"
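
A minimal example manifest for step 1; the CPU ranges and node selector are placeholders for your environment:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: example-profile
    spec:
      cpu:
        reserved: "0-1"   # becomes the kubelet's reservedSystemCPUs (--reserved-cpus)
        isolated: "2-7"   # CPUs left for latency-sensitive workloads
      nodeSelector:
        node-role.kubernetes.io/worker-cnf: ""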

For OCP 4.20 to 4.21 Upgrades:

  1. Verify that the migration MachineConfig from PR #5412 ("WIP: [release-4.20] kubelet-config compressible patch") is present and preserves the old behavior (a pattern-match check follows this list)
  2. Confirm no unexpected node reboots occur during upgrade
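
Because the migration MachineConfig's exact name comes from PR #5412, a pattern match is the safest way to look for it:

    oc get machineconfigs -o name | grep -i kubelet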

Description for the changelog

Enable system-reserved-compressible enforcement by default in OCP 4.21+ clusters. The kubelet now enforces CPU limits on system daemons via systemReservedCgroup (/system.slice), improving CPU allocation for system-reserved processes on nodes with high CPU counts. To prevent conflicts, this enforcement is automatically disabled when a Performance Profile sets reserved-cpus. Existing OCP 4.20 clusters upgrading to 4.21+ will preserve their current behavior via a migration MachineConfig.


Related:

Decision Update
As per the latest discussion, we plan to make this the default in OCP 4.21. Clusters upgraded from 4.20 will also have this enabled.

@ngopalak-redhat changed the title from "Implement system-reserved-compressible" to "WIP: Implement system-reserved-compressible" Nov 12, 2025
@openshift-ci bot added the do-not-merge/work-in-progress label Nov 12, 2025
openshift-ci bot (Contributor) commented Nov 12, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot (Contributor) commented Nov 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ngopalak-redhat
Once this PR has been reviewed and has the lgtm label, please assign yuqi-zhang for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ngopalak-redhat force-pushed the ngopalak/system-reserved-compressible-1 branch from ca28d80 to 00bb8e1 on November 17, 2025 03:53
@ngopalak-redhat changed the title from "WIP: Implement system-reserved-compressible" to "OCPNODE-3201: Default Enablement of system-reserved-compressible in OpenShift 4.21" Nov 19, 2025
openshift-ci-robot (Contributor) commented Nov 19, 2025

@ngopalak-redhat: This pull request references OCPNODE-3201 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

TODO: Before Review

  • Complete upgrade testing

@openshift-ci-robot added the jira/valid-reference label Nov 19, 2025
@ngopalak-redhat marked this pull request as ready for review November 20, 2025 00:48
@openshift-ci bot removed the do-not-merge/work-in-progress label Nov 20, 2025
@ngopalak-redhat (Contributor Author) commented:

cc: @MarSik @ffromani

openshift-ci bot (Contributor) commented Nov 20, 2025

@ngopalak-redhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                       Commit   Required  Rerun command
ci/prow/e2e-gcp-op-single-node  00bb8e1  true      /test e2e-gcp-op-single-node
ci/prow/e2e-hypershift          00bb8e1  true      /test e2e-hypershift
ci/prow/bootstrap-unit          00bb8e1  false     /test bootstrap-unit

    }
    // Validate that systemReservedCgroup matches systemCgroups if both are set
    if kcDecoded.SystemReservedCgroup != "" && kcDecoded.SystemCgroups != "" {
        if kcDecoded.SystemReservedCgroup != kcDecoded.SystemCgroups {
Member commented:

Why should both the values of SystemReservedCgroup and SystemCgroups match?
From the kubelet configuration doc I don't find such a condition.

Contributor Author commented:

As per https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/

It is recommended that the OS system daemons are placed under a top level control group (system.slice on systemd machines for example).

If they are not the same, enforcement would be applied to one cgroup while the reserved values are calculated from SystemCgroups, so the limits would land on the wrong cgroup.

Member commented:

Apologies, I'm still unclear on this.

        klog.Infof("reservedSystemCPUs is set to %s, disabling systemReservedCgroup enforcement", originalKubeConfig.ReservedSystemCPUs)
    }

    if shouldDisableSystemReservedCgroup {
Member commented:

You can use the condition above directly, without introducing a new variable, since it isn't used anywhere else.
We know that the --reserved-cpus flag supersedes the other flags, so why do we need to clear these settings explicitly? Is it because the kubelet otherwise fails to start? If so, could you add a comment explaining that?

Suggested change:
-    if shouldDisableSystemReservedCgroup {
+    if originalKubeConfig.ReservedSystemCPUs != "" {

Contributor Author commented:

Agreed. I had another condition before this one to handle the upgrade path, which is why the variable was added. I'll change it.

@ngopalak-redhat (Contributor Author) commented:

@haircommander Please review

@ngopalak-redhat marked this pull request as draft November 20, 2025 15:11
@openshift-ci bot added the do-not-merge/work-in-progress label Nov 20, 2025