
Conversation

djoshy
Contributor

@djoshy djoshy commented Oct 3, 2025

- What I did
This PR adds support for boot image updates to ControlPlaneMachineSets (CPMS) for the AWS, Azure, and GCP platforms. A couple of key points to know about CPMS:

  • They are singletons in the Machine API namespace, typically named cluster. The boot images are stored under spec, in a layout similar to MachineSets. For example, on AWS (abbreviated to only the important fields):
spec:
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: ci-op-l4pngh10-79b69-zrm8p
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        providerSpec:
          value:
            ami:
              id: ami-09d23adad19cdb25c
  • They have a rollout strategy defined in spec.strategy.type, which can be set to RollingUpdate, Recreate or OnDelete. In RollingUpdate mode, any deviation between the CPMS spec and the control plane nodes causes a complete control plane replacement, which is undesirable when the only deviation is the boot image: the nodes pivot to the latest RHCOS image described by the OCP release image anyway, so the replacement would effectively be a no-op that only adds to upgrade time. To avoid this, the CPMS operator was updated to ignore boot image fields during control plane machine reconciliation.
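The comparison change in the CPMS operator can be pictured with a small sketch. This is illustrative Go, not the actual operator code; ProviderSpec, stripBootImage, and needsReplacement are hypothetical names, but the idea matches the description above: zero out boot image fields on both sides before diffing, so boot image drift alone never triggers a RollingUpdate replacement.

```go
package main

import (
	"fmt"
	"reflect"
)

// ProviderSpec is a simplified stand-in for the real MAPI provider spec;
// only the fields relevant to this sketch are modeled.
type ProviderSpec struct {
	AMI          string
	InstanceType string
}

// stripBootImage returns a copy of the spec with boot image fields zeroed,
// so comparisons ignore boot image drift.
func stripBootImage(s ProviderSpec) ProviderSpec {
	s.AMI = ""
	return s
}

// needsReplacement reports whether the desired spec differs from the live
// spec once boot image fields are ignored.
func needsReplacement(desired, live ProviderSpec) bool {
	return !reflect.DeepEqual(stripBootImage(desired), stripBootImage(live))
}

func main() {
	live := ProviderSpec{AMI: "ami-09d23adad19cdb25c", InstanceType: "m6i.xlarge"}
	desired := ProviderSpec{AMI: "ami-00abe7f9c6bd85a77", InstanceType: "m6i.xlarge"}
	// Only the AMI differs, so no control plane replacement is needed.
	fmt.Println(needsReplacement(desired, live)) // false
}
```

Any real implementation also has to handle per-platform boot image fields (AMI on AWS, image on GCP, the image block on Azure), but the diff-after-stripping shape is the same.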

- How to verify it

  1. Create an AWS/GCP/Azure cluster in the TechPreview featureset.
  2. Take a backup of the current CPMS object named cluster for comparison purposes.
  3. Opt in to CPMS boot image updates using the MachineConfiguration object:
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
spec:
  logLevel: Normal
  operatorLogLevel: Normal
  managedBootImages:
    machineManagers:
      - resource: controlplanemachinesets
        apiGroup: machine.openshift.io
        selection:
          mode: All
  4. Modify the boot image field to an older value. This varies per platform:
  • For AWS, use an older known AMI like ami-00abe7f9c6bd85a77.
  • For GCP, modify the image field to any value that starts with projects/rhcos-cloud/global/images/, for example projects/rhcos-cloud/global/images/test.
  • For Azure, the existing boot image will be updated automatically, without any manual manipulation. This is because Azure clusters currently install with gallery images, which will be updated to the latest marketplace images. Once Azure clusters install with marketplace images by default, the tester will need to modify the image field manually to exercise this path.
  5. Examine the MachineConfiguration object's status to check that the CPMS was reconciled successfully. The CPMS boot image fields should reflect the values you initially saw post-install, i.e. the values described in the coreos-bootimages configmap. The machine-config-controller logs should also mention that a boot image update took place.
  6. You can now attempt to resize the control plane by deleting one of the control plane machines. The CPMS operator should scale up a new machine to satisfy its spec.replicas value, and it should be able to do so successfully. This process might take a while (about 10-15 minutes on GCP in my testing) because the CPMS controller first scales up the replacement and only then drains and deletes the older control plane machine; I think this is to maintain etcd quorum at all points of the process.
  7. Now, opt the cluster out of CPMS boot image updates:
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
spec:
  logLevel: Normal
  operatorLogLevel: Normal
  managedBootImages:
    machineManagers:
      - resource: controlplanemachinesets
        apiGroup: machine.openshift.io
        selection:
          mode: None
  8. Modify the boot image to an older value (see step 4). For Azure, you could modify the version field to an older value.
  9. Examine the MachineConfiguration object's status to check that the CPMS object was reconciled successfully. The CPMS boot image fields should now reflect the values you set, not the values described in the coreos-bootimages configmap. The machine-config-controller logs should also mention that a boot image update did not take place.
  10. All done! You have now successfully tested CPMS boot image updates!

Note: Since these are singleton objects, the Partial selection mode is not permitted when specifying boot image configuration, so that mode does not need to be tested. The API server will reject any attempt to set Partial for CPMS objects, so I suppose verifying that rejection is something to test as well! 😄
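The validation rule in the note can be sketched as follows. This is illustrative only; in the real cluster the rule is enforced server-side by the API server, and the MachineManager struct, field names, and validate function here are hypothetical stand-ins for the operator.openshift.io/v1 types:

```go
package main

import (
	"errors"
	"fmt"
)

// MachineManager loosely mirrors a managedBootImages.machineManagers entry
// in the MachineConfiguration spec; field names are illustrative.
type MachineManager struct {
	Resource string // e.g. "machinesets" or "controlplanemachinesets"
	Mode     string // "All", "None", or "Partial"
}

// validate sketches the rule described above: Partial selection makes no
// sense for a singleton resource, so it is rejected for
// controlplanemachinesets while remaining valid for machinesets.
func validate(m MachineManager) error {
	if m.Resource == "controlplanemachinesets" && m.Mode == "Partial" {
		return errors.New("Partial selection is not permitted for controlplanemachinesets")
	}
	return nil
}

func main() {
	fmt.Println(validate(MachineManager{Resource: "controlplanemachinesets", Mode: "All"}))     // <nil>
	fmt.Println(validate(MachineManager{Resource: "controlplanemachinesets", Mode: "Partial"})) // rejected
}
```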

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 3, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 3, 2025
Contributor

openshift-ci bot commented Oct 3, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 3, 2025

@djoshy: This pull request references MCO-1807 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

[DNM, testing]

Opened for initial testing. Currently, the controller looks at the standard MAPI MachineSet boot image opinion; when openshift/api#2396 lands, this PR can be updated to actually check for the CPMS type. It also does not look for the CPMS feature gate, for the same reason.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Oct 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2025
@djoshy
Contributor Author

djoshy commented Oct 3, 2025

/test all

@djoshy
Contributor Author

djoshy commented Oct 6, 2025

/test verify

This captures updates for the ManagedBootImages API
@djoshy djoshy marked this pull request as ready for review October 9, 2025 13:48
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 9, 2025
@djoshy
Contributor Author

djoshy commented Oct 9, 2025

Opening this up for initial review; I've integrated the API from openshift/api#2396.

go func() { ctrl.syncMAPIMachineSets("MAPIMachinesetDeleted") }()
}

func (ctrl *Controller) addControlPlaneMachineSet(obj interface{}) {
Member

@isabella-janssen isabella-janssen commented Oct 9, 2025

non-blocking nit: comments overviewing the functions throughout this file might be useful (though the function names are pretty self-explanatory)

Contributor Author

thanks, will update on my next pass 😄

// Update/Check all ControlPlaneMachineSets instead of just this one. This prevents needing to maintain a local
// store of machineset conditions. As this is using a lister, it is relatively inexpensive to do
// this.
go func() { ctrl.syncControlPlaneMachineSets("ControlPlaneMachineSetAdded") }()
Contributor

I'm surprised by how the handling of events has been done, not in a bad way, but I do think it's unconventional and it may have some caveats. Do we care about event arrival order? If so (and I think we do), the ideal approach would be the usual pattern of a channel consumed by a single goroutine that pulls events as soon as they arrive. That would preserve the order.

Contributor Author

@djoshy djoshy commented Oct 9, 2025

Yeah - I went a bit unconventional here because I wanted all the resource reconciliations to happen in a single thread, one resource after the other; mainly to preserve the condition updates since each update writes to the status of the MachineConfiguration resource. And any event results in the same action i.e. loop through all the machine resources. I added mutexes in the actual sync functions to help with the ordering, so follow-up syncs won't step on each other.

Even so, the ordering itself isn't all that important, since the controller is listening on all the machine resources for any deviations. We do this because we want to alert the admin if we're hot looping when there is another actor on the boot image field. This does cause a quirk though: right after the MCO does perform an update, it immediately does a no-op loop through the resources.

I'm happy to rework this into a channel based system as a follow-up - this was a bit of an experiment I did at the time because I was curious about using mutexes instead of the channels 😅
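The mutex-serialized pattern described in this exchange can be sketched minimally. This is illustrative, not the controller's actual code, and the syncer type is hypothetical: each event handler fires a goroutine, and a shared mutex ensures the sync bodies run one at a time, so status writes to the MachineConfiguration resource never interleave (though, as noted above, arrival order is not guaranteed the way a single consumer goroutine on a channel would guarantee it).

```go
package main

import (
	"fmt"
	"sync"
)

// syncer serializes reconciliations with a mutex: concurrent event
// handlers may call syncAll at any time, but the sync bodies execute
// one after another.
type syncer struct {
	mu     sync.Mutex
	synced []string
}

func (s *syncer) syncAll(reason string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// A real controller would loop over all machine resources here;
	// this sketch just records that a serialized sync ran.
	s.synced = append(s.synced, reason)
}

func main() {
	s := &syncer{}
	var wg sync.WaitGroup
	// Two event handlers firing concurrently, as the informer callbacks do.
	for _, reason := range []string{"ControlPlaneMachineSetAdded", "MAPIMachinesetDeleted"} {
		wg.Add(1)
		go func(r string) {
			defer wg.Done()
			s.syncAll(r)
		}(reason)
	}
	wg.Wait()
	fmt.Println(len(s.synced)) // 2
}
```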

Contributor

openshift-ci bot commented Oct 10, 2025

@djoshy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-op-ocl 502f475 link false /test e2e-gcp-op-ocl
ci/prow/e2e-azure-ovn-upgrade-out-of-change 502f475 link false /test e2e-azure-ovn-upgrade-out-of-change
ci/prow/e2e-aws-mco-disruptive 502f475 link false /test e2e-aws-mco-disruptive
ci/prow/e2e-gcp-op-2of2 0caa774 link true /test e2e-gcp-op-2of2
ci/prow/bootstrap-unit 0caa774 link false /test bootstrap-unit


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
