cheesesashimi
Member

@cheesesashimi cheesesashimi commented Sep 26, 2025

- What I did

The first pass of OCL relied on the oc binary that ships in the OS image, so that we would always have the most appropriate version of the binary for a given OpenShift release. Unfortunately, this has the following side effects:

  • Builds take much longer because the base OS image, which is large, must be pulled twice.
  • It causes issues with garbage collection on the node after the build pod has run there.

There are two paths forward to mitigate this:

  1. Install the openshift-clients package into the MCO image.
  2. Extract the oc binary from the base OS image after we've used it.

For 4.19 and 4.20, we can use the first approach. For 4.18, we must use the latter approach. Here's why:

  • SSL verification failures occurred whenever the DNF wrapper script tried to run dnf install openshift-clients. The install does not work correctly in the base image we use in 4.18 (registry.ci.openshift.org/ocp/builder:rhel-9-enterprise-base-multi-openshift-4.18) or 4.19 (registry.ci.openshift.org/ocp/builder:rhel-9-enterprise-base-multi-openshift-4.18).
  • We chose these as the base images for development because they are manifest-listed and have multi-arch support. Since our team has a heterogeneous mix of workstations, this is a requirement for us. Luckily, the newer base images registry.ci.openshift.org/ocp/4.19:base-rhel9 and registry.ci.openshift.org/ocp/4.20:base-rhel9 are also manifest-listed and have multi-arch support. Unfortunately, the 4.18 version of this image is not. That said, I have added some conditional logic to handle this case better.

So what we do now is install the oc binary in the container image where and when possible. We then check whether the binary is present: if it is, we use it as-is; if not, we extract it from the base OS image after the build is complete.
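The check described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: find_or_extract_oc and extract_oc_from_base_image are hypothetical names, and the extraction step is left as a placeholder because the PR description does not show how it is done.

```shell
# Hypothetical placeholder: the real extraction mechanism is not shown in the
# PR description. It is assumed to place an oc binary into the directory "$1".
extract_oc_from_base_image() {
  echo "extraction from the base OS image is not implemented in this sketch" >&2
  return 1
}

# Prints the path of a usable oc binary, preferring one already on PATH.
find_or_extract_oc() {
  local dest_dir="$1"
  local oc_path

  if oc_path="$(command -v oc)"; then
    # oc was installed into the container image, so use it as-is.
    printf '%s\n' "$oc_path"
    return 0
  fi

  # oc is missing from the image, so fall back to extracting it.
  if extract_oc_from_base_image "$dest_dir" && [ -x "$dest_dir/oc" ]; then
    printf '%s\n' "$dest_dir/oc"
    return 0
  fi

  echo "no usable oc binary found" >&2
  return 1
}
```

A caller would run something like oc_bin="$(find_or_extract_oc /tmp/oc-bin)" and then invoke "$oc_bin"; the /tmp/oc-bin destination is likewise an assumption.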

- How to verify it

  1. Opt into OCL and perform a build.
  2. Note which node the build is running on.
  3. Retrieve the base OS image pullspec: $ oc get configmap/machine-config-osimageurl -n openshift-machine-config-operator -o json | jq -r '.data.baseOSContainerImage'
  4. oc debug into the node and run crictl images -o json | grep "<base os image pullspec>". The base OS image should not be found in CRI-O's container image storage.

Note that the image should not be found there because the Buildah pod cannot touch the host's container image storage.
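Steps 3 and 4 above could be scripted along these lines. This is a minimal sketch, assuming oc and jq are available on the workstation and that the node name is passed in by hand; image_absent and verify_base_image_not_stored are hypothetical helper names, not part of this PR.

```shell
# Succeeds (exit 0) when the pullspec does NOT occur in the JSON read from stdin.
image_absent() {
  local pullspec="$1"
  ! grep -qF -- "$pullspec"
}

# Hypothetical wrapper around steps 3 and 4; the node name is passed as "$1".
verify_base_image_not_stored() {
  local node="$1"
  local pullspec

  # Step 3: retrieve the base OS image pullspec.
  pullspec="$(oc get configmap/machine-config-osimageurl \
    -n openshift-machine-config-operator -o json \
    | jq -r '.data.baseOSContainerImage')" || return 2
  [ -n "$pullspec" ] || return 2

  # Step 4: list the images in CRI-O's storage on the node and look for it.
  if oc debug "node/${node}" -- chroot /host crictl images -o json \
      | image_absent "$pullspec"; then
    echo "OK: base OS image not found in CRI-O storage on ${node}"
  else
    echo "FAIL: base OS image is present on ${node}"
    return 1
  fi
}
```

One caveat with this sketch: if oc debug itself fails and prints nothing, the check reports success, so it is worth confirming that the debug session actually returned image data.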

- Description for the changelog
Extract oc binary after OCL build

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 26, 2025
@openshift-ci-robot openshift-ci-robot added the jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. label Sep 26, 2025
Contributor

openshift-ci bot commented Sep 26, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 26, 2025
@openshift-ci-robot
Contributor

@cheesesashimi: This pull request references Jira Issue OCPBUGS-57473, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Sep 26, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheesesashimi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 26, 2025
@cheesesashimi cheesesashimi force-pushed the zzlotnik/use-oc-in-mco-image branch from a62e581 to 8e9db7c September 26, 2025 21:43
@cheesesashimi cheesesashimi changed the title from "OCPBUGS-57473: extract oc binary instead of pulling image" to "OCPBUGS-57473: extract oc binary instead of pulling OS image" Sep 26, 2025
@cheesesashimi cheesesashimi marked this pull request as ready for review September 26, 2025 21:48
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 26, 2025
Contributor

openshift-ci bot commented Sep 27, 2025

@cheesesashimi: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn | 8e9db7c | link | true | /test e2e-aws-ovn |
| ci/prow/okd-scos-e2e-aws-ovn | 8e9db7c | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | 8e9db7c | link | false | /test e2e-azure-ovn-upgrade-out-of-change |
| ci/prow/e2e-gcp-op-ocl | 8e9db7c | link | false | /test e2e-gcp-op-ocl |
| ci/prow/e2e-aws-mco-disruptive | 8e9db7c | link | false | /test e2e-aws-mco-disruptive |
| ci/prow/e2e-gcp-mco-disruptive | 8e9db7c | link | false | /test e2e-gcp-mco-disruptive |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Member

@isabella-janssen isabella-janssen left a comment

This looks good to me, and I see that the updated unit tests are still passing, but I'll leave final tagging to an MCO team engineer with more context.

@cheesesashimi
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 30, 2025
@openshift-ci-robot
Contributor

@cheesesashimi: This pull request references Jira Issue OCPBUGS-57473, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

@openshift-ci openshift-ci bot requested a review from sergiordlr September 30, 2025 17:50

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

3 participants