OTA-1010: extract included manifests with net-new capabilities #1958

Open
wants to merge 3 commits into
base: master

hongkailiu
Copy link
Member

@hongkailiu hongkailiu commented Jan 14, 2025

Before this pull, we hard-coded the net-new capabilities to enable for 4.13 clusters:

// FIXME: eventually pull in GetImplicitlyEnabledCapabilities from https://github.com/openshift/cluster-version-operator/blob/86e24d66119a73f50282b66a8d6f2e3518aa0e15/pkg/payload/payload.go#L237-L240 for cases where a minor update would implicitly enable some additional capabilities. For now, 4.13 to 4.14 will always enable MachineAPI, ImageRegistry, etc..
currentVersion := clusterVersion.Status.Desired.Version
matches := regexp.MustCompile(`^(\d+[.]\d+)[.].*`).FindStringSubmatch(currentVersion)
if len(matches) < 2 {
	return config, fmt.Errorf("failed to parse major.minor version from ClusterVersion status.desired.version %q", currentVersion)
} else if matches[1] == "4.13" {
	build := configv1.ClusterVersionCapability("Build")
	deploymentConfig := configv1.ClusterVersionCapability("DeploymentConfig")
	imageRegistry := configv1.ClusterVersionCapability("ImageRegistry")
	config.Capabilities.EnabledCapabilities = append(config.Capabilities.EnabledCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
	config.Capabilities.KnownCapabilities = append(config.Capabilities.KnownCapabilities, configv1.ClusterVersionCapabilityMachineAPI, build, deploymentConfig, imageRegistry)
}

Now the capabilities for the incoming release are calculated with the function from CVO, based on the manifests from the current release and the ones from the incoming release.

To fit the existing code, which already had a TarEntryCallback, the logic above is implemented via a ManifestReceiver that sits between the upstream TarEntryCallback and the downstream manifestsCallback. Setting needEnabledCapabilities tells the receiver that manifestsCallback must be called with the computed enabled capabilities. The price is that manifestsCallback is then called only after the receiver has collected all the manifests from the upstream.
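
To illustrate the buffering trade-off described above, here is a minimal Go sketch. It is not the code from this pull: the `manifest` type, the `manifestReceiver` fields, and the `receive`/`flush` method names are hypothetical stand-ins, and `computeEnabled` only plays the role of the calculation borrowed from CVO.

```go
package main

import "fmt"

// manifest is a hypothetical stand-in for a parsed release manifest.
type manifest struct {
	Name string
}

// manifestReceiver is an illustrative buffer, not the type from this pull.
type manifestReceiver struct {
	needEnabledCapabilities bool
	buffered                []manifest
	// computeEnabled stands in for the capability calculation borrowed from CVO.
	computeEnabled func(all []manifest) []string
	// downstream plays the role of manifestsCallback.
	downstream func(m manifest, enabled []string) error
}

// receive is what the upstream tar-entry callback would call per manifest.
func (r *manifestReceiver) receive(m manifest) error {
	if !r.needEnabledCapabilities {
		// Nothing to compute, so pass the manifest straight through.
		return r.downstream(m, nil)
	}
	// Hold the manifest until the whole payload has been seen.
	r.buffered = append(r.buffered, m)
	return nil
}

// flush runs once the upstream is exhausted and delivers buffered manifests.
func (r *manifestReceiver) flush() error {
	if len(r.buffered) == 0 {
		return nil
	}
	enabled := r.computeEnabled(r.buffered)
	for _, m := range r.buffered {
		if err := r.downstream(m, enabled); err != nil {
			return err
		}
	}
	r.buffered = nil
	return nil
}

func main() {
	delivered := 0
	r := &manifestReceiver{
		needEnabledCapabilities: true,
		computeEnabled:          func(all []manifest) []string { return []string{"ImageRegistry"} },
		downstream: func(m manifest, enabled []string) error {
			delivered++
			fmt.Println(m.Name, enabled)
			return nil
		},
	}
	_ = r.receive(manifest{Name: "a.yaml"})
	_ = r.receive(manifest{Name: "b.yaml"})
	fmt.Println("delivered before flush:", delivered)
	_ = r.flush()
	fmt.Println("delivered after flush:", delivered)
}
```

In this sketch the downstream callback sees nothing until `flush`, which mirrors the cost mentioned above: the enabled capabilities depend on the full set of manifests, so delivery has to wait for the last one.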

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 14, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 14, 2025

@hongkailiu: This pull request references OTA-1010 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 14, 2025
Copy link
Contributor

openshift-ci bot commented Jan 14, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hongkailiu
Once this PR has been reviewed and has the lgtm label, please assign atiratree for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 5 times, most recently from ad75be6 to 38aeb1d Compare January 14, 2025 12:09
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 2 times, most recently from 69216c5 to 916427e Compare January 14, 2025 12:18
manifestInclusionConfiguration.Overrides,
true,
)
// update manifest is enabled, no need to check
Copy link
Member

hmm, but with IncludeAllowUnknownCapabilities, aren't the ones that fail the ones we can ignore? I'd guess they meant "not even with all these new caps enabled will this manifest get included". The ones we'll implicitly enable caps for are the ones that fail to get included if we apply the current caps, but which do get included if we allow new caps, and which match SameResourceID.

Copy link
Member Author

This function is refactored from this one in CVO: https://github.com/openshift/cluster-version-operator/blob/e90705bb5457b2b9d449b4377036c3de6617ebb6/pkg/payload/payload.go#L247-L252

My understanding of this function is weak (although I have read it a couple of times already).

hmm, but with IncludeAllowUnknownCapabilities, aren't the ones that fail the ones we can ignore?

No. The ones that pass (updateManErr == nil in the next line) are ignored: manifests that are already included on the cluster won't generate new implicitly enabled capabilities.
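
As a way to make the rule discussed in this thread concrete, here is a simplified Go model. It is only a sketch of my reading of the discussion, not the CVO function: the `manifest` type, the `included` and `implicitlyEnabled` helpers, and the resource IDs are all hypothetical, and the real code handles more cases (overrides, known capabilities, previously implicit capabilities).

```go
package main

import (
	"fmt"
	"sort"
)

// manifest is a hypothetical, simplified manifest: an ID plus the
// capabilities named in its capability annotation.
type manifest struct {
	ResourceID   string
	Capabilities []string
}

// included reports whether every capability the manifest needs is enabled.
func included(m manifest, enabled map[string]bool) bool {
	for _, c := range m.Capabilities {
		if !enabled[c] {
			return false
		}
	}
	return true
}

// implicitlyEnabled returns capabilities that must be turned on so that
// resources already on the cluster keep being managed after the update.
func implicitlyEnabled(current, incoming []manifest, enabled map[string]bool) []string {
	// Resources the cluster manages today under the enabled capabilities.
	onCluster := map[string]bool{}
	for _, m := range current {
		if included(m, enabled) {
			onCluster[m.ResourceID] = true
		}
	}
	implicit := map[string]bool{}
	for _, m := range incoming {
		if included(m, enabled) {
			continue // already enabled, nothing new to add
		}
		if !onCluster[m.ResourceID] {
			continue // net-new resource, stays disabled
		}
		for _, c := range m.Capabilities {
			if !enabled[c] {
				implicit[c] = true
			}
		}
	}
	out := make([]string, 0, len(implicit))
	for c := range implicit {
		out = append(out, c)
	}
	sort.Strings(out)
	return out
}

func main() {
	enabled := map[string]bool{} // models a cluster with no capabilities enabled
	current := []manifest{
		{ResourceID: "registry-creds"},
		{ResourceID: "machine-api-creds"},
	}
	incoming := []manifest{
		{ResourceID: "registry-creds", Capabilities: []string{"ImageRegistry"}},
		{ResourceID: "machine-api-creds", Capabilities: []string{"MachineAPI"}},
		{ResourceID: "build-config", Capabilities: []string{"Build"}},
	}
	fmt.Println(implicitlyEnabled(current, incoming, enabled)) // prints [ImageRegistry MachineAPI]
}
```

In this model, `build-config` does not contribute `Build` because no manifest with that ID is on the cluster today, while the two resources that gained a capability annotation in the incoming release do contribute theirs.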

@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from 916427e to 27e03eb Compare January 14, 2025 23:38
@hongkailiu hongkailiu changed the title [wip]OTA-1010: extract included manifests with net-new capabilities OTA-1010: extract included manifests with net-new capabilities Jan 15, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 15, 2025
@hongkailiu
Copy link
Member Author

/retest-required

@petr-muller
Copy link
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller February 14, 2025 18:34
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch 4 times, most recently from c1909d5 to 0431ce4 Compare March 4, 2025 16:50
@hongkailiu
Copy link
Member Author

/retest-required

@hongkailiu
Copy link
Member Author

hongkailiu commented Mar 4, 2025

Some test results from 0431ce4 (outdated):

Cluster-bot:

launch 4.13.12 aws

$ make oc
$ ./oc adm release extract --included --credentials-requests --to credentials-requests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.0-x86_64
I0304 13:36:44.744021   32452 extract_tools.go:1254] Those capabilities become implicitly enabled for the incoming release [ImageRegistry MachineAPI]
Extracted release payload from digest sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe created at 2023-09-07T07:37:47Z

$ rg ImageRegistry credentials-requests
credentials-requests/0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
6:    capability.openshift.io/name: ImageRegistry

$ ll credentials-requests
total 48
-rw-r--r--@ 1 hongkliu  staff   1.8K Mar  4 13:36 0000_30_machine-api-operator_00_credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   738B Mar  4 13:36 0000_50_cloud-credential-operator_05-iam-ro-credentialsrequest.yaml
-rw-r--r--@ 1 hongkliu  staff   1.3K Mar  4 13:36 0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   920B Mar  4 13:36 0000_50_cluster-ingress-operator_00-ingress-credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   1.0K Mar  4 13:36 0000_50_cluster-network-operator_02-cncc-credentials.yaml
-rw-r--r--@ 1 hongkliu  staff   1.5K Mar  4 13:36 0000_50_cluster-storage-operator_03_credentials_request_aws.yaml

### without --included
$ rm -rf credentials-requests          
$ ./oc adm release extract --to credentials-requests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.0-x86_64                                  
Extracted release payload from digest sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe created at 2023-09-07T07:37:47Z

$ ll credentials-requests | wc -l
     682

@hongkailiu
Copy link
Member Author

/retest-required

@hongkailiu
Copy link
Member Author

/test okd-scos-e2e-aws-ovn

@hongkailiu
Copy link
Member Author

/test e2e-agnostic-ovn-cmd

@petr-muller
Copy link
Member

petr-muller commented Mar 11, 2025

/uncc

I am not following the OTA-1010 work closely because, as far as I know, Trevor is involved in it, so I am removing myself from cc to avoid giving the false impression that I plan to review here. If my review or approval is necessary, feel free to /cc me again.

@openshift-ci openshift-ci bot removed the request for review from petr-muller March 11, 2025 12:48
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from c1ec179 to 99cda00 Compare March 14, 2025 05:31
@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from 99cda00 to def90a6 Compare March 14, 2025 05:57
capSet = capabilitiesSpec.BaselineCapabilitySet
}
deepCopy := clusterVersion.Status.Capabilities.DeepCopy()
if capSet == configv1.ClusterVersionCapabilitySetCurrent {
Copy link
Member Author

v4.y could also grow over time, but this keeps it consistent with:

if data.Capabilities.BaselineCapabilitySet == configv1.ClusterVersionCapabilitySetCurrent {
	klog.Infof("If the eventual cluster will not be the same minor version as this %s 'oc', the actual %s capability set may differ.", reportedVersion, data.Capabilities.BaselineCapabilitySet)
}

@hongkailiu
Copy link
Member Author

hongkailiu commented Mar 17, 2025

Reran the test with def90a6:

launch 4.13.12 aws

The cluster did not set BASELINE_CAPABILITY_SET, so it uses the default value, vCurrent.

$ oc get clusterversions.config.openshift.io version -o yaml | yq .spec
{
  "channel": "candidate-4.13",
  "clusterID": "01541c49-a5d6-4b02-ba6a-161a12c3ca79",
  "upstream": "https://api.integration.openshift.com/api/upgrades_info/graph"
}

$ ./oc adm release extract --included --credentials-requests --to credentials-requests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.0-x86_64

I0314 09:39:57.809034   23241 extract_tools.go:1340] If the eventual cluster will not be the same minor version as this v4.2.0-alpha.0-2583-gdef90a6 'oc', the actual vCurrent capability set may differ.
I0314 09:39:57.809062   23241 extract_tools.go:1343] If the eventual cluster will not be the same minor version as this v4.2.0-alpha.0-2583-gdef90a6 'oc', the known capability sets may differ.
I0314 09:40:11.867239   23241 extract_tools.go:1253] Those capabilities become implicitly enabled for the incoming release []
Extracted release payload from digest sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe created at 2023-09-07T07:37:47Z

$ rg ImageRegistry credentials-requests
credentials-requests/0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
6:    capability.openshift.io/name: ImageRegistry

$ ll credentials-requests
total 48
-rw-r--r--@ 1 hongkliu  staff   1.8K Mar 14 09:40 0000_30_machine-api-operator_00_credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   738B Mar 14 09:40 0000_50_cloud-credential-operator_05-iam-ro-credentialsrequest.yaml
-rw-r--r--@ 1 hongkliu  staff   1.3K Mar 14 09:40 0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   920B Mar 14 09:40 0000_50_cluster-ingress-operator_00-ingress-credentials-request.yaml
-rw-r--r--@ 1 hongkliu  staff   1.0K Mar 14 09:40 0000_50_cluster-network-operator_02-cncc-credentials.yaml
-rw-r--r--@ 1 hongkliu  staff   1.5K Mar 14 09:40 0000_50_cluster-storage-operator_03_credentials_request_aws.yaml

Compared with #1958 (comment), no caps became implicitly enabled, as expected, because they are already (explicitly) enabled with BASELINE_CAPABILITY_SET=vCurrent.

I wanted to try BASELINE_CAPABILITY_SET=None with

launch 4.13.12 aws,no-capabilities

However, cluster-bot did not accept that command.

I expected to see some implicitly enabled caps there.


Update on March 20:
Thanks to @jiajliu for the magic that creates a 4.13.12 cluster with baselineCapabilitySet=None.
I created openshift/release#63030 and then triggered the rehearsal with:

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.14-multi-nightly-4.14-upgrade-from-stable-4.13-aws-upi-basecap-none-amd-f28

When the rehearsal job reached

INFO[2025-03-20T16:04:13Z] Running step aws-upi-basecap-none-amd-f28-wait. 

I logged in to the build farm to get the kubeconfig for the ephemeral cluster:

$ oc -n ci-op-76cqklsf extract secret/aws-upi-basecap-none-amd-f28 --to=- --keys kubeconfig > ~/.kube/config

Then I repeated the test above:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.12   True        False         34m     Cluster version is 4.13.12

$ oc get clusterversions.config.openshift.io version -o yaml | yq .spec
{
  "capabilities": {
    "baselineCapabilitySet": "None"
  },
  "clusterID": "a6bbd42d-eb66-44b6-89de-ca2be2985fd5"
}

$ ./oc adm release extract --included --credentials-requests --to credentials-requests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.0-x86_64
I0320 12:26:25.194094    7394 extract_tools.go:1343] If the eventual cluster will not be the same minor version as this v4.2.0-alpha.0-2583-gdef90a6 'oc', the known capability sets may differ.
I0320 12:27:04.242248    7394 extract_tools.go:1253] Those capabilities become implicitly enabled for the incoming release [ImageRegistry MachineAPI]
Extracted release payload from digest sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe created at 2023-09-07T07:37:47Z

$ rg ImageRegistry credentials-requests
credentials-requests/0000_50_cluster-image-registry-operator_01-registry-credentials-request.yaml
6:    capability.openshift.io/name: ImageRegistry

The logs look good to me.

@hongkailiu
Copy link
Member Author

/test e2e-agnostic-ovn-cmd

Copy link
Contributor

openshift-ci bot commented Mar 17, 2025

@hongkailiu: No presubmit jobs available for openshift/oc@master

In response to this:

/test e2e-agnostic-ovn-cmd

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hongkailiu
Copy link
Member Author

/retest-required

klog.Infof("If the eventual cluster will not be the same minor version as this %s 'oc', the actual %s capability set may differ.", reportedVersion, capSet)
}
deepCopy.EnabledCapabilities = append(deepCopy.EnabledCapabilities, configv1.ClusterVersionCapabilitySets[capSet]...)
klog.Infof("If the eventual cluster will not be the same minor version as this %s 'oc', the known capability sets may differ.", reportedVersion)
Copy link
Member

nit: this looks like it will be an unnecessary echo in the capSet == configv1.ClusterVersionCapabilitySetCurrent case, where we've just logged a very similar message above. Can we put this log line into an else branch, or something? We may also have a way to figure out the target version from the pullspec we're extracting, and then we would know whether the if applies instead of guessing?

Copy link
Member Author

@hongkailiu hongkailiu Mar 21, 2025

Can we put this log line into an else branch, or something?

Done.

We may also have a way to figure out the target version from the pullspec we're extracting

Left a TODO for this.


Update:

Done in 88d0075:

$ ./oc -v 2 adm release extract --to 414 quay.io/openshift-release-dev/ocp-release:4.14.0-rc.0-x86_64
I0321 10:31:15.726235   22094 extract.go:365] Retrieved the version from image configuration in the image to extract: 4.14.0-rc.0
Extracted release payload from digest sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe created at 2023-09-07T07:37:47Z
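
For readers following the else-branch suggestion in this thread, the idea can be sketched in Go as below. This is an assumption about the shape of the change, not the code in 88d0075: `capabilityNote` is a hypothetical helper, and the plain string "vCurrent" stands in for configv1.ClusterVersionCapabilitySetCurrent. The two message texts are the ones quoted in the review.

```go
package main

import "fmt"

// capabilityNote returns exactly one of the two warnings, so the vCurrent
// case no longer repeats a near-identical message.
func capabilityNote(capSet, reportedVersion string) string {
	if capSet == "vCurrent" {
		return fmt.Sprintf("If the eventual cluster will not be the same minor version as this %s 'oc', the actual %s capability set may differ.", reportedVersion, capSet)
	}
	return fmt.Sprintf("If the eventual cluster will not be the same minor version as this %s 'oc', the known capability sets may differ.", reportedVersion)
}

func main() {
	fmt.Println(capabilityNote("vCurrent", "v4.19.0"))
	fmt.Println(capabilityNote("None", "v4.19.0"))
}
```

Returning the message instead of logging it directly keeps the branch easy to test; the caller would pass the result to klog.Infof.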

Copy link
Member Author

I cannot use the existing ImageMetadataCallback

if o.ImageMetadataCallback != nil {
	o.ImageMetadataCallback(&mapping, location.Manifest, contentDigest, imageConfig, location.ManifestListDigest())
}

because it is called too late.

@hongkailiu hongkailiu force-pushed the OTA-1010-refactor-callback branch from 0be8d41 to 88d0075 Compare March 21, 2025 14:33
@hongkailiu hongkailiu requested a review from wking March 21, 2025 14:53
Copy link
Contributor

openshift-ci bot commented Mar 21, 2025

@hongkailiu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 88d0075 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
4 participants