✨ Add CRD migrator, deprecate clusterctl upgrade CRD storage version migration #11889

Open
wants to merge 9 commits into
base: main
from

Member

@sbueringer sbueringer commented Feb 21, 2025

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #11894

@k8s-ci-robot added the do-not-merge/work-in-progress, cncf-cla: yes, do-not-merge/needs-area, and size/XXL labels on Feb 21, 2025
@sbueringer force-pushed the pr-crd-migration branch 2 times, most recently from 645df3d to 3187ed5, on February 24, 2025 09:44
@sbueringer changed the title from "[WIP] ✨ [DNR][POC] Add CRD migrator" to "[WIP] ✨ [DNR][POC] Add CRD migrator, deprecate clusterctl CRD migration" on Feb 24, 2025
@sbueringer changed the title from "[WIP] ✨ [DNR][POC] Add CRD migrator, deprecate clusterctl CRD migration" to "[WIP] ✨ Add CRD migrator, deprecate clusterctl CRD migration" on Feb 24, 2025
@sbueringer added the area/util label on Feb 24, 2025
@k8s-ci-robot removed the do-not-merge/needs-area label on Feb 24, 2025
@sbueringer
Member Author

/test pull-cluster-api-e2e-main

@sbueringer changed the title from "[WIP] ✨ Add CRD migrator, deprecate clusterctl CRD migration" to "✨ Add CRD migrator, deprecate clusterctl CRD storage version migration" on Feb 24, 2025
@k8s-ci-robot removed the do-not-merge/work-in-progress label on Feb 24, 2025
@sbueringer changed the title from "✨ Add CRD migrator, deprecate clusterctl CRD storage version migration" to "✨ Add CRD migrator, deprecate clusterctl upgrade CRD storage version migration" on Feb 24, 2025
@sbueringer
Member Author

/test pull-cluster-api-e2e-main

Contributor

@JoelSpeed left a comment

Quite a few new files have a license header stating 2024 as the year they were introduced. Not sure if you want to update those; I added a nit for one and left the others.

@@ -140,6 +144,9 @@ func InitFlags(fs *pflag.FlagSet) {
	fs.IntVar(&kubeadmConfigConcurrency, "kubeadmconfig-concurrency", 10,
		"Number of kubeadm configs to process simultaneously")

	fs.StringVar(&skipCRDMigrationPhases, "skip-crd-migration-phases", "",
Contributor

Have you considered making this a string slice and allowing it to be specified with multiple values? That would allow us to drop the All, and it would mean that adding a third phase at some point won't become awkward.

Member Author

cc @fabriziopandini As you maybe have an additional opinion on this flag :)

Member Author

I mostly added the All to allow folks to entirely disable the CRD migrator without worrying about the phases.

The current flag supports a comma-separated list, so in general All is also not required at the moment.

Contributor

If it can accept CSV then the string slice part of my comment isn't so important

I can see the argument for All, though do we want users to be conscious of new phases that we discover and to acknowledge that they are skipping them?

I'm not entirely sure why anyone would skip the phases, they are an important part of the lifecycle

Member Author

@sbueringer Feb 25, 2025

> I'm not entirely sure why anyone would skip the phases, they are an important part of the lifecycle

I think this could be interesting if:

  • folks want to use the storage version migrator from upstream k/k instead of our implementation here
  • folks have another way to handle migration between apiVersions (also thinking in general about folks who don't run conversion webhooks, even though I don't know exactly how they handle all of this :))

Member

I'm ok if we want to make it a slice.
I also think it is acceptable to ask users to be explicit in choosing what they want to skip, and thus to drop All (we also have only two phases, so it will not be a long list).

Member Author

@sbueringer Feb 27, 2025

Sounds good. Made it a slice and dropped All
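
For readers following the flag discussion, here is a minimal sketch of a slice-style flag that accepts repeated flags as well as comma-separated values and rejects anything that is not an explicitly named phase. It uses the standard library `flag` package only to stay self-contained; the PR itself uses pflag, whose `StringSliceVar` provides comparable parsing. The phase names, `phaseList`, and `parseSkipPhases` are assumptions made for illustration and are not taken from the PR.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// phaseList is a flag.Value that collects phase names from repeated flags
// and from comma-separated values.
type phaseList []string

func (p *phaseList) String() string { return strings.Join(*p, ",") }

func (p *phaseList) Set(value string) error {
	for _, item := range strings.Split(value, ",") {
		if item = strings.TrimSpace(item); item != "" {
			*p = append(*p, item)
		}
	}
	return nil
}

// knownPhases lists the phases that may be skipped. The names are
// assumptions for this sketch, not confirmed by the discussion.
var knownPhases = map[string]bool{
	"StorageVersionMigration": true,
	"CleanupManagedFields":    true,
}

// parseSkipPhases parses args and rejects unknown phase names, so every
// skipped phase has to be named explicitly (there is no "All").
func parseSkipPhases(args []string) ([]string, error) {
	var phases phaseList
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	fs.Var(&phases, "skip-crd-migration-phases", "CRD migration phases to skip")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	for _, p := range phases {
		if !knownPhases[p] {
			return nil, fmt.Errorf("unknown CRD migration phase %q", p)
		}
	}
	return phases, nil
}

func main() {
	phases, err := parseSkipPhases([]string{
		"--skip-crd-migration-phases=StorageVersionMigration",
		"--skip-crd-migration-phases=CleanupManagedFields",
	})
	fmt.Println(phases, err)
}
```

With explicit names only, a phase added later is never skipped silently: users who want to skip it have to add it to the list themselves.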

Member

@chrischdi left a comment

not yet done with the review, but first finding

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sbueringer
Member Author

/test pull-cluster-api-e2e-main

@sbueringer
Member Author

/test pull-cluster-api-e2e-main

Member

@chrischdi left a comment

just one nit and one future idea, otherwise lgtm.

@chrischdi
Member

Flake:

  [FAILED] Timed out after 120.001s.
  The function passed to Eventually failed at /home/prow/go/src/sigs.k8s.io/cluster-api/test/framework/crdmigration_helpers.go:61 with:
  CustomResourceDefinition clusterclasses.cluster.x-k8s.io should have observed generation annotation with correct value
  Expected
      <map[string]string | len:2>: {
          "controller-gen.kubebuilder.io/version": "v0.17.2",
          "cert-manager.io/inject-ca-from": "capi-system/capi-serving-cert",
      }
  to have {key: value}
      <map[interface {}]interface {} | len:1>: {
          <string>"crd-migration.cluster.x-k8s.io/observed-generation": <string>"3",
      }
  In [It] at: /home/prow/go/src/sigs.k8s.io/cluster-api/test/framework/crdmigration_helpers.go:78 @ 02/26/25 20:32:05.997
  Full Stack Trace
    sigs.k8s.io/cluster-api/test/framework.ValidateCRDMigration({0x3108588, 0xc000565ea0}, {0x3120df0, 0xc000039880}, {0xc003bdd470, 0x12}, {0xc00136df20?, 0xc0016a2480?}, 0x2dfd9e8, 0xc0016042d0)
    	/home/prow/go/src/sigs.k8s.io/cluster-api/test/framework/crdmigration_helpers.go:78 +0x153
    sigs.k8s.io/cluster-api/test/e2e.init.func11.1.1({0x3120df0, 0xc000039880}, {0xc003bdd470, 0x12}, {0xc0014e4930, 0x22})
    	/home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/clusterctl_upgrade_test.go:103 +0xb7
    sigs.k8s.io/cluster-api/test/e2e.ClusterctlUpgradeSpec.func2()
    	/home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/clusterctl_upgrade.go:705 +0x463f

@sbueringer
Member Author

Not sure if it's a flake. Had the same error yesterday, improved the error message slightly. Now going to debug it though :)

framework.ValidateCRDMigration(ctx, proxy, namespace, clusterName,
	func(crd apiextensionsv1.CustomResourceDefinition) bool {
		return strings.HasSuffix(crd.Name, ".cluster.x-k8s.io") &&
			crd.Name != "providers.clusterctl.cluster.x-k8s.io" &&
Member Author

Not sure what we want to do with providers. At the moment it doesn't matter because it's v1alpha3 and we don't touch it at all

Member

maybe make it an overridable func?

But also good enough to fixup once we start using it in a provider e2e

Member Author

This whole thing is already overridable / configurable if an infra provider runs this test :)

What I meant with providers is the providers CRD. Currently I did not configure any CRDMigrator to migrate the providers CRD (literally the CRD with the name: providers.clusterctl.cluster.x-k8s.io)

@sbueringer
Member Author

Okay that was an easy fix

									// ClusterTopology feature is disabled via the CLUSTER_TOPOLOGY variable below,
									// so we can't expect the CRD migrator to migrate the ClusterClass CRD.
									crd.Name != "clusterclasses.cluster.x-k8s.io"
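
Putting the pieces of this thread together, the CRD filter can be sketched as a plain predicate over CRD names. The suffix and the two excluded names come from the snippets above. `excludedCRDs` and `shouldValidateCRD` are hypothetical names, and the sketch works on strings rather than `apiextensionsv1.CustomResourceDefinition` so that it runs without Kubernetes dependencies.

```go
package main

import (
	"fmt"
	"strings"
)

// excludedCRDs holds the CRD names that the discussion leaves out of validation.
var excludedCRDs = map[string]bool{
	// No CRD migrator is configured for the clusterctl providers CRD.
	"providers.clusterctl.cluster.x-k8s.io": true,
	// The ClusterTopology feature is disabled in this test, so the
	// ClusterClass CRD is not expected to be migrated.
	"clusterclasses.cluster.x-k8s.io": true,
}

// shouldValidateCRD reports whether CRD migration should be validated for
// the CRD with the given name.
func shouldValidateCRD(name string) bool {
	return strings.HasSuffix(name, ".cluster.x-k8s.io") && !excludedCRDs[name]
}

func main() {
	for _, name := range []string{
		"clusters.cluster.x-k8s.io",
		"clusterclasses.cluster.x-k8s.io",
		"providers.clusterctl.cluster.x-k8s.io",
		"certificates.cert-manager.io",
	} {
		fmt.Printf("%-40s validate=%t\n", name, shouldValidateCRD(name))
	}
}
```

Keeping the exclusions in one map makes it easier for a provider that reuses the test to see which CRDs are skipped and why.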

@sbueringer
Member Author

/test pull-cluster-api-e2e-main

Member

@fabriziopandini left a comment

Awesome work, the integration test alone is amazing.

@sbueringer
Member Author

sbueringer commented Feb 27, 2025

Findings should be all addressed. Thx everyone for the reviews!

I'll run the e2e test locally to see what's going on there

@sbueringer added the tide/merge-method-squash label on Feb 27, 2025
@sbueringer
Member Author

Should hopefully be green now

/test pull-cluster-api-e2e-main

@sbueringer
Member Author

@JoelSpeed @fabriziopandini @chrischdi

e2e tests are now also all green. Findings should be addressed, PTAL :)

Contributor

@JoelSpeed left a comment

I wonder if we want to try and condense the CRD rbac by setting the resource name multiple times. It appears the RBAC generator doesn't do any sort of condensing of duplicate rules that only differ by resource name entries
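
To illustrate the condensing idea, here is a sketch that merges rules which differ only in their resource names. The `rule` type is a simplified, hypothetical stand-in for an RBAC policy rule, and the sample groups, verbs, and names are illustrative. It does not reflect how the RBAC generator works internally.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rule is a simplified, hypothetical stand-in for an RBAC policy rule.
type rule struct {
	APIGroups     []string
	Resources     []string
	Verbs         []string
	ResourceNames []string
}

// groupKey identifies rules that are identical apart from their resource names.
// Rules without resource names apply to all names, so they are never merged
// with name-restricted rules.
func groupKey(r rule) string {
	scope := "named"
	if len(r.ResourceNames) == 0 {
		scope = "all"
	}
	return strings.Join([]string{
		scope,
		strings.Join(r.APIGroups, ","),
		strings.Join(r.Resources, ","),
		strings.Join(r.Verbs, ","),
	}, "|")
}

// condense merges rules that differ only in ResourceNames, keeping first-seen order.
func condense(rules []rule) []rule {
	index := map[string]int{}
	var out []rule
	for _, r := range rules {
		k := groupKey(r)
		i, seen := index[k]
		if !seen {
			index[k] = len(out)
			merged := r
			// Copy the names so the input slice is not modified by later appends.
			merged.ResourceNames = append([]string(nil), r.ResourceNames...)
			out = append(out, merged)
			continue
		}
		out[i].ResourceNames = append(out[i].ResourceNames, r.ResourceNames...)
	}
	for i := range out {
		sort.Strings(out[i].ResourceNames)
	}
	return out
}

func main() {
	crds := func(names ...string) rule {
		return rule{
			APIGroups:     []string{"apiextensions.k8s.io"},
			Resources:     []string{"customresourcedefinitions"},
			Verbs:         []string{"patch", "update"},
			ResourceNames: names,
		}
	}
	for _, r := range condense([]rule{
		crds("machines.cluster.x-k8s.io"),
		crds("clusters.cluster.x-k8s.io"),
	}) {
		fmt.Printf("%+v\n", r)
	}
}
```

The sketch assumes that each input rule lists its groups, resources, and verbs in the same order. Rules that list them in a different order would not be merged.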

@sbueringer
Member Author

/test pull-cluster-api-e2e-main

Labels
area/util Issues or PRs related to utils cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Removal of old apiVersions in CRDs
5 participants