-
Notifications
You must be signed in to change notification settings - Fork 81
Add snapshot restore validator #1191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jiaqiluo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some nits.
ba25be2 to
88b2c50
Compare
jakefhyde
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 nit
|
|
||
| // parseSnapshotClusterSpec decodes snapshot.SnapshotFile.Metadata into a v1.ClusterSpec. | ||
| // The metadata is stored as a nested, gzipped, base64-encoded structure. | ||
| func parseSnapshotClusterSpec(snap *rkev1.ETCDSnapshot) (*v1.ClusterSpec, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We copy this from Rancher correct? I wonder if we could move this to github.com/rancher/rancher/pkg/apis/provisioning.cattle.io/v1 somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done 👍
045569e to
3863e06
Compare
jakefhyde
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| if err != nil { | ||
| if apierrors.IsNotFound(err) { | ||
| return admission.ResponseBadRequest( | ||
| fmt.Sprintf("etcd restore references missing snapshot %s in namespace %s", newRestore.Name, newCluster.Namespace)), nil | ||
| } | ||
| return nil, fmt.Errorf("failed to get etcd snapshot %s/%s: %w", newCluster.Namespace, newRestore.Name, err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if err != nil { | |
| if apierrors.IsNotFound(err) { | |
| return admission.ResponseBadRequest( | |
| fmt.Sprintf("etcd restore references missing snapshot %s in namespace %s", newRestore.Name, newCluster.Namespace)), nil | |
| } | |
| return nil, fmt.Errorf("failed to get etcd snapshot %s/%s: %w", newCluster.Namespace, newRestore.Name, err) | |
| } | |
| if apierrors.IsNotFound(err) { | |
| return admission.ResponseBadRequest( | |
| fmt.Sprintf("etcd restore references missing snapshot %s in namespace %s", newRestore.Name, newCluster.Namespace)), nil | |
| } else if err != nil | |
| return nil, fmt.Errorf("failed to get etcd snapshot %s/%s: %w", newCluster.Namespace, newRestore.Name, err) | |
| } |
apierrors.IsNotFound Does a nil check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done 👍
63f3b7a to
6782d52
Compare
9a9828d to
d596ab9
Compare
| // validateETCDSnapshotRestore ensures that any requested ETCD restore | ||
| // (a) references an existing ETCDSnapshot, and | ||
| // (b) contains decodable metadata with a valid "provisioning-cluster-spec". | ||
| func (p *provisioningAdmitter) validateETCDSnapshotRestore(request *admission.Request, oldCluster, newCluster *v1.Cluster) (*admissionv1.AdmissionResponse, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really wanted to get pedantic, we could validate that it isn't set on create, but I say we save that for a later date.
crobby
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validator changes seem good. Not sure about the golang/lint bumps as part of this PR though. Should they be separate?
| # Build Stage | ||
| # =============== | ||
| FROM --platform=$BUILDPLATFORM registry.suse.com/bci/golang:1.24 AS build | ||
| FROM --platform=$BUILDPLATFORM registry.suse.com/bci/golang:1.25 AS build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily a bad change, but this change doesn't appear to be related to adding a new validator. Should it be a separate issue/PR?
| # =============== | ||
| FROM build AS validate | ||
| RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b /usr/local/bin v1.64.8 | ||
| RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b /usr/local/bin v2.7.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily a bad change, but this change doesn't appear to be related to adding a new validator. Should it be a separate issue/PR?
| module github.com/rancher/webhook | ||
|
|
||
| go 1.24.0 | ||
| go 1.25.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily a bad change, but this change doesn't appear to be related to adding a new validator. Should it be a separate issue/PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot to mention this in the PR description.
The Go version bump is here because rancher/rancher main is already on this Go version, and I needed to validate this webhook change together with my rancher/rancher updates to the snapshotbackpopulate controller. With the older Go version in this repo, I was running into toolchain mismatches when testing the full flow.
After bumping Go, the current golangci-lint version started failing to run with the new Go toolchain, so I updated it as well to keep CI working.
To keep scope clean, I can open a dedicated PR like "Bump Go toolchain + golangci-lint", merge that first, and then rebase this validator PR on top of it (so this PR stays focused on the validator logic).
Let me know what you think 🙇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jferrazbr I agree that it would be cleaner to do the go toolchain + lint bump in a separate PR (similar to how the other repos have been trickling in go 1.25).
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
f8fe15f to
7b88f4b
Compare
7b88f4b to
7634303
Compare
Issue: rancher/rancher#52574
Problem
When restoring from an ETCD snapshot, the webhook did not validate the snapshot metadata before accepting
spec.rkeConfig.etcdSnapshotRestore.It was possible to request
"kubernetesVersion"or"all"forrestoreRKEConfigeven when the referenced snapshot had missing or invalid metadata.This led to restore requests that passed admission but failed later in the restore flow with parse errors.
Solution
This PR adds a validator for
spec.rkeConfig.etcdSnapshotRestoreonprovisioning.cattle.io/v1, Clusterand wires the RKE client into the webhookClientsstruct.The validator:
etcdSnapshotRestorechanges from empty to a new non empty value, so it does not block unrelated cluster updates.etcdSnapshotRestore.nameexists in the same namespace.etcdSnapshotRestore.restoreRKEConfigis one of"none","kubernetesVersion", or"all"."kubernetesVersion", requires akubernetesVersion, and for"all", requires bothkubernetesVersionandrkeConfig.In addition:
pkg/server/handlers.gowas moved to a management cluster only list so that validation only runs where snapshot resources exist (local/management cluster). This avoids issues on downstream clusters that do not have the snapshot resources.Docs are updated to describe the new validation behavior, and unit tests cover the main success and failure paths.
This partially addresses the linked issue by validating snapshot metadata before restore. The annotation based mode filtering will be handled in a follow up change.
CheckList