Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: validate docker image references in upgrade options #10387

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

azalio
Copy link

@azalio azalio commented Feb 17, 2025

What? (description)

This commit adds validation for Docker image references in the UpgradeOptions struct. The validation ensures that all image fields (kubelet, apiserver, controller-manager, scheduler, proxy) are valid Docker image references before proceeding with the upgrade process.

The implementation:

  • Uses the distribution/reference package for robust Docker image validation
  • Validates all image fields in UpgradeOptions struct
  • Performs validation before starting the upgrade process
  • Provides clear error messages indicating which image field is invalid
  • Includes comprehensive tests covering various image reference formats
  • Supports backward compatibility

This change helps prevent potential issues during the upgrade process by catching invalid image references early with clear error messages.

Why? (reasoning)

When working with the library, I broke my Kubernetes cluster.

Upgrading Kubernetes to version v1.32.1
discovered controlplane nodes ["10.10.10.10"]
updating "kube-apiserver" to version "1.32.1"
 > "10.10.10.10": starting update
 > update kube-apiserver: v1.32.0 -> 1.32.1
 > "10.10.10.10": machine configuration patched
 > "10.10.10.10": waiting for kube-apiserver pod update
 > "10.10.10.10": kube-apiserver: waiting, config version mismatch: got "12", expected "13"
^Csignal: interrupt

talosctl --talosconfig=./talosconfig  --nodes=10.10.10.10 logs kubelet  | grep kube-apiserver | tail -1
10.10.10.10: {"ts":1739781513171.0198,"caller":"kubelet/pod_workers.go:1301","msg":"Error syncing pod, skipping","pod":{"name":"kube-apiserver-azalio-talos-cp1","namespace":"kube-system"},"podUID":"c0185797c1b64406a14ccd9c6585cd36","err":"failed to \"StartContainer\" for \"kube-apiserver\" with InvalidImageName: \"Failed to apply default image tag \\\":v1.32.1\\\": couldn't parse image name \\\":v1.32.1\\\": invalid reference format\"","errCauses":[{"error":"failed to \"StartContainer\" for \"kube-apiserver\" with InvalidImageName: \"Failed to apply default image tag \\\":v1.32.1\\\": couldn't parse image name \\\":v1.32.1\\\": invalid reference format\""}]}

I started to investigate the issue and realized that the Upgrade function was not checking if all the necessary parameters had been submitted.
For instance, it turned out that the image being downloaded was :v1.31.1, which is clearly the wrong version to download.

This caused the Kubernetes cluster to become unstable. I have created a patch to validate these options and ensure that they are correct.

I have written a simple example to demonstrate this behavior.
https://gist.github.com/azalio/4017548d95cc9b75ae2e8c9725fccca6

Acceptance

Please use the following checklist:

  • you included tests (if applicable)
  • you ran conformance (make conformance)
commit         GPG Identity                 FAILED        openpgp: signature made by unknown entity
  • you formatted your code (make fmt)

I have launched, but these are not my own changes.

	modified:   api/resource/definitions/block/block.proto
	modified:   api/resource/definitions/cluster/cluster.proto
	modified:   api/resource/definitions/cri/cri.proto
	modified:   api/resource/definitions/enums/enums.proto
	modified:   api/resource/definitions/etcd/etcd.proto
	modified:   api/resource/definitions/extensions/extensions.proto
	modified:   api/resource/definitions/files/files.proto
	modified:   api/resource/definitions/hardware/hardware.proto
	modified:   api/resource/definitions/k8s/k8s.proto
	modified:   api/resource/definitions/kubeaccess/kubeaccess.proto
	modified:   api/resource/definitions/kubespan/kubespan.proto
	modified:   api/resource/definitions/network/network.proto
	modified:   api/resource/definitions/perf/perf.proto
	modified:   api/resource/definitions/proto/proto.proto
	modified:   api/resource/definitions/runtime/runtime.proto
	modified:   api/resource/definitions/secrets/secrets.proto
	modified:   api/resource/definitions/siderolink/siderolink.proto
	modified:   api/resource/definitions/time/time.proto
	modified:   api/resource/definitions/v1alpha1/v1alpha1.proto
  • you linted your code (make lint)
  • you generated documentation (make docs)
  • you ran unit-tests (make unit-tests)

See make help for a description of the available targets.

This commit adds validation for Docker image references in the UpgradeOptions
struct. The validation ensures that all image fields (kubelet, apiserver,
controller-manager, scheduler, proxy) are valid Docker image references before
proceeding with the upgrade process.

The implementation:
- Uses the distribution/reference package for robust Docker image validation
- Validates all image fields in UpgradeOptions struct
- Performs validation before starting the upgrade process
- Provides clear error messages indicating which image field is invalid
- Includes comprehensive tests covering various image reference formats

This change helps prevent potential issues during the upgrade process by
catching invalid image references early with clear error messages.

Signed-off-by: Mikhail Petrov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant