Stop template resolving if not needed #1943

Open
wants to merge 1 commit into
base: main

koflanx

@koflanx koflanx commented Jun 26, 2025

What this PR does / why we need it:
This PR allows users to rotate their Anexia nodes even if the OS template they originally used has since been removed. Templates may be removed when they are replaced for security patches or similar reasons.
Template ID resolution must be skipped when it is not needed: otherwise, the teardown of existing machines can fail if the named template no longer exists, which in turn leads to node rotation issues.

Which issue(s) this PR fixes:
Not reported as an issue outside of anexia.

What type of PR is this?
/kind bug

Special notes for your reviewer:
This is only related to the cloud provider anexia.

Does this PR introduce a user-facing change? Then add your Release Note here:

NONE

Documentation:

NONE

Template ID resolution must be skipped when it is not needed; otherwise the
teardown of existing machines can fail if the named template no longer
exists, which leads to node rotation issues.

Signed-off-by: Kim Fehrs <[email protected]>
@kubermatic-bot kubermatic-bot added kind/bug Categorizes issue or PR as related to a bug. release-note-none Denotes a PR that doesn't merit a release note. docs/none Denotes a PR that doesn't need documentation (changes). dco-signoff: yes Denotes that all commits in the pull request have the valid DCO signoff message. labels Jun 26, 2025
@kubermatic-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign moadqassem for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubermatic-bot kubermatic-bot added sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 26, 2025
@kubermatic-bot

Hi @koflanx. Thanks for your PR.

I'm waiting for a kubermatic member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kubermatic-bot kubermatic-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 26, 2025
koflanx added a commit to anexia-it/machine-controller that referenced this pull request Jul 7, 2025
Upstream PRs
use virtio as nic kubermatic#1942
stop template resolving kubermatic#1943

Originally implemented in
- 11e2b3c
- 07f3827

Stop template resolving if not needed
Template ID resolution must be skipped when it is not needed; otherwise the
teardown of existing machines can fail if the named template no longer
exists. This happened as part of the Flatcar rollout (VSO-2422)
and led to node rotation issues.

Closes: ANXKUBE-1361

Refactor provisioning handling of VMs
Checking the returned errors alone does not match the behaviour of
the API, which keeps the errors field populated even if later runs of
the provisioning task were successful.

Because of that, the machine controller was not able to delete machines
that were created, but not yet provisioned as nodes. By evaluating the
"status" field instead of the presence of errors, we can (hopefully)
better verify the provisioning status of a VM.
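
As a rough illustration of the status-based check described above, the following sketch classifies a provisioning response by its status and ignores leftover errors. The struct, the status strings ("succeeded", "failed"), and the function name are assumptions made for this example; the real API response and its values may differ.

```go
package main

import "fmt"

// Hypothetical status values; the real API may use different strings.
const (
	statusSucceeded = "succeeded"
	statusFailed    = "failed"
)

// provisioningProgress is a hypothetical, simplified view of the response
// returned for a provisioning task.
type provisioningProgress struct {
	Status string
	Errors []string // may still hold errors from earlier, retried runs
}

type provisioningState int

const (
	stateInProgress provisioningState = iota
	stateSucceeded
	stateFailed
)

// classifyByStatus looks only at the status field. Stale entries in Errors
// do not turn a finished provisioning into a failed one.
func classifyByStatus(p provisioningProgress) provisioningState {
	switch p.Status {
	case statusSucceeded:
		return stateSucceeded
	case statusFailed:
		return stateFailed
	default:
		return stateInProgress
	}
}

func main() {
	// A task that failed once and succeeded on a later run.
	p := provisioningProgress{
		Status: statusSucceeded,
		Errors: []string{"first attempt timed out"},
	}
	fmt.Println("has stale errors:", len(p.Errors) > 0)
	fmt.Println("succeeded by status:", classifyByStatus(p) == stateSucceeded)
}
```

A check based on `len(p.Errors) > 0` would treat this response as failed, which is the mismatch the commit message describes.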

Furthermore, the default NIC type was changed to "virtio" because of
breaking API changes that were introduced and not yet reverted (VSSUP-16).
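
The defaulting itself could look roughly like the sketch below. Only the value "virtio" comes from the commit message; the `nicConfig` type, its fields, and `withDefaults` are hypothetical names for this example.

```go
package main

import "fmt"

// defaultNICType is the default named in the commit message.
const defaultNICType = "virtio"

// nicConfig is a hypothetical, simplified network interface configuration.
type nicConfig struct {
	Type string
	VLAN string
}

// withDefaults fills in the NIC type when the user did not set one and
// leaves an explicitly configured type untouched.
func withDefaults(n nicConfig) nicConfig {
	if n.Type == "" {
		n.Type = defaultNICType
	}
	return n
}

func main() {
	fmt.Println(withDefaults(nicConfig{VLAN: "example-vlan"}).Type)
}
```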

Unfortunately, in order to work with the newer go-anxcloud release, the
Go version in the Dockerfile had to be bumped as well.

Tested on my playground cluster, where the provisioning of new machines
is working.

Closes: ANXKUBE-1326
Labels
dco-signoff: yes Denotes that all commits in the pull request have the valid DCO signoff message. docs/none Denotes a PR that doesn't need documentation (changes). kind/bug Categorizes issue or PR as related to a bug. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. release-note-none Denotes a PR that doesn't merit a release note. sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
2 participants