Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add compatibility versions feature gate test as a nightly prow periodic job #34257

Merged

Conversation

aaron-prindle
Copy link
Contributor

@aaron-prindle aaron-prindle commented Feb 3, 2025

This PR adds a feature get test to the compatibility version prow periodic jobs. This test is adapted from the closed PR here: kubernetes/kubernetes#128138 which attempted a similar test upstream (it was decided test-infra was the better place for the test in the discussion there). This feature gate test builds and runs the latest k8s, hits the /metrics endpoint fetching the current feature gates and their information (stage, lockToDefault, enabled/disabled). Then it pulls the n-1 k8s repo and grabs the versioned_feature_list.yaml and unversioned_feature_list.yaml files from that repo which are the source of truth for n-1 feature gates and compares them making sure that the feature gates for n vs n-1 fall within the necessary guarantees for the compatibility versions feature.

Example test run by running the script locally w/ same env var config as the prow test:
https://gist.github.com/aaprindle/254f748d008afc11063ba3479f7238ff

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 3, 2025
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. area/config Issues or PRs related to code in /config area/jobs sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Feb 3, 2025
@aaron-prindle
Copy link
Contributor Author

/assign @BenTheElder

@aaron-prindle aaron-prindle force-pushed the compat-versions-ft-gate-test branch 2 times, most recently from 05e2216 to 3d5d4f7 Compare February 3, 2025 19:12
@aaron-prindle
Copy link
Contributor Author

/cc @Jefftree

@k8s-ci-robot k8s-ci-robot requested a review from Jefftree February 3, 2025 20:54
@aaron-prindle aaron-prindle force-pushed the compat-versions-ft-gate-test branch from 3d5d4f7 to 7c7effc Compare February 3, 2025 20:56
Copy link
Member

@Jefftree Jefftree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question about test-infra changes: Is it possible to see a sample run of this before merging or is that not possible?

@aaron-prindle aaron-prindle force-pushed the compat-versions-ft-gate-test branch from 7c7effc to 28998c5 Compare February 3, 2025 21:27
@aaron-prindle
Copy link
Contributor Author

General question about test-infra changes: Is it possible to see a sample run of this before merging or is that not possible?

I don't believe it is possible, I have been testing the scripts manually. I have attached a gist of a test output to the PR description.

@aaron-prindle aaron-prindle force-pushed the compat-versions-ft-gate-test branch 2 times, most recently from 35f6a6c to b79b493 Compare February 3, 2025 22:33
@BenTheElder
Copy link
Member

General question about test-infra changes: Is it possible to see a sample run of this before merging or is that not possible?

possible, but a big pain (you have to setup everything yourself, see e.g. pj-on-kind.sh, which is going to have problems with e2e tests)

... and prow has ~no investment right now, just barely community-provided KTLO

- wrapper.sh
- bash
- -c
- curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" && ./../test-infra/experiment/compatibility-versions/compatibility-versions-feature-gate-test.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref: #33980

Copy link
Contributor Author

@aaron-prindle aaron-prindle Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies is the recommendation here related to?
#33594

From reading over #33980 I'm a bit confused what the actionable change would be to make to the script here is, can you elaborate a bit on how I might use the precompiled binaries here? Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry -- I'm just trying to track these for later, it's not a blocker.

Basically you use kind build node-image with one of the CI builds of Kubernetes instead of from source.
We will also have to make sure the commit from that build gets recorded to testgrid (I forget the specifics of how that works), if / when we convert one of the scripts we can apply it in multiple places.

But it's not a blocker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense, thanks. I'll track this work through #33594, added a reference there to #33980. I'll work on a separate PR converting all of the compatibility-versions pieces to use prebuilt binaries when I tackle #33594

kubectl get --raw /metrics > "${output_file}"
}

validate_feature_gates() {
Copy link
Member

@BenTheElder BenTheElder Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it would make sense to factor out this part into a test script independent of all the e2e setup bits, at some point we may more directly share the rest? Also it seems like most (all?) of this [test] would work on other clusters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored validate_feature_gates into its own script now:
experiment/compatibility-versions/validate-compatibility-versions-feature-gates.sh

which is called from the main/entrypoint script:
experiment/compatibility-versions/compatibility-versions-feature-gate-test.sh

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I could see us hoisting this out or even rewriting it in go later for use by more jobs (maybe into k/k and run against GCE clusters as well?) :-)

@aaron-prindle aaron-prindle force-pushed the compat-versions-ft-gate-test branch 6 times, most recently from 182200d to a7cca0e Compare February 5, 2025 17:47
@BenTheElder
Copy link
Member

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Feb 5, 2025
runtimeConfig: ${runtime_config}
kubeadmConfigPatches:
- |
kind: ClusterConfiguration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're also going to have to start targeting these to specific versions and add a kubeadm v1alpha4 patch soon: kubernetes-sigs/kind#3847

Going to be a headache, at least trying to track the places in-project where we're running kind @ HEAD

touch "${results_file}"

# Parse /metrics -> actual_features[featureName] = 0 or 1
declare -A actual_features
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth noting: when we have a lot of arrays that aren't something trivial like exec args that's a sign that we probably want to reach for a more structured language

Copy link
Contributor Author

@aaron-prindle aaron-prindle Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I originally wrote this test as an integration test for k8s/k8s in Golang but ported it to bash for use in test-infra after some discussion on the original PR:
kubernetes/kubernetes#128138

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, I commented there to continue the conversation 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and I think for now bash is reasonable, but something to be wary of longterm, especially if we know we're going to need more functionality)

@@ -0,0 +1,259 @@
#!/bin/bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be a follow-up but we usually use #!/usr/bin/env bash, this guarantees that we use the preferred copy in PATH and is a little more portable (/usr/bin/env is virtually guaranteed due to env being POSIX and so many scripts doing this, but bash may not be in /bin, e.g. on my work mac I have installed newer bash via homebrew and kubernetes scripts may require that)

same goes for the other script.

https://en.wikipedia.org/wiki/Shebang_(Unix)#Portability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging this, I've updated all scripts to use:

#!/usr/bin/env bash

done
}

# --- Main execution when script is called directly ---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend just only supporting calling and condense this down to a single codepath / exec it from the other script instead of trying to support both exec and sourcing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(up to you, but I'd suggest considering this before it gets wider usage, feel free to /hold cancel and punt)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense, I've updated the script to support only being exec'd for now.

@aaron-prindle aaron-prindle force-pushed the compat-versions-ft-gate-test branch from a7cca0e to 0ebca9f Compare February 5, 2025 22:39
Copy link
Member

@BenTheElder BenTheElder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 5, 2025
@BenTheElder
Copy link
Member

/hold

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aaron-prindle, BenTheElder

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 5, 2025
@aaron-prindle aaron-prindle force-pushed the compat-versions-ft-gate-test branch from 0ebca9f to 2220b0b Compare February 5, 2025 23:20
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 5, 2025
@aaron-prindle
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 5, 2025
Copy link
Member

@BenTheElder BenTheElder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 5, 2025
@k8s-ci-robot k8s-ci-robot merged commit f414bfa into kubernetes:master Feb 5, 2025
6 of 7 checks passed
@k8s-ci-robot
Copy link
Contributor

@aaron-prindle: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key compatibility-versions-e2e.yaml using file config/jobs/kubernetes/sig-testing/compatibility-versions-e2e.yaml

In response to this:

This PR adds a feature get test to the compatibility version prow periodic jobs. This test is adapted from the closed PR here: kubernetes/kubernetes#128138 which attempted a similar test upstream (it was decided test-infra was the better place for the test in the discussion there). This feature gate test builds and runs the latest k8s, hits the /metrics endpoint fetching the current feature gates and their information (stage, lockToDefault, enabled/disabled). Then it pulls the n-1 k8s repo and grabs the versioned_feature_list.yaml and unversioned_feature_list.yaml files from that repo which are the source of truth for n-1 feature gates and compares them making sure that the feature gates for n vs n-1 fall within the necessary guarantees for the compatibility versions feature.

Example test run by running the script locally w/ same env var config as the prow test:
https://gist.github.com/aaprindle/254f748d008afc11063ba3479f7238ff

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/config Issues or PRs related to code in /config area/jobs cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants