Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Intel TDX confidential computing machine configuration #1426

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bgartzi
Copy link
Contributor

@bgartzi bgartzi commented Feb 14, 2025

/kind feature

What this PR does / why we need it:

In #1410 confidentialCompute was extended from Enabled/Disabled to also supporting explicit AMD SEV and AMD SEV-SNP instance configuration.

TDX is supported in c3 machine types in google cloud [0].

This patch allows users to configure those machines as confidential instances by declaring confidentialCompute: IntelTrustedDomainExtensions.

[0] https://cloud.google.com/confidential-computing/confidential-vm/docs/supported-configurations#all-confidential-vm-instances

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1425

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Support Intel TDX confidential computing instance configuration.

In a6e7d1a confidentialCompute was extended from Enabled/Disabled to
supporting explicit AMD SEV and AMD SEV-SNP configuration.

TDX is supported in c3 machine types in google cloud [0]. This patch
allows the user to add such instances to their clusters by declaring
confidentialCompute=IntelTrustedDomainExtensions.

[0] https://cloud.google.com/confidential-computing/confidential-vm/docs/supported-configurations#all-confidential-vm-instances
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Feb 14, 2025
@k8s-ci-robot k8s-ci-robot requested a review from damdo February 14, 2025 14:27
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 14, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bgartzi
Once this PR has been reviewed and has the lgtm label, please assign cpanato for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

Hi @bgartzi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 14, 2025
Copy link

netlify bot commented Feb 14, 2025

Deploy Preview for kubernetes-sigs-cluster-api-gcp ready!

Name Link
🔨 Latest commit 36236cc
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-gcp/deploys/67af52b217a83f00087c7e88
😎 Deploy Preview https://deploy-preview-1426--kubernetes-sigs-cluster-api-gcp.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link
Member

@damdo damdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thanks @bgartzi

/lgtm

/assign @cpanato @richardcase

for approval

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 18, 2025
@cpanato
Copy link
Member

cpanato commented Feb 18, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 18, 2025
@bgartzi
Copy link
Contributor Author

bgartzi commented Feb 18, 2025

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 18, 2025
@bgartzi
Copy link
Contributor Author

bgartzi commented Feb 18, 2025

@damdo @cpanato There's one gap on this PR that I'm working on closing. confidentialCompute: Enabled wouldn't enable a TDX machine in this case. It would raise an error. confidentialCompute: TDX is needed. I think this is kind of confusing, as why Enabled works for some machine types but not others.

Contrary to #1427, there is not a backward compatibility issue in this case in my opinion. I will provide a fix soon if you all agree.

Thanks for the review, BTW!

@damdo
Copy link
Member

damdo commented Feb 18, 2025

@bgartzi you mean that at the moment with confidentialCompute: Enabled and instanceType: c3 (which is a confidentialMachineSeriesSupportingTdx), you would get an error?

I think that's expected right? Because that was the old default behaviour for confidentialCompute: Enabled, to only enable ConfidentialComputePolicySEV.

In my opinion it is fine for it to stay like that. So we don't dynamically change the confidential policy without the user noticing.
For example if someone in the past would specify confidentialCompute: Enabled (and think they would turn on ConfidentialComputePolicySEV) and then specify c3, which is only a confidentialMachineSeriesSupportingTdx, it would error.
With the change you are proposing this would start working all of a sudden, which might be unexpected. I think it is ok for it to stay like this for historical reasons, and require the user to specify the new policy for instance types that require it.

@bgartzi
Copy link
Contributor Author

bgartzi commented Feb 18, 2025

@damdo, The way I understood it was that confidentialCompute: Enabled enabled confidential computing. Documentation before my previous patch got merged (#1410) didn't state explicitly SEV machines were going to be provisioned

// ConfidentialCompute Defines whether the instance should have confidential compute enabled.
// If enabled OnHostMaintenance is required to be set to "Terminate".
// If omitted, the platform chooses a default, which is subject to change over time, currently that default is false.
// +kubebuilder:validation:Enum=Enabled;Disabled
// +optional

However, google only supported SEV machines when the field was introduced in #809 (see google.golang.org/api@v108). That's, I would say, why nowadays it defaults to SEV machines if just "Enabled" is configured.

Enabling confidentialCompute in c3d machines didn't work neither, but after #1410, confidentialCompute: Enabled on c3d works (It's true it provisions a sev machine, though). So it kind of started to work all of a sudden as well.

It is an issue to do this for SEV-SNP as commented in #1427, because SEV-SNP machines implicitly support SEV in google cloud. I would expect this to also be an issue if some machines could support both SEV and TDX at the same time. Being AMD and Intel CPU's technologies respectively, I'm not sure if that is likely to happen.

A parallel example: I was personally confused when directly using google's compute API and being able to provision TDX machines by combinations such as Enable: False/InstanceType: TDX but getting errors after trying Enable: True on machines supporting TDX: Confidential Instance Config can only be set on supported and compatible Confidential VM machine types. Isn't TDX a confidential computing technology after all?
In this case the result wouldn't be as tragic, as we group the behavior in a single parameter and that would avoid combinations such as the disabled/TDX example. However, I would personally expect confidentialCompute: Enabled to enable; there are alternate ways of configuring specific confidential computing machines.

Regarding current users of confidentialComptute: Enabled. Their supported machine set was limited. With the mentioned change, their machines wouldn't magically turn into TDX. They would have to explicitly change the desired machine type for that to happen, right?

@JoelSpeed
Copy link

In the future, we will likely bump this API to a new version. If we were to mark the Enabled value as deprecated, and add an explicit value for TDX, then in a future release we could translate all use of Enabled to the correct SEV enum value.

This would then create a confidential compute field where there is no "enabled" assumption, and users have to be explicit about the supported type of confidential computing, which makes sense to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Intel TDX confidential computing type configuration
6 participants