Enhancement proposal for Confidential Clusters #1878
base: master
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment. |
|
Hi @uril. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
Force-pushed from c213033 to 677a330
|
/retest |
cgwalters
left a comment
Awesome work overall! There's of course huge amounts of detail in some of this, but I think the outline looks good.
| instance | ||
|
|
||
| * RHEL CoreOS | ||
| * Support verifying the integrity of the disk content during re-provisioning |
We need to boot in a pure stateless mode here, where we're not accessing any persistent storage for /etc and /var right?
To address the larger concern that we cannot trust the filesystem itself on the disk on first boot, we need some form of integrity verification that covers the entire partition. It could be implemented in a similar fashion to what is done with Secure Execution on s390x.
If we don't take this concern into account, then we could indeed read only the fs-verity-verified content from the composefs repo and re-generate the /etc and /var content from it.
| * Measure Ignition config in a PCR value, before parsing it | ||
|
|
||
| * Machine Config Operator | ||
| * Ensure that MachineConfigs are only served to attested nodes |
I still really hope that we get away from having an MCS at all by shrinking the role of Ignition, such that everything needed to join fits into the bootstrap config, which really should be able to fit completely into e.g. the AWS instance user-data store and the like.
I still really hope that we get away from having a MCS at all by shrinking the role of Ignition
What are the big gripes with MCS that make you say this?
instance user-data store completely
On AWS this is 16KiB, right? I have no sense of how big we currently sit at; perhaps we should do some analysis on this, and on where the biggest wins would be in scoping this down, as part of this EP process?
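(For a quick size check, a minimal sketch like the one below shows where a given stub/bootstrap Ignition config sits relative to that 16 KiB limit; the file name is a placeholder and the gzip step is only there to show how much headroom compression would buy, it is not part of the current flow.)

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"os"
)

// awsUserDataLimit is the documented 16 KiB cap on EC2 instance user data.
const awsUserDataLimit = 16 * 1024

func main() {
	// Placeholder path: point it at the stub/pointer Ignition config to measure.
	raw, err := os.ReadFile("bootstrap-stub.ign")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Also compute a gzip-compressed size, to see how much headroom compression
	// would buy (whether compressed user data is acceptable is platform-dependent).
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	zw.Close()

	fmt.Printf("raw: %d bytes, gzipped: %d bytes, limit: %d bytes\n",
		len(raw), buf.Len(), awsUserDataLimit)
	if len(raw) > awsUserDataLimit {
		fmt.Println("raw config does not fit in EC2 user data")
	}
}
```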
| node, which is considered trusted and it is used to bootstrap the trust for the | ||
| rest of the cluster. | ||
|
|
||
| In phase 2, the bootstrap node itself must be attested to establish trust. It is |
Yeah but won't most people who want to do this actually want HCP anyways? I would definitely put HCP support far in front of this as a priority
That could be an option. HCP deployments place the trust in the cluster hosting the control plane, so for it to make sense for Confidential Clusters, it would be a configuration with a Hosted Control Plane in a trusted environment (likely a Bare Metal cluster) and HCP Workers in a cloud.
If we want everything in the cloud, then we are back to the standalone cluster case for the control plane part, as you cannot claim that your workers are confidential if the control plane is hosted on the same cloud with non-confidential VMs.
| have been pre-computed and stored, or pull the container image itself and | ||
| directly compute the values. |
I'd lean towards pull, but of course with a cache of container-sha ➡️ PCRs
Yeah, pull is the preferred option, if available (image is labeled with 'org.coreos.pcrs')
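(A minimal sketch of that digest ➡️ PCRs cache, assuming a hypothetical compute callback that either reads the org.coreos.pcrs label or pulls the image and measures it:)

```go
package main

import (
	"fmt"
	"sync"
)

// pcrCache maps an image digest (e.g. "sha256:...") to its expected PCR values,
// so attestation policy setup only has to pull/compute once per image.
type pcrCache struct {
	mu      sync.Mutex
	entries map[string][]string
}

func newPCRCache() *pcrCache {
	return &pcrCache{entries: make(map[string][]string)}
}

// lookup returns cached PCR values, or calls compute (e.g. read the
// org.coreos.pcrs label, or pull the image and measure) and caches the result.
func (c *pcrCache) lookup(digest string, compute func(string) ([]string, error)) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if pcrs, ok := c.entries[digest]; ok {
		return pcrs, nil
	}
	pcrs, err := compute(digest)
	if err != nil {
		return nil, err
	}
	c.entries[digest] = pcrs
	return pcrs, nil
}

func main() {
	cache := newPCRCache()
	// compute is a stand-in for "read the label or pre-calculate the PCRs".
	compute := func(digest string) ([]string, error) {
		return []string{"pcr4:deadbeef...", "pcr7:cafebabe..."}, nil
	}
	pcrs, _ := cache.lookup("sha256:0123abcd", compute)
	fmt.Println(pcrs)
}
```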
Force-pushed from 677a330 to ff97116
yuqi-zhang
left a comment
Some general comments inline
|
|
||
| ## Proposal | ||
|
|
||
| Run all OpenShift nodes on Confidential VMs (CVMs). Use remote attestation to |
Basic question: the CVM is the entire node right? You can't say, run 2 CVMs on one machine, or run things outside of the CVM on that machine?
Right, in our case, OpenShift node == CVM == OpenShift machine.
The "host machine" (the cloud server) can run many CVMs (and other things).
Yes, the entire node runs as a Confidential VM provided by the cloud provider. You don't control which host your VM runs on (it's a cloud), and you cannot run things outside of it.
On which platforms are CVMs a thing? Does this feature make sense in on-prem environments?
For now we focus on public clouds; Azure, GCP and AWS offer CVMs.
For on-prem, there is a need for something like OpenShift Virtualization to create/manage the CVMs.
Use-cases for on-prem can be:
- Have different teams/departments in a company run workloads on their own confidential cluster.
- Prevent host admins from looking at workloads with confidential data.
| components: | ||
|
|
||
| * OpenShift API | ||
| * Allow nodes to be marked as confidential. This is specific per cloud |
As a clarification here, this will be a cluster level setting? Or are you proposing that in one cluster, some nodes can be CVMs and others not?
All the nodes of a confidential cluster are CVMs.
The specific configuration/API for requesting cloud providers to create a CVM is platform-dependent and is not kept at the cluster level.
A mixed cluster of confidential and non-confidential nodes is technically possible, but is not safe.
It will be cluster-wide. A cluster will be either all confidential nodes or none at all. Technically you can mix things, but it does not make sense for a cluster running in a cloud to be mixed.
| reference-values (expected "correct" values) in Trustee. | ||
|
|
||
| * RHEL CoreOS | ||
| * Add support for composefs (native), UKI, and systemd-boot to bootc (Bootable |
Since the MCO and on-cluster RHCOS operations don't currently use bootc at all, would that integration be needed here?
We will indeed need "direct" bootc support in the MCO (i.e. not use rpm-ostree at all anymore).
| (cloud provider specific). | ||
| * Deploy the Confidential Cluster Operator on the bootstrap node | ||
|
|
||
| * Confidential Cluster Operator |
Will this be running as a core payload operator that's always present, or only deployed conditionally?
It will be an operator part of the core payload but only running if needed.
Can we get some discussion of the Rust language situation, discussed on the arch call a few weeks back, added to the enhancement, especially given the plan for it to be in the core payload?
Also a section outlining the inclusion in core payload and why would be great, important decisions to be made here.
Also, please look at CVO capabilities. I believe this will want to be behind a new capability that is not included by default, but only included in clusters where the installer is flagged to include it
By adding it as a new opt-in capability it means that even though it's in the payload, it won't suddenly appear on existing clusters during upgrades
I'll add something about Rust, core payload and CVO capability to this enhancement proposal.
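(To make the capability idea concrete, a self-contained sketch of the opt-in pattern; the ConfidentialClusters capability name is hypothetical, and the real constant would live in openshift/api next to the existing ClusterVersionCapability values:)

```go
package main

import "fmt"

// ClusterVersionCapability mirrors the openshift/api pattern for optional payload components.
type ClusterVersionCapability string

const (
	// CapabilityConfidentialClusters is a hypothetical new opt-in capability that
	// would gate the Confidential Cluster Operator manifests in the payload.
	CapabilityConfidentialClusters ClusterVersionCapability = "ConfidentialClusters"
)

// defaultCapabilities stands in for the vCurrent capability set; because the new
// capability is NOT part of it, upgrades of existing clusters do not enable it.
var defaultCapabilities = map[ClusterVersionCapability]bool{
	// ... existing default capabilities elided ...
}

// enabled reports whether a capability is active, either by default or because
// the installer explicitly added it.
func enabled(additional []ClusterVersionCapability, c ClusterVersionCapability) bool {
	if defaultCapabilities[c] {
		return true
	}
	for _, a := range additional {
		if a == c {
			return true
		}
	}
	return false
}

func main() {
	// The installer would add the capability only when the cluster is flagged as confidential.
	fromInstallConfig := []ClusterVersionCapability{CapabilityConfidentialClusters}
	fmt.Println(enabled(fromInstallConfig, CapabilityConfidentialClusters)) // true
	fmt.Println(enabled(nil, CapabilityConfidentialClusters))               // false on upgrades/defaults
}
```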
| installer, passing in the URL of the external Trustee instance chosen above. | ||
| 1. The OpenShift installer generates a set of configuration files for the | ||
| external Trustee instance. | ||
| 1. If the cluster creator adds/removes/modifies MachineConfigs, the |
Could you clarify this point? The admin shouldn't be able to "modify configs on the fly" during installation. The MCO has a singular render generation phase, if that's what you're trying to fetch here.
The idea is that the config that will be used for the external Trustee server will include the full config passed to the bootstrap node. If anything modifies this config, then the Trustee config will have to be re-generated. We don't expect the manifests to be modified live during the installation.
One of the later steps of the installer process is to generate the ignition, which spits out two ignitions, one is the stub for the control plane, the other is the full bootstrap ignition. There should be no changes to the bootstrap ignition after this point generally.
I assume any information that the trustee needs the node to know is added to the ignition before this step right? Once the trustee gets the ignition it doesn't generate anything that needs to modify what the node needs to know?
These configuration files are for the remote Trustee (used to attest the bootstrap node), not for the bootstrap node itself. For example, if Trustee is to verify PCR7, e.g. the secure-boot key database, Trustee needs to have a policy that checks PCR7 and a "good" value to compare the runtime measurement against.
(I'm not sure that answers your question)
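(As an illustration of what that check amounts to, a minimal sketch comparing a quoted PCR7 value against a stored reference; the digests are placeholders, and in practice Trustee's attestation policy performs this comparison:)

```go
package main

import (
	"fmt"
	"strings"
)

// referenceValues holds the expected "good" PCR values for a given RHCOS image,
// e.g. PCR7 covering the Secure Boot key database state.
var referenceValues = map[int]string{
	7: "a1b2c3d4e5f6...", // placeholder digest, pre-computed or measured on a trusted boot
}

// verifyPCR compares a measured (quoted) value against the stored reference.
func verifyPCR(index int, quoted string) error {
	want, ok := referenceValues[index]
	if !ok {
		return fmt.Errorf("no reference value configured for PCR%d", index)
	}
	if !strings.EqualFold(quoted, want) {
		return fmt.Errorf("PCR%d mismatch: got %s, want %s", index, quoted, want)
	}
	return nil
}

func main() {
	if err := verifyPCR(7, "a1b2c3d4e5f6..."); err != nil {
		fmt.Println("attestation would fail:", err)
		return
	}
	fmt.Println("PCR7 matches the reference value")
}
```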
| This enhancement introduces some new API extensions: | ||
|
|
||
| * **Running nodes on cloud CVMs**: | ||
| For each supported cloud provider, confidential computing types and code need to |
Could you provide some examples for this? Just curious what that would look like in practice.
Also curious if this affects the ongoing MAPI->CAPI transition at all
Not all clouds will be supported? What clouds are we targeting?
We first target Azure, GCP and AWS, with AMD SEV-SNP and Intel TDX.
Azure: https://learn.microsoft.com/en-us/azure/confidential-computing/quick-create-confidential-vm-azure-cli
GCP: https://docs.cloud.google.com/confidential-computing/confidential-vm/docs/create-a-confidential-vm-instance#rest
AWS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/snp-work-launch.html
And also, for example in the installer:
https://github.com/openshift/installer/blob/main/vendor/github.com/openshift/api/machine/v1beta1/types_azureprovider.go
https://github.com/openshift/installer/blob/main/vendor/github.com/openshift/api/machine/v1beta1/types_gcpprovider.go
https://github.com/openshift/installer/blob/main/vendor/github.com/openshift/api/machine/v1beta1/types_awsprovider.go
With regards to MAPI/CAPI - I think both should be able to request the cloud to run a CVM
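(Purely as an illustration of what "marking a machine as confidential" could look like in a provider spec; these type and field names are hypothetical and not the actual openshift/api types linked above:)

```go
package main

import "fmt"

// ConfidentialComputePolicy is a hypothetical knob on a cloud provider spec;
// the real openshift/api provider specs model this per cloud.
type ConfidentialComputePolicy string

const (
	ConfidentialComputeDisabled ConfidentialComputePolicy = "Disabled"
	ConfidentialComputeSEVSNP   ConfidentialComputePolicy = "AMDSEVSNP" // AMD SEV-SNP
	ConfidentialComputeTDX      ConfidentialComputePolicy = "IntelTDX"  // Intel TDX
)

// ExampleMachineProviderSpec sketches where such a field could sit in a
// MAPI/CAPI provider spec; the surrounding fields are elided.
type ExampleMachineProviderSpec struct {
	InstanceType        string
	ConfidentialCompute ConfidentialComputePolicy
}

func main() {
	spec := ExampleMachineProviderSpec{
		InstanceType:        "n2d-standard-4",
		ConfidentialCompute: ConfidentialComputeSEVSNP,
	}
	fmt.Printf("%+v\n", spec)
}
```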
| In phase 2, the initial configuration will be modified to tell Ignition to fetch | ||
| the new configuration from a remotely attested resource endpoint. The MCS will | ||
| not serve Ignition configs directly for nodes anymore but will store those as | ||
| resources in a Trustee instance. To access those configurations, the node will |
Is the trustee instance responsible for asking the MCS for the contents through some in-cluster proxy, or would we have to have the MCS initiate that?
Please refer to the 2-phase attestation proposal: trusted-execution-clusters/operator#68
The Trustee instance is managed by the confidential-clusters-operator.
The operator is the one that would ask the MCS for the Ignition configuration of new nodes and upload it as a secret to Trustee.
Alternatives are: 1. The node gets encrypted Ignition configs and requests the key to decrypt them from Trustee. 2. The node requests a client certificate and the MCS only serves clients with a valid certificate.
I think in both cases, cocl-operator needs to talk with the MCS and update Trustee, such that the information is available to the node only after a successful attestation.
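(A rough sketch of that operator flow; both endpoints below are illustrative placeholders, and the real MCS and Trustee (KBS) API paths are assumptions:)

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// syncIgnitionToTrustee fetches a rendered Ignition config from the MCS and
// uploads it as a Trustee resource, so it is only released to a node after
// that node attests successfully. Both URLs are illustrative placeholders.
func syncIgnitionToTrustee(mcsURL, trusteeURL string) error {
	resp, err := http.Get(mcsURL) // e.g. the worker pool config endpoint
	if err != nil {
		return fmt.Errorf("fetching ignition from MCS: %w", err)
	}
	defer resp.Body.Close()
	ign, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}

	// Store the config as an attestation-gated resource in Trustee.
	req, err := http.NewRequest(http.MethodPost, trusteeURL, bytes.NewReader(ign))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return fmt.Errorf("uploading resource to Trustee: %w", err)
	}
	defer res.Body.Close()
	if res.StatusCode >= 300 {
		return fmt.Errorf("trustee returned %s", res.Status)
	}
	return nil
}

func main() {
	// Placeholder endpoints; real paths depend on the MCS and Trustee (KBS) APIs.
	err := syncIgnitionToTrustee(
		"https://machine-config-server.example/config/worker",
		"https://trustee.example/resources/ignition/worker",
	)
	fmt.Println(err)
}
```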
Force-pushed from ff97116 to 766b9a1
| As part of the cluster installation process in cloud platforms, a bootstrap node | ||
| is created, which hosts a temporary control plane used to create the final | ||
| control plane and worker nodes of the cluster. |
There is another part of the installation process we should consider, where the installer:
- generates the ignition file for the bootstrap node
- uploads bootstrap ignition to a cloud storage bucket
- puts pointer ignition in bootstrap node userdata redirecting bootstrap node to pull ignition from storage bucket (using a self-signed URL, although for Azure I think self-signed URL support is WIP and azure uses storage account keys)
It's unclear whether this model will continue to work with the remote attestation service. If the First Boot configuration from the attestation service can be merged alongside the bootstrap ignition bucket, then it would require fewer changes to the installer. For example (pseudo), the bootstrap pointer ignition would be injected with the additional attestation server source:
{
  "ignition": {
    "config": {
      "merge": [
        { "source": "http://<registration-service>/ignition" },
        { "source": "http://<cloud-bucket>/ignition" }
      ]
    }
  }
}

This would utilize ignition's merge functionality to grab the configs from both the attestation server and the cloud bucket.
But if it's a requirement that the remote attestation server is contacted first (or only) then the installer would presumably need to be updated to upload bootstrap ignition so it could be served by the attestation service.
remote attestation server is contacted first
Does ignition support multiple phases of pulls?
In what we have tested, yes, it supports at least two: ours and the MCO one.
|
We (Alice & Jakob) wrote a blog post summarizing part of this enhancement: https://developers.redhat.com/articles/2025/11/26/trusted-execution-clusters-operator-design-and-flow-overview# |
| This enhancement proposes the integration of **confidential computing** | ||
| capabilities into **OpenShift cluster**, enabling the deployment of | ||
| **Confidential Clusters**. A confidential cluster is an OpenShift cluster where | ||
| all nodes run on Confidential Virtual Machines (CVMs) and are remotely attested |
Apart from attestation, what else makes a VM a CVM? Might be worth adding a glossary to explain what a CVM is
A CVM is a virtual machine that runs on confidential-computing-enabled hardware, such as AMD SEV-SNP and Intel TDX.
I'll add an introduction to Confidential Computing.
| other unauthorized entities. | ||
|
|
||
| * Implement a robust remote attestation process for CVM nodes to verify their | ||
| trustworthiness before sharing secrets and joining the cluster. The remote |
Our ignition process today shares secrets for joining the cluster without any authentication. This implies to me that we want to move secret sharing to after attestation has happened? Have you considered the existing secrets in the ignition flow and how they might be moved to be fetched later?
Would having an authenticated ignition server be an option to mitigate this problem?
@JoelSpeed yes, we are thinking of releasing ignition config after a successful attestation. This will be covered later since it is a complex design, but you can see the draft in the 2-phase attestation proposal
The suggested solution for protecting secrets within Ignition is having the Ignition configuration fetched directly from Trustee upon a successful attestation.
Alternatively, client-side certificates can be used, or perhaps encrypted ignition configurations.
| components: | ||
|
|
||
| * OpenShift API | ||
| * Allow nodes to be marked as confidential. This is specific per cloud |
So this would be updating the API for how we provision hosts in cloud environments? (MAPI/CAPI)
Or do you mean you want something to annotate a Node when it is confidential?
We focus on cloud providers first: Azure, GCP and AWS.
An on-prem environment makes sense for some use-cases, but the cluster has to be installed on top of (confidential) virtual machines, so there has to be something that manages VMs (e.g. OpenShift Virtualization).
Yeah, updating the API is for provisioning confidential VMs (nodes) in cloud environments.
So apart from updating MAPI/CAPI to allow for creation of CVM, is there any other API change that constitutes "allow nodes to be marked as confidential"?
@JoelSpeed we will need to teach the openshift installer how to create the bootstrap nodes and how they can attest with an external trustee. And then how to deploy our operator on the bootstrap node to attest upcoming control plane nodes.
| the new configuration from a remotely attested resource endpoint. The MCS will | ||
| not serve Ignition configs directly for nodes anymore but will store those as | ||
| resources in a Trustee instance. To access those configurations, the node will | ||
| have to successfully remotely attest itself first. |
What happens for clusters that are not running confidential compute? I assume their access remains the same?
Given this will create two classes of work for MCS, and doesn't apply to all nodes, have we considered alternatives that don't rely on external services being configured by admins?
Non-confidential clusters remain the same.
Every node has to be attested, including the bootstrap node, which can only be attested by an external service.
An exception is HostedControlPlane, where the control-plane nodes run on an already trusted cluster and need not attest.
| If any attestation step fails, the node keeps retrying indefinitely, in turn, | ||
| each Trustee server configured. This is required as a Trustee server may be | ||
| offline at any given point in time or because the reference values accepted by | ||
| Trustee have not yet been updated by the operator or the cluster | ||
| administrator. This infinite retry loop leaves the opportunity to the cluster | ||
| operator to investigate the failure and potentially manually update the | ||
| reference values accepted for the cluster. This is similar to how Ignition | ||
| retries infinitely until an error occurs. |
Is there any in-cluster reporting for failed node join attempts? This is a point of frustration we have at the moment in that we have no visibility into ignition failures. It would be good to be able to see from within the cluster when a node join fails due to failed attestation
Attestation failures are only reported by Trustee via log messages.
The log can be viewed upon an attestation failure.
When the Trustee pod gets restarted, a new log is created (the old log is likely not saved).
One can run Trustee in debug mode for more messages.
Trustee is running external to the cluster usually right?
Trustee is running in-cluster. A nice feature we have in mind for Trustee would be to implement Prometheus metrics for attestations. That way, we could collect and display the successful/failed attestations much better. But this feature is currently not present.
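(If that metrics idea is pursued, a minimal sketch with prometheus/client_golang could look like the following; the metric and label names are made up, and nothing like this exists in Trustee today:)

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// attestations counts attestation attempts by outcome, so failed node joins
// become visible to in-cluster monitoring instead of only to Trustee's logs.
var attestations = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "trustee_attestations_total",
	Help: "Number of attestation attempts, labeled by result.",
}, []string{"result"})

func recordAttestation(ok bool) {
	if ok {
		attestations.WithLabelValues("success").Inc()
		return
	}
	attestations.WithLabelValues("failure").Inc()
}

func main() {
	recordAttestation(true)
	recordAttestation(false)

	// Expose the metrics for Prometheus (or the cluster monitoring stack) to scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```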
Trustee can run either within the cluster or external to the cluster.
Having it within the cluster makes the cluster manage its own nodes.
An external Trustee is a must for the first node booting (the bootstrap node during installation, or the first control-plane node after a graceful shutdown).
| For each new machine registering to the service, the operator creates a CRD that | ||
| includes a uniquely generated UUID. This UUID is given back to the new node. The | ||
| operator watches for new Machine CRDs and sets up attestation and resource | ||
| policy in the Trustee instance, and generates random secret values to be used as | ||
| LUKS root keys. |
I assume that for clusters without machine api, we aren't intending to support an alternative flow?
It should work similarly for both MAPI and CAPI
I meant, clusters that have neither MAPI or CAPI sorry. How would a UPI cluster work?
Right now, our representation of the Machine is disjoint from MAPI/CAPI, which makes it work for UPI as well. UPI isn't on the support radar right now, but for the sake of simplicity during development, I think we will start from there. In that case, for UPI, there might be some manual work the admin needs to perform, for example to remove old machines. For CAPI, we are planning to couple the lifetime of our machine object with the CAPI machine, so when the CAPI machine is removed, our objects are also garbage-collected.
Do you intend to leverage owner references for that garbage collection? So these new Machine objects will exist in the same namespace as the CAPI machines?
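(For illustration, the usual pattern for that garbage collection would be an owner reference from the operator's machine object to the CAPI Machine in the same namespace; the AttestedMachine name below is hypothetical:)

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/utils/ptr"
)

// setCAPIOwner makes the operator's (hypothetical) AttestedMachine object owned by
// the CAPI Machine, so deleting the CAPI Machine garbage-collects it. Owner
// references only work within a namespace, so both objects must live together.
func setCAPIOwner(obj *metav1.ObjectMeta, capiMachineName string, capiMachineUID types.UID) {
	obj.OwnerReferences = []metav1.OwnerReference{{
		APIVersion:         "cluster.x-k8s.io/v1beta1",
		Kind:               "Machine",
		Name:               capiMachineName,
		UID:                capiMachineUID,
		Controller:         ptr.To(true),
		BlockOwnerDeletion: ptr.To(true),
	}}
}

func main() {
	meta := metav1.ObjectMeta{Name: "attested-machine-sample", Namespace: "openshift-cluster-api"}
	setCAPIOwner(&meta, "worker-abc123", types.UID("1234-uid"))
	fmt.Printf("%+v\n", meta.OwnerReferences)
}
```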
|
|
||
| * **Cloud Provider Dependency**: This feature relies on underlying cloud | ||
| provider CVM capabilities. The design aims for portability where possible but | ||
| will initially target specific cloud environments with mature CVM offerings. |
Do you have links to the relevant cloud providers and their support for confidential VMs?
Azure: https://learn.microsoft.com/en-us/azure/confidential-computing/quick-create-confidential-vm-azure-cli
GCP: https://docs.cloud.google.com/confidential-computing/confidential-vm/docs/create-a-confidential-vm-instance#rest
AWS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/snp-work-launch.html
|
/cc |
- Fix title
- Add reviewers
- Fix some headers
- Add missing sections (e.g. DevPreview -> TechPreview)

Still fails due to missing approvers
Force-pushed from 766b9a1 to a338a2b
|
@uril: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
| the first boot of each node. Once completed, both confidentiality and | ||
| integrity will be guaranteed. | ||
|
|
||
| We are working on a more detailed threat model, which will be submitted in a |
Any estimated timeline for this?
| * In the first phase, we will consider the bootstrap node and the first boot of | ||
| each new node to be trusted. In this phase, only the confidentiality of the | ||
| cluster will be guaranteed. We will assume the attacker can read data but not | ||
| write data (to the disk, cloud metadata config, etc.). |
I may be missing something; can you clarify how we're guaranteeing confidentiality when an attacker can read data? Is this specific to the bootstrap node, which is not fully trusted yet, while for the cluster itself we are guaranteeing it cannot be read from?
| have been pre-computed and stored, or pull the container image itself and | ||
| directly compute the values. | ||
|
|
||
| The PCR pre-calculation flow is demonstrated in this presentation: |
Can we get a definition of the PCR acronym in here somewhere?
UKI and SEV-SNP could also use some clarification for those of us less familiar with the terminology.
Thanks, I'll add those to the document.
PCRs (Platform Configuration Registers) of a TPM (Trusted Platform Module) store cryptographic measurements (hash values) of the state of different parts of the stack (boot loader, kernel, initrd, command line, signatures, etc.).
A UKI (Unified Kernel Image) packages boot stub + vmlinuz + initrd + command line (and more) into a single signed binary file (in PE format).
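(For readers new to the terminology, the "measurement" is a hash-extend operation; a small, simplified sketch of how a SHA-256 PCR value evolves as boot components are measured:)

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// extend implements the TPM PCR extend operation for a SHA-256 bank:
// newPCR = SHA256(oldPCR || measurement). PCRs start at all zeroes.
func extend(pcr [32]byte, measurement []byte) [32]byte {
	h := sha256.New()
	h.Write(pcr[:])
	h.Write(measurement)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	var pcr [32]byte // initial PCR value: 32 zero bytes

	// Each boot component (UKI sections, kernel command line, etc.) is hashed
	// and extended into the PCR; the order and content fully determine the result.
	for _, component := range [][]byte{
		[]byte("uki-linux-section"),
		[]byte("uki-initrd-section"),
		[]byte("kernel-command-line"),
	} {
		digest := sha256.Sum256(component)
		pcr = extend(pcr, digest[:])
	}

	fmt.Println("predicted PCR value:", hex.EncodeToString(pcr[:]))
}
```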
| In a HCP scenario, the operator will only be responsible for the worker | ||
| nodes. As the Confidential Cluster Operator will be hosted in the control plane, | ||
| the nodes hosting those services are considered part of the Trusted Computing | ||
| Base (TCB). |
Could you spell out a theoretical real world scenario where this feature would make sense with a hypershift cluster. If my control plane is hosted by a provider, I wouldn't think there's much utility to having just the workers confidential. Is the assumption that the hypershift control plane cluster itself is a confidential cluster?
Yes, the assumption is that the cluster that is hosting the hypershift control plane is running in a trusted environment.
I'll add this assumption to the document
| ##### Making Confidential Cluster Operator fit OpenShift | ||
| The following steps are to be taken to make cocl-operator fit OpenShift | ||
| * APIs and CRDs are to be written in Go | ||
| * When the operator is built, the APIs/CRDs are translated to Rust |
Was this straightforward and maintainable or a lot of hacking involved?
No, it is pretty straightforward: Go -> YAML, then YAML -> Rust via kopium.
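(A condensed sketch of that pipeline: a Go API type with kubebuilder markers, from which controller-gen emits CRD YAML that kopium turns into Rust structs; the type, field names and exact commands are illustrative:)

```go
// Package v1alpha1 sketches how the operator's API types could be written in Go
// with kubebuilder markers. The build would then run roughly:
//   controller-gen crd paths=./... output:crd:dir=config/crd   # Go -> CRD YAML
//   kopium -f config/crd/<crd>.yaml > src/generated_types.rs   # CRD YAML -> Rust
// (commands are approximate; exact flags depend on the tooling versions used)
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// AttestedMachineSpec is an illustrative spec; real field names may differ.
type AttestedMachineSpec struct {
	// UUID uniquely identifies the registering machine.
	UUID string `json:"uuid"`
	// MachineName references the cloud Machine this attestation record belongs to.
	MachineName string `json:"machineName,omitempty"`
}

// +kubebuilder:object:root=true

// AttestedMachine is the hypothetical CRD the operator creates per registering node.
type AttestedMachine struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              AttestedMachineSpec `json:"spec,omitempty"`
}
```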
|
|
||
| While we don’t want to support that for production, it should also be possible | ||
| to test adding confidential nodes to a non confidential cluster where the | ||
| Confidential Cluster Operator would be running, making testing easier. |
Could we get details added here. How many jobs are we talking about, what special requirements will they have, what tests suites will they run, and what new testing will be added. (in general terms, not specific tests, I'm just curious about what you will test and the complexity involved)
@dgoodwin right now, we are using a kind cluster + KubeVirt installed in kind. Then we install our operator and attest the VM created with KubeVirt as part of the integration tests. Right now we have tests like: testing a single attestation (with a single VM), rebooting the same VM multiple times and doing multiple attestations for subsequent boots, running 2 attestations/VMs in parallel, or checking that an attestation fails when the corresponding machine object is deleted.
We plan to add similar tests using Azure VMs for gating tests before merging a PR in the operator.
| ### Tech Preview -> GA | ||
|
|
||
| - End to end tests | ||
| - Documentation |
Do we have plans for security audits of the implementation before we go GA, and by whom?
| Trustee PIN configuration. | ||
| 1. Cluster shutdown | ||
| 1. Before restarting any node, the Trustee instance must be made available at | ||
| the domain or IP configured above. |
What happens if it's down or goes down mid-way?
If Trustee is down, then the attestation cannot be passed and the nodes fail to boot, since they cannot retrieve the LUKS key of their root disk.
|
|
||
| The cluster administrator flow should not change when updating a cluster. The | ||
| Confidential Cluster Operator will perform the necessary configuration to allow | ||
| nodes to attest to the cluster using new version of RHCOS. |
Any details on update concerns for the confidential cluster operator itself that could be added here?
The operator can be updated without major disruptions, since the Machine representation, reference values, the policies and the secrets don't change.
| The following steps are to be taken to make cocl-operator fit OpenShift | ||
| * APIs and CRDs are to be written in Go | ||
| * When the operator is built, the APIs/CRDs are translated to Rust | ||
| * The system openssl library is to be used |
Could you also add some of the comments made yesterday about Konflux building and CVE scanning, to help solidify the case for Rust?
| * Build and upload disk images using UKI and systemd-boot to cloud providers. | ||
| * Add attestation client to the operating system, such that nodes can request | ||
| attestation and fetch secrets upon a successful attestation. | ||
| * Add a clevis trustee pin to fetch LUKS passphrase upon a successful |
For parts of this workflow that rely on asymmetric encryption to pass these secrets around, do we have an idea of how we'd configure this to use PQC-safe algorithms?
I imagine part of that will be hardware dependency for the keys that are generated by Intel and AMD. Do we have an outlook or timeline for if/when those will be hybrid (classical + PQC)?
The secrets are passed by the attestation server (Trustee); hence, it needs to implement encryption of the secrets with PQC-safe algorithms. But it is also true that the hardware signs the quote, and probably there too it needs to follow the PQC-safe standards.
| * **ConfidentialCluster CRD**: This custom resource is used to configure the | ||
| Confidential Cluster Operator and indirectly the Trustee instance that is used | ||
| to attest nodes in the cluster and provide secrets. | ||
| It is namespaced, versioned and contains: |
We have a lot of cluster-scoped singleton resources for standalone clusters, and this feels very much like it would be the same. So rather than a future where you have multiple copies of the operator in multiple namespaces, I'd imagine it would be better to design the API first-hand to consider a future with multiple Trustees.
What would the advantage be of deploying the operator multiple times?
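(To ground that discussion, an illustrative sketch of what the ConfidentialCluster spec could carry; all field names are hypothetical, and the shape applies equally whether the resource ends up namespaced or a cluster-scoped singleton:)

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ConfidentialClusterSpec is an illustrative sketch of the configuration the
// operator needs; real field names and structure are still to be defined.
type ConfidentialClusterSpec struct {
	// TrusteeURL is the endpoint of the Trustee instance used to attest nodes.
	TrusteeURL string `json:"trusteeURL"`
	// ReferenceValuesFrom names a ConfigMap holding expected PCR reference values.
	ReferenceValuesFrom string `json:"referenceValuesFrom,omitempty"`
}

// ConfidentialCluster configures the Confidential Cluster Operator and,
// indirectly, the Trustee instance it manages.
type ConfidentialCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              ConfidentialClusterSpec `json:"spec,omitempty"`
}
```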
|
|
||
| * **Ignition spec changes**: The Ignition configuration specification will be | ||
| extended to support: | ||
| * configuring the Clevis trustee pin |
It might be good to start linking out or somehow otherwise indicating which of the items in this section have already been achieved
| * **Ignition spec changes**: The Ignition configuration specification will be | ||
| extended to support: | ||
| * configuring the Clevis trustee pin | ||
| * enable fetching remote config after remote attestation |
That answered my question, yes: attestation is more like a checksum of the OS that booted vs a unique credential to identify a specific node.
|
|
||
| ##### Making Confidential Cluster Operator fit OpenShift | ||
| The following steps are to be taken to make cocl-operator fit OpenShift | ||
| * APIs and CRDs are to be written in Go |
Given this is becoming part of the payload, these should live in o/api, have you raised a PR to add them there yet?
| To make Confidential Cluster Operator a first citizen in the Openshift | ||
| echosystem, interfaces are written in Go and generated with OpenShift tools. | ||
|
|
||
| When the operator is built, the interfaces are converted to Rust. COPY FROM Jakob |
COPY FROM Jakob?