Enhancement proposal for Confidential Clusters #1878
base: master
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment. |
|
Hi @uril. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
Force-pushed from c213033 to 677a330
|
/retest |
cgwalters
left a comment
Awesome work overall! There's of course huge amounts of detail in some of this, but I think the outline looks good.
| instance | ||
|
|
||
| * RHEL CoreOS | ||
| * Support verifying the integrity of the disk content during re-provisioning |
We need to boot in a pure stateless mode here, where we're not accessing any persistent storage for /etc and /var right?
To address the larger concern that we cannot trust the filesystem itself on the disk on first boot, we need some form of integrity verification that covers the entire partition. It could be implemented in a similar fashion to what is done with Secure Execution on s390x.
If we don't take this concern into account, then we could indeed read only the fs-verity-verified content from the composefs repo and re-generate the /etc and /var content from it.
| * Measure Ignition config in a PCR value, before parsing it | ||
|
|
||
| * Machine Config Operator | ||
| * Ensure that MachineConfigs are only served to attested nodes |
I still really hope that we get away from having an MCS at all by shrinking the role of Ignition, such that everything needed to join fits into the bootstrap config, which really should be able to fit completely into e.g. the AWS instance user-data store and the like.
I still really hope that we get away from having a MCS at all by shrinking the role of Ignition
What are the big gripes with MCS that make you say this?
instance user-data store completely
On AWS this is 16KiB, right? I have no sense of how big we currently sit at; perhaps we should do some analysis on this, and on where the biggest wins would be in scoping this down, as part of this EP process?
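(For a quick size check, a minimal sketch like the one below shows where a given stub/bootstrap Ignition config sits relative to that 16 KiB limit; the file name is a placeholder and the gzip step is only there to show how much headroom compression would buy, it is not part of the current flow.)

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"os"
)

// awsUserDataLimit is the documented 16 KiB cap on EC2 instance user data.
const awsUserDataLimit = 16 * 1024

func main() {
	// Placeholder path: point it at the stub/pointer Ignition config to measure.
	raw, err := os.ReadFile("bootstrap-stub.ign")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Also compute a gzip-compressed size, to see how much headroom compression
	// would buy (whether compressed user data is acceptable is platform-dependent).
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	zw.Close()

	fmt.Printf("raw: %d bytes, gzipped: %d bytes, limit: %d bytes\n",
		len(raw), buf.Len(), awsUserDataLimit)
	if len(raw) > awsUserDataLimit {
		fmt.Println("raw config does not fit in EC2 user data")
	}
}
```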
| node, which is considered trusted and it is used to bootstrap the trust for the | ||
| rest of the cluster. | ||
|
|
||
| In phase 2, the bootstrap node itself must be attested to establish trust. It is |
Yeah but won't most people who want to do this actually want HCP anyways? I would definitely put HCP support far in front of this as a priority
That could be an option. HCP deployments place the trust in the cluster hosting the control plane, so for it to make sense for Confidential Clusters, it would be a configuration with a Hosted Control Plane in a trusted environment (likely a Bare Metal cluster) and HCP Workers in a cloud.
If we want everything in the cloud, then we are back to the standalone cluster case for the control plane part, as you cannot claim that your workers are confidential if the control plane is hosted on the same cloud with non-confidential VMs.
| have been pre-computed and stored, or pull the container image itself and | ||
| directly compute the values. |
I'd lean towards pull, but of course with a cache of container-sha ➡️ PCRs
Yeah, pull is the preferred option, if available (image is labeled with 'org.coreos.pcrs')
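(A minimal sketch of that digest ➡️ PCRs cache, assuming a hypothetical compute callback that either reads the org.coreos.pcrs label or pulls the image and measures it:)

```go
package main

import (
	"fmt"
	"sync"
)

// pcrCache maps an image digest (e.g. "sha256:...") to its expected PCR values,
// so attestation policy setup only has to pull/compute once per image.
type pcrCache struct {
	mu      sync.Mutex
	entries map[string][]string
}

func newPCRCache() *pcrCache {
	return &pcrCache{entries: make(map[string][]string)}
}

// lookup returns cached PCR values, or calls compute (e.g. read the
// org.coreos.pcrs label, or pull the image and measure) and caches the result.
func (c *pcrCache) lookup(digest string, compute func(string) ([]string, error)) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if pcrs, ok := c.entries[digest]; ok {
		return pcrs, nil
	}
	pcrs, err := compute(digest)
	if err != nil {
		return nil, err
	}
	c.entries[digest] = pcrs
	return pcrs, nil
}

func main() {
	cache := newPCRCache()
	// compute is a stand-in for "read the label or pre-calculate the PCRs".
	compute := func(digest string) ([]string, error) {
		return []string{"pcr4:deadbeef...", "pcr7:cafebabe..."}, nil
	}
	pcrs, _ := cache.lookup("sha256:0123abcd", compute)
	fmt.Println(pcrs)
}
```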
Force-pushed from 677a330 to ff97116
yuqi-zhang
left a comment
Some general comments inline
|
|
||
| ## Proposal | ||
|
|
||
| Run all OpenShift nodes on Confidential VMs (CVMs). Use remote attestation to |
Basic question: the CVM is the entire node right? You can't say, run 2 CVMs on one machine, or run things outside of the CVM on that machine?
Right, in our case, OpenShift node == CVM == OpenShift machine.
The "host machine" (the cloud server) can run many CVMs (and other things).
Yes, the entire node runs as a Confidential VM provided by the cloud provider. You don't control which host your VM runs on (it's a cloud), and you cannot run things outside of it.
On which platforms are CVMs a thing? Does this feature make sense in on-prem environments?
For now we focus on public clouds; Azure, GCP and AWS offer CVMs.
For on-prem, there is a need for something like OpenShift Virtualization to create/manage the CVMs.
Use-cases for on-prem can be:
- Have different teams/departments in a company run workloads on their own confidential cluster.
- Prevent host admins from looking at workloads with confidential data.
| components: | ||
|
|
||
| * OpenShift API | ||
| * Allow nodes to be marked as confidential. This is specific per cloud |
As a clarification here, this will be a cluster level setting? Or are you proposing that in one cluster, some nodes can be CVMs and others not?
All the nodes of a confidential cluster are CVMs.
The specific configuration/API for requesting cloud providers to create a CVM is platform-dependent and is not kept at the cluster level.
A mixed cluster of confidential and non-confidential nodes is technically possible, but is not safe.
It will be cluster-wide. A cluster will be either all confidential nodes or none at all. Technically you can mix things, but it does not make sense for a cluster running in a cloud to be mixed.
| reference-values (expected "correct" values) in Trustee. | ||
|
|
||
| * RHEL CoreOS | ||
| * Add support for composefs (native), UKI, and systemd-boot to bootc (Bootable |
Since the MCO and on-cluster RHCOS operations don't currently use bootc at all, would that integration be needed here?
We will indeed need "direct" bootc support in the MCO (i.e. not use rpm-ostree at all anymore).
| (cloud provider specific). | ||
| * Deploy the Confidential Cluster Operator on the bootstrap node | ||
|
|
||
| * Confidential Cluster Operator |
Will this be running as a core payload operator that's always present, or only deployed conditionally?
It will be an operator part of the core payload but only running if needed.
Can we get some discussion of the Rust language situation, discussed on the arch call a few weeks back, added to the enhancement, especially given the plan for it to be in the core payload?
Also a section outlining the inclusion in core payload and why would be great, important decisions to be made here.
Also, please look at CVO capabilities. I believe this will want to be behind a new capability that is not included by default, but only included in clusters where the installer is flagged to include it
By adding it as a new opt-in capability it means that even though it's in the payload, it won't suddenly appear on existing clusters during upgrades
I'll add something about Rust, core payload and CVO capability to this enhancement proposal.
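(To make the capability idea concrete, a self-contained sketch of the opt-in pattern; the ConfidentialClusters capability name is hypothetical, and the real constant would live in openshift/api next to the existing ClusterVersionCapability values:)

```go
package main

import "fmt"

// ClusterVersionCapability mirrors the openshift/api pattern for optional payload components.
type ClusterVersionCapability string

const (
	// CapabilityConfidentialClusters is a hypothetical new opt-in capability that
	// would gate the Confidential Cluster Operator manifests in the payload.
	CapabilityConfidentialClusters ClusterVersionCapability = "ConfidentialClusters"
)

// defaultCapabilities stands in for the vCurrent capability set; because the new
// capability is NOT part of it, upgrades of existing clusters do not enable it.
var defaultCapabilities = map[ClusterVersionCapability]bool{
	// ... existing default capabilities elided ...
}

// enabled reports whether a capability is active, either by default or because
// the installer explicitly added it.
func enabled(additional []ClusterVersionCapability, c ClusterVersionCapability) bool {
	if defaultCapabilities[c] {
		return true
	}
	for _, a := range additional {
		if a == c {
			return true
		}
	}
	return false
}

func main() {
	// The installer would add the capability only when the cluster is flagged as confidential.
	fromInstallConfig := []ClusterVersionCapability{CapabilityConfidentialClusters}
	fmt.Println(enabled(fromInstallConfig, CapabilityConfidentialClusters)) // true
	fmt.Println(enabled(nil, CapabilityConfidentialClusters))               // false on upgrades/defaults
}
```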
| installer, passing in the URL of the external Trustee instance chosen above. | ||
| 1. The OpenShift installer generates a set of configuration files for the | ||
| external Trustee instance. | ||
| 1. If the cluster creator adds/removes/modifies MachineConfigs, the |
Could you clarify this point? The admin shouldn't be able to "modify configs on the fly" during installation. The MCO has a singular render generation phase, if that's what you're trying to fetch here.
The idea is that the config that will be used for the external Trustee server will include the full config passed to the bootstrap node. If anything modifies this config, then the Trustee config will have to be re-generated. We don't expect the manifests to be modified live during the installation.
One of the later steps of the installer process is to generate the ignition, which spits out two ignitions, one is the stub for the control plane, the other is the full bootstrap ignition. There should be no changes to the bootstrap ignition after this point generally.
I assume any information that the trustee needs the node to know is added to the ignition before this step right? Once the trustee gets the ignition it doesn't generate anything that needs to modify what the node needs to know?
These configuration files are for the remote Trustee (used to attest the bootstrap node), not for the bootstrap node itself. For example, if Trustee is to verify PCR7, e.g. the secure-boot key database, Trustee needs to have a policy that checks PCR7 and a "good" value to compare the runtime measurement against.
(I'm not sure that answers your question)
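(As an illustration of what that check amounts to, a minimal sketch comparing a quoted PCR7 value against a stored reference; the digests are placeholders, and in practice Trustee's attestation policy performs this comparison:)

```go
package main

import (
	"fmt"
	"strings"
)

// referenceValues holds the expected "good" PCR values for a given RHCOS image,
// e.g. PCR7 covering the Secure Boot key database state.
var referenceValues = map[int]string{
	7: "a1b2c3d4e5f6...", // placeholder digest, pre-computed or measured on a trusted boot
}

// verifyPCR compares a measured (quoted) value against the stored reference.
func verifyPCR(index int, quoted string) error {
	want, ok := referenceValues[index]
	if !ok {
		return fmt.Errorf("no reference value configured for PCR%d", index)
	}
	if !strings.EqualFold(quoted, want) {
		return fmt.Errorf("PCR%d mismatch: got %s, want %s", index, quoted, want)
	}
	return nil
}

func main() {
	if err := verifyPCR(7, "a1b2c3d4e5f6..."); err != nil {
		fmt.Println("attestation would fail:", err)
		return
	}
	fmt.Println("PCR7 matches the reference value")
}
```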
| This enhancement introduces some new API extensions: | ||
|
|
||
| * **Running nodes on cloud CVMs**: | ||
| For each supported cloud provider, confidential computing types and code need to |
Could you provide some examples for this? Just curious what that would look like in practice.
Also curious if this affects the ongoing MAPI->CAPI transition at all
Not all clouds will be supported? What clouds are we targeting?
We first target Azure, GCP and AWS, with AMD SEV-SNP and Intel TDX.
Azure: https://learn.microsoft.com/en-us/azure/confidential-computing/quick-create-confidential-vm-azure-cli
GCP: https://docs.cloud.google.com/confidential-computing/confidential-vm/docs/create-a-confidential-vm-instance#rest
AWS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/snp-work-launch.html
And also, for example in the installer:
https://github.com/openshift/installer/blob/main/vendor/github.com/openshift/api/machine/v1beta1/types_azureprovider.go
https://github.com/openshift/installer/blob/main/vendor/github.com/openshift/api/machine/v1beta1/types_gcpprovider.go
https://github.com/openshift/installer/blob/main/vendor/github.com/openshift/api/machine/v1beta1/types_awsprovider.go
With regards to MAPI/CAPI - I think both should be able to request the cloud to run a CVM
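(Purely as an illustration of what "marking a machine as confidential" could look like in a provider spec; these type and field names are hypothetical and not the actual openshift/api types linked above:)

```go
package main

import "fmt"

// ConfidentialComputePolicy is a hypothetical knob on a cloud provider spec;
// the real openshift/api provider specs model this per cloud.
type ConfidentialComputePolicy string

const (
	ConfidentialComputeDisabled ConfidentialComputePolicy = "Disabled"
	ConfidentialComputeSEVSNP   ConfidentialComputePolicy = "AMDSEVSNP" // AMD SEV-SNP
	ConfidentialComputeTDX      ConfidentialComputePolicy = "IntelTDX"  // Intel TDX
)

// ExampleMachineProviderSpec sketches where such a field could sit in a
// MAPI/CAPI provider spec; the surrounding fields are elided.
type ExampleMachineProviderSpec struct {
	InstanceType        string
	ConfidentialCompute ConfidentialComputePolicy
}

func main() {
	spec := ExampleMachineProviderSpec{
		InstanceType:        "n2d-standard-4",
		ConfidentialCompute: ConfidentialComputeSEVSNP,
	}
	fmt.Printf("%+v\n", spec)
}
```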
| In phase 2, the initial configuration will be modified to tell Ignition to fetch | ||
| the new configuration from a remotely attested resource endpoint. The MCS will | ||
| not serve Ignition configs directly for nodes anymore but will store those as | ||
| resources in a Trustee instance. To access those configurations, the node will |
Is the trustee instance responsible for asking the MCS for the contents through some in-cluster proxy, or would we have to have the MCS initiate that?
Please refer to the 2-phase attestation proposal: trusted-execution-clusters/operator#68
The Trustee instance is managed by the confidential-clusters-operator.
The operator is the one that would ask the MCS for the Ignition configuration of new nodes and upload it as a secret to Trustee.
Alternatives are: 1. The node gets encrypted Ignition configs and requests the key to decrypt them from Trustee. 2. The node requests a client certificate and the MCS only serves clients with a valid certificate.
I think in both cases, cocl-operator needs to talk with the MCS and update Trustee, such that the information is available to the node only after a successful attestation.
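(A rough sketch of that operator flow; both endpoints below are illustrative placeholders, and the real MCS and Trustee (KBS) API paths are assumptions:)

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// syncIgnitionToTrustee fetches a rendered Ignition config from the MCS and
// uploads it as a Trustee resource, so it is only released to a node after
// that node attests successfully. Both URLs are illustrative placeholders.
func syncIgnitionToTrustee(mcsURL, trusteeURL string) error {
	resp, err := http.Get(mcsURL) // e.g. the worker pool config endpoint
	if err != nil {
		return fmt.Errorf("fetching ignition from MCS: %w", err)
	}
	defer resp.Body.Close()
	ign, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}

	// Store the config as an attestation-gated resource in Trustee.
	req, err := http.NewRequest(http.MethodPost, trusteeURL, bytes.NewReader(ign))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return fmt.Errorf("uploading resource to Trustee: %w", err)
	}
	defer res.Body.Close()
	if res.StatusCode >= 300 {
		return fmt.Errorf("trustee returned %s", res.Status)
	}
	return nil
}

func main() {
	// Placeholder endpoints; real paths depend on the MCS and Trustee (KBS) APIs.
	err := syncIgnitionToTrustee(
		"https://machine-config-server.example/config/worker",
		"https://trustee.example/resources/ignition/worker",
	)
	fmt.Println(err)
}
```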
Force-pushed from ff97116 to 766b9a1
| As part of the cluster installation process in cloud platforms, a bootstrap node | ||
| is created, which hosts a temporary control plane used to create the final | ||
| control plane and worker nodes of the cluster. |
There is another part of the installation process we should consider, where the installer:
- generates the ignition file for the bootstrap node
- uploads bootstrap ignition to a cloud storage bucket
- puts pointer ignition in bootstrap node userdata redirecting bootstrap node to pull ignition from storage bucket (using a self-signed URL, although for Azure I think self-signed URL support is WIP and azure uses storage account keys)
It's unclear whether this model will continue to work with the remote attestation service. If the First Boot configuration from the attestation service can be merged alongside the bootstrap ignition bucket, then it would require fewer changes to the installer. For example (pseudo), the bootstrap pointer ignition would be injected with the additional attestation server source:
{
  "ignition": {
    "config": {
      "merge": [
        { "source": "http://<registration-service>/ignition" },
        { "source": "http://<cloud-bucket>/ignition" }
      ]
    }
  }
}

This would utilize ignition's merge functionality to grab the configs from both the attestation server and the cloud bucket.
But if it's a requirement that the remote attestation server is contacted first (or only) then the installer would presumably need to be updated to upload bootstrap ignition so it could be served by the attestation service.
remote attestation server is contacted first
Does ignition support multiple phases of pulls?
In what we have tested, yes, it supports at least two: ours and the MCO one.
|
We (Alice & Jakob) wrote a blog post summarizing part of this enhancement: https://developers.redhat.com/articles/2025/11/26/trusted-execution-clusters-operator-design-and-flow-overview# |
| This enhancement proposes the integration of **confidential computing** | ||
| capabilities into **OpenShift cluster**, enabling the deployment of | ||
| **Confidential Clusters**. A confidential cluster is an OpenShift cluster where | ||
| all nodes run on Confidential Virtual Machines (CVMs) and are remotely attested |
Apart from attestation, what else makes a VM a CVM? Might be worth adding a glossary to explain what a CVM is
A CVM is a virtual machine that runs on confidential-computing-enabled hardware, such as AMD SEV-SNP and Intel TDX.
I'll add an introduction to Confidential Computing.
| other unauthorized entities. | ||
|
|
||
| * Implement a robust remote attestation process for CVM nodes to verify their | ||
| trustworthiness before sharing secrets and joining the cluster. The remote |
Our ignition process today shares secrets for joining the cluster without any authentication. This implies to me that we want to move secret sharing to after attestation has happened? Have you considered the existing secrets in the ignition flow and how they might be moved to be fetched later?
Would having an authenticated ignition server be an option to mitigate this problem?
@JoelSpeed yes, we are thinking of releasing ignition config after a successful attestation. This will be covered later since it is a complex design, but you can see the draft in the 2-phase attestation proposal
The suggested solution for protecting secrets within Ignition is having the Ignition configuration fetched directly from Trustee upon a successful attestation.
Alternatively, client-side certificates can be used, or perhaps encrypted ignition configurations.
| components: | ||
|
|
||
| * OpenShift API | ||
| * Allow nodes to be marked as confidential. This is specific per cloud |
So this would be updating the API for how we provision hosts in cloud environments? (MAPI/CAPI)
Or do you mean you want something to annotate a Node when it is confidential?
We focus on cloud providers first: Azure, GCP and AWS.
An on-prem environment makes sense for some use-cases, but the cluster has to be installed on top of (confidential) virtual machines, so there has to be something that manages VMs (e.g. OpenShift Virtualization).
Yeah, updating the API is for provisioning confidential VMs (nodes) in cloud environments.
So apart from updating MAPI/CAPI to allow for creation of CVM, is there any other API change that constitutes "allow nodes to be marked as confidential"?
@JoelSpeed we will need to teach the openshift installer how to create the bootstrap nodes and how they can attest with an external trustee. And then how to deploy our operator on the bootstrap node to attest upcoming control plane nodes.
| the new configuration from a remotely attested resource endpoint. The MCS will | ||
| not serve Ignition configs directly for nodes anymore but will store those as | ||
| resources in a Trustee instance. To access those configurations, the node will | ||
| have to successfully remotely attest itself first. |
What happens for clusters that are not running confidential compute? I assume their access remains the same?
Given this will create two classes of work for MCS, and doesn't apply to all nodes, have we considered alternatives that don't rely on external services being configured by admins?
Non-confidential clusters remain the same.
Every node has to be attested, including the bootstrap node, which can only be attested by an external service.
An exception is HostedControlPlane, where the control-plane nodes run on an already trusted cluster and need not attest.
| If any attestation step fails, the node keeps retrying indefinitely, in turn, | ||
| each Trustee server configured. This is required as a Trustee server may be | ||
| offline at any given point in time or because the reference values accepted by | ||
| Trustee have not yet been updated by the operator or the cluster | ||
| administrator. This infinite retry loop leaves the opportunity to the cluster | ||
| operator to investigate the failure and potentially manually update the | ||
| reference values accepted for the cluster. This is similar to how Ignition | ||
| retries infinitely until an error occurs. |
Is there any in-cluster reporting for failed node join attempts? This is a point of frustration we have at the moment in that we have no visibility into ignition failures. It would be good to be able to see from within the cluster when a node join fails due to failed attestation
Attestation failures are only reported by Trustee via log messages.
The log can be viewed upon an attestation failure.
When the Trustee pod gets restarted, a new log is created (the old log is likely not saved).
One can run Trustee in debug mode for more messages.
Trustee is running external to the cluster usually right?
Trustee is running in-cluster. A nice feature we have in mind for Trustee would be to implement Prometheus metrics for attestations. That way, we could collect and display the successful/failed attestations much better. But this feature is currently not present.
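(If that metrics idea is pursued, a minimal sketch with prometheus/client_golang could look like the following; the metric and label names are made up, and nothing like this exists in Trustee today:)

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// attestations counts attestation attempts by outcome, so failed node joins
// become visible to in-cluster monitoring instead of only to Trustee's logs.
var attestations = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "trustee_attestations_total",
	Help: "Number of attestation attempts, labeled by result.",
}, []string{"result"})

func recordAttestation(ok bool) {
	if ok {
		attestations.WithLabelValues("success").Inc()
		return
	}
	attestations.WithLabelValues("failure").Inc()
}

func main() {
	recordAttestation(true)
	recordAttestation(false)

	// Expose the metrics for Prometheus (or the cluster monitoring stack) to scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```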
Trustee can run either within the cluster or external to the cluster.
Having it within the cluster makes the cluster manage its own nodes.
An external Trustee is a must for the first node booting (the bootstrap node during installation, or the first control-plane node after a graceful shutdown).
| For each new machine registering to the service, the operator creates a CRD that | ||
| includes a uniquely generated UUID. This UUID is given back to the new node. The | ||
| operator watches for new Machine CRDs and sets up attestation and resource | ||
| policy in the Trustee instance, and generates random secret values to be used as | ||
| LUKS root keys. |
I assume that for clusters without machine api, we aren't intending to support an alternative flow?
It should work similarly for both MAPI and CAPI
I meant, clusters that have neither MAPI or CAPI sorry. How would a UPI cluster work?
Right now, our representation of the Machine is disjoint from MAPI/CAPI, which makes it work for UPI as well. UPI isn't on the support radar right now, but for the sake of simplicity during development, I think we will start from there. In that case, for UPI, there might be some manual work the admin needs to perform, for example to remove old machines. For CAPI, we are planning to couple the lifetime of our machine object with the CAPI machine, so when the CAPI machine is removed, our objects are also garbage-collected.
Do you intend to leverage owner references for that garbage collection? So these new Machine objects will exist in the same namespace as the CAPI machines?
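(For illustration, the usual pattern for that garbage collection would be an owner reference from the operator's machine object to the CAPI Machine in the same namespace; the AttestedMachine name below is hypothetical:)

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/utils/ptr"
)

// setCAPIOwner makes the operator's (hypothetical) AttestedMachine object owned by
// the CAPI Machine, so deleting the CAPI Machine garbage-collects it. Owner
// references only work within a namespace, so both objects must live together.
func setCAPIOwner(obj *metav1.ObjectMeta, capiMachineName string, capiMachineUID types.UID) {
	obj.OwnerReferences = []metav1.OwnerReference{{
		APIVersion:         "cluster.x-k8s.io/v1beta1",
		Kind:               "Machine",
		Name:               capiMachineName,
		UID:                capiMachineUID,
		Controller:         ptr.To(true),
		BlockOwnerDeletion: ptr.To(true),
	}}
}

func main() {
	meta := metav1.ObjectMeta{Name: "attested-machine-sample", Namespace: "openshift-cluster-api"}
	setCAPIOwner(&meta, "worker-abc123", types.UID("1234-uid"))
	fmt.Printf("%+v\n", meta.OwnerReferences)
}
```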
|
|
||
| * **Cloud Provider Dependency**: This feature relies on underlying cloud | ||
| provider CVM capabilities. The design aims for portability where possible but | ||
| will initially target specific cloud environments with mature CVM offerings. |
Do you have links to the relevant cloud providers and their support for confidential VMs?
Azure: https://learn.microsoft.com/en-us/azure/confidential-computing/quick-create-confidential-vm-azure-cli
GCP: https://docs.cloud.google.com/confidential-computing/confidential-vm/docs/create-a-confidential-vm-instance#rest
AWS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/snp-work-launch.html
|
/cc |
- Fix title
- Add reviewers
- Fix some headers
- Add missing sections (e.g. DevPreview -> TechPreview)

Still fails due to missing approvers
Force-pushed from 766b9a1 to a338a2b
|
@uril: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
| the first boot of each node. Once completed, both confidentiality and | ||
| integrity will be guaranteed. | ||
|
|
||
| We are working on a more detailed threat model, which will be submitted in a |
Any estimated timeline for this?
| * In the first phase, we will consider the bootstrap node and the first boot of | ||
| each new node to be trusted. In this phase, only the confidentiality of the | ||
| cluster will be guaranteed. We will assume the attacker can read data but not | ||
| write data (to the disk, cloud metadata config, etc.). |
I may be missing something; can you clarify how we're guaranteeing confidentiality when an attacker can read data? Is this specific to the bootstrap node, which is not fully trusted yet, while for the cluster itself we are guaranteeing it cannot be read from?
| have been pre-computed and stored, or pull the container image itself and | ||
| directly compute the values. | ||
|
|
||
| The PCR pre-calculation flow is demonstrated in this presentation: |
Can we get a definition of the PCR acronym in here somewhere?
UKI and SEV-SNP could also use some clarification for those of us less familiar with the terminology.
Thanks, I'll add those to the document.
PCRs (Platform Configuration Registers) of a TPM (Trusted Platform Module) store cryptographic measurements (hash values) of the state of different parts of the stack (boot loader, kernel, initrd, command line, signatures, etc.).
A UKI (Unified Kernel Image) packages boot stub + vmlinuz + initrd + command line (and more) into a single signed binary file (in PE format).
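(For readers new to the terminology, the "measurement" is a hash-extend operation; a small, simplified sketch of how a SHA-256 PCR value evolves as boot components are measured:)

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// extend implements the TPM PCR extend operation for a SHA-256 bank:
// newPCR = SHA256(oldPCR || measurement). PCRs start at all zeroes.
func extend(pcr [32]byte, measurement []byte) [32]byte {
	h := sha256.New()
	h.Write(pcr[:])
	h.Write(measurement)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	var pcr [32]byte // initial PCR value: 32 zero bytes

	// Each boot component (UKI sections, kernel command line, etc.) is hashed
	// and extended into the PCR; the order and content fully determine the result.
	for _, component := range [][]byte{
		[]byte("uki-linux-section"),
		[]byte("uki-initrd-section"),
		[]byte("kernel-command-line"),
	} {
		digest := sha256.Sum256(component)
		pcr = extend(pcr, digest[:])
	}

	fmt.Println("predicted PCR value:", hex.EncodeToString(pcr[:]))
}
```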
| In a HCP scenario, the operator will only be responsible for the worker | ||
| nodes. As the Confidential Cluster Operator will be hosted in the control plane, | ||
| the nodes hosting those services are considered part of the Trusted Computing | ||
| Base (TCB). |
Could you spell out a theoretical real world scenario where this feature would make sense with a hypershift cluster. If my control plane is hosted by a provider, I wouldn't think there's much utility to having just the workers confidential. Is the assumption that the hypershift control plane cluster itself is a confidential cluster?
Yes, the assumption is that the cluster that is hosting the hypershift control plane is running in a trusted environment.
I'll add this assumption to the document
| ##### Making Confidential Cluster Operator fit OpenShift | ||
| The following steps are to be taken to make cocl-operator fit OpenShift | ||
| * APIs and CRDs are to be written in Go | ||
| * When the operator is built, the APIs/CRDs are translated to Rust |
Was this straightforward and maintainable or a lot of hacking involved?
No, it is pretty straightforward: Go -> YAML, then YAML -> Rust via kopium.
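(A condensed sketch of that pipeline: a Go API type with kubebuilder markers, from which controller-gen emits CRD YAML that kopium turns into Rust structs; the type, field names and exact commands are illustrative:)

```go
// Package v1alpha1 sketches how the operator's API types could be written in Go
// with kubebuilder markers. The build would then run roughly:
//   controller-gen crd paths=./... output:crd:dir=config/crd   # Go -> CRD YAML
//   kopium -f config/crd/<crd>.yaml > src/generated_types.rs   # CRD YAML -> Rust
// (commands are approximate; exact flags depend on the tooling versions used)
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// AttestedMachineSpec is an illustrative spec; real field names may differ.
type AttestedMachineSpec struct {
	// UUID uniquely identifies the registering machine.
	UUID string `json:"uuid"`
	// MachineName references the cloud Machine this attestation record belongs to.
	MachineName string `json:"machineName,omitempty"`
}

// +kubebuilder:object:root=true

// AttestedMachine is the hypothetical CRD the operator creates per registering node.
type AttestedMachine struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              AttestedMachineSpec `json:"spec,omitempty"`
}
```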
|
|
||
| While we don’t want to support that for production, it should also be possible | ||
| to test adding confidential nodes to a non confidential cluster where the | ||
| Confidential Cluster Operator would be running, making testing easier. |
Could we get details added here. How many jobs are we talking about, what special requirements will they have, what tests suites will they run, and what new testing will be added. (in general terms, not specific tests, I'm just curious about what you will test and the complexity involved)
@dgoodwin right now, we are using a kind cluster + KubeVirt installed in kind. Then we install our operator and attest the VM created with KubeVirt as part of the integration tests. Right now we have tests like: testing a single attestation (with a single VM), rebooting the same VM multiple times and doing multiple attestations for subsequent boots, running 2 attestations/VMs in parallel, or checking that an attestation fails when the corresponding machine object is deleted.
We plan to add similar tests using Azure VMs for gating tests before merging a PR in the operator.
| ### Tech Preview -> GA | ||
|
|
||
| - End to end tests | ||
| - Documentation |
Do we have plans for security audits of the implementation before we go GA, and by whom?
| Trustee PIN configuration. | ||
| 1. Cluster shutdown | ||
| 1. Before restarting any node, the Trustee instance must be made available at | ||
| the domain or IP configured above. |
What happens if it's down or goes down mid-way?
If Trustee is down, then the attestation cannot be passed and the nodes fail to boot, since they cannot retrieve the LUKS key of their root disk.
|
|
||
| The cluster administrator flow should not change when updating a cluster. The | ||
| Confidential Cluster Operator will perform the necessary configuration to allow | ||
| nodes to attest to the cluster using new version of RHCOS. |
Any details on update concerns for the confidential cluster operator itself that could be added here?
The operator can be updated without major disruptions, since the Machine representation, reference values, the policies and the secrets don't change.
| The following steps are to be taken to make cocl-operator fit OpenShift | ||
| * APIs and CRDs are to be written in Go | ||
| * When the operator is built, the APIs/CRDs are translated to Rust | ||
| * The system openssl library is to be used |
Could you also add some of the comments made yesterday about Konflux building and CVE scanning, to help solidify the case for Rust?
| * Build and upload disk images using UKI and systemd-boot to cloud providers. | ||
| * Add attestation client to the operating system, such that nodes can request | ||
| attestation and fetch secrets upon a successful attestation. | ||
| * Add a clevis trustee pin to fetch LUKS passphrase upon a successful |
For parts of this workflow that rely on asymmetric encryption to pass these secrets around, do we have an idea of how we'd configure this to use PQC-safe algorithms?
I imagine part of that will be hardware dependency for the keys that are generated by Intel and AMD. Do we have an outlook or timeline for if/when those will be hybrid (classical + PQC)?
The secrets are passed by the attestation server (Trustee); hence, it needs to implement encryption of the secrets with PQC-safe algorithms. But it is also true that the hardware signs the quote, and probably there too it needs to follow the PQC-safe standards.
| * **ConfidentialCluster CRD**: This custom resource is used to configure the | ||
| Confidential Cluster Operator and indirectly the Trustee instance that is used | ||
| to attest nodes in the cluster and provide secrets. | ||
| It is namespaced, versioned and contains: |
We have a lot of cluster-scoped singleton resources for standalone clusters, and this feels very much like it would be the same. So rather than a future where you have multiple copies of the operator in multiple namespaces, I'd imagine it would be better to design the API first-hand to consider a future with multiple Trustees.
What would the advantage be of deploying the operator multiple times?
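(To ground that discussion, an illustrative sketch of what the ConfidentialCluster spec could carry; all field names are hypothetical, and the shape applies equally whether the resource ends up namespaced or a cluster-scoped singleton:)

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ConfidentialClusterSpec is an illustrative sketch of the configuration the
// operator needs; real field names and structure are still to be defined.
type ConfidentialClusterSpec struct {
	// TrusteeURL is the endpoint of the Trustee instance used to attest nodes.
	TrusteeURL string `json:"trusteeURL"`
	// ReferenceValuesFrom names a ConfigMap holding expected PCR reference values.
	ReferenceValuesFrom string `json:"referenceValuesFrom,omitempty"`
}

// ConfidentialCluster configures the Confidential Cluster Operator and,
// indirectly, the Trustee instance it manages.
type ConfidentialCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              ConfidentialClusterSpec `json:"spec,omitempty"`
}
```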
|
|
||
| * **Ignition spec changes**: The Ignition configuration specification will be | ||
| extended to support: | ||
| * configuring the Clevis trustee pin |
It might be good to start linking out or somehow otherwise indicating which of the items in this section have already been achieved
| * **Ignition spec changes**: The Ignition configuration specification will be | ||
| extended to support: | ||
| * configuring the Clevis trustee pin | ||
| * enable fetching remote config after remote attestation |
That answered my question, yes: attestation is more like a checksum of the OS that booted vs a unique credential to identify a specific node.
|
|
||
| ##### Making Confidential Cluster Operator fit OpenShift | ||
| The following steps are to be taken to make cocl-operator fit OpenShift | ||
| * APIs and CRDs are to be written in Go |
Given this is becoming part of the payload, these should live in o/api, have you raised a PR to add them there yet?
| To make Confidential Cluster Operator a first citizen in the Openshift | ||
| echosystem, interfaces are written in Go and generated with OpenShift tools. | ||
|
|
||
| When the operator is built, the interfaces are converted to Rust. COPY FROM Jakob |
COPY FROM Jakob?