@lakshmimsft
Contributor

This PR adds a design document for redacting sensitive data in Radius.Security/secrets resources.

Signed-off-by: lakshmimsft <[email protected]>
Signed-off-by: lakshmimsft <[email protected]>
@lakshmimsft force-pushed the lakshmimsft/radcoresecretsupd branch 5 times, most recently from 81eeba6 to efe6131 (November 12, 2025 19:05)
Signed-off-by: lakshmimsft <[email protected]>
@lakshmimsft force-pushed the lakshmimsft/radcoresecretsupd branch from efe6131 to dcdb5dc (November 12, 2025 19:07)
#### User story 1: Developer deploying secrets to Azure Key Vault

Alice, a platform engineer, wants to deploy application secrets to Azure Key Vault using Radius. She defines a `Radius.Security/secrets` resource with sensitive credentials and associates it with an Azure Key Vault recipe. The recipe deploys the secrets to Azure Key Vault, and after successful deployment, the sensitive data is removed from the Radius database. Alice can still reference the secrets via output resources, but sensitive values are no longer stored in Radius.

Contributor

Is this how we handle credentials today?

Contributor Author

No. Credentials entered with Radius are stored imperatively as a k8s secret, and their metadata is stored in UCP. This is a use case for a customer using the Radius.Security/secrets type to deploy secrets through recipe execution.

2. **Backend (async)**: Reads resource from database (with sensitive data), executes recipe using the data
3. **Redaction**: After successful recipe execution, nullifies sensitive data fields
4. **Final save**: Stores redacted resource back to database

Contributor

@nithyatsu nithyatsu Nov 12, 2025

At this point, can the secret be used again for deployment? I am guessing no, since the final save redacts the content. If this is correct, does it mean we cannot use the "existing" keyword with these resource types?

Contributor Author

I believe we can use this again for deployment; it will be a new request. A use case using the 'existing' keyword should work as well. The only data we will use from the resource would be in outputvariables, which is not redacted.


### Non goals

1. **Prevent all temporary storage**: The design accepts that sensitive data will temporarily exist in the database during recipe execution (seconds to minutes)

Member

Is there any way to avoid it, e.g., by storing the data only in memory?

Contributor Author

The type Radius.Security/secrets is an RRT (Radius Resource Type), so it uses the existing recipe deployment pipeline shared by all RRTs, which is asynchronous (we call TF/ARM to provision resources).

Member

@brooke-hamilton brooke-hamilton left a comment

**Reasons for rejection:**
- **Cannot support recipes**: Recipe execution happens in a backend async operation and needs access to sensitive data, so this option would break recipe-based deployment.

#### Option 2: Backend Post-Recipe Redaction (Recommended)

Member

Does this option cover a scenario in which a user deploys a resource that references a secret, and then deploys it again with an update?

Member

+1

Contributor Author

Yes, this should apply. I tested the scenario where a deployment succeeds, the secret is then updated, and the resource is deployed again.

- **Redaction Logic**: New functionality to detect `Radius.Security/secrets` type and nullify `properties.data`
- **Cleanup Handlers**: Defer blocks ensuring redaction even on failures
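
A minimal sketch of how the redaction logic and the deferred cleanup handler could fit together, assuming a simplified in-memory resource. The `Resource` type, the `redact` and `process` functions, and the `runRecipe`/`save` callbacks are hypothetical stand-ins for illustration, not the actual controller API in `createorupdateresource.go`. The `data` key is set to `nil` rather than deleted, so the property is still present on the stored resource.

```go
package main

import (
	"errors"
	"fmt"
)

// secretsType is the resource type named in the design.
const secretsType = "Radius.Security/secrets"

// Resource is a hypothetical, simplified stand-in for the stored resource.
type Resource struct {
	Type       string
	Properties map[string]any
}

// errDemo simulates a recipe failure in the usage example below.
var errDemo = errors.New("recipe failed")

// redact nullifies properties.data on secrets resources and reports whether
// it changed anything. Other resource types are left untouched.
func redact(r *Resource) bool {
	if r.Type != secretsType {
		return false
	}
	if v, ok := r.Properties["data"]; !ok || v == nil {
		return false
	}
	r.Properties["data"] = nil
	return true
}

// process runs the recipe, then redacts and saves inside a deferred function
// so the cleanup also runs when the recipe returns an error.
func process(r *Resource, runRecipe, save func(*Resource) error) (err error) {
	defer func() {
		redact(r)
		if saveErr := save(r); saveErr != nil {
			err = errors.Join(err, fmt.Errorf("saving redacted resource: %w", saveErr))
		}
	}()
	return runRecipe(r)
}

func main() {
	r := &Resource{
		Type: secretsType,
		Properties: map[string]any{
			"kind": "generic",
			"data": map[string]any{"password": "hunter2"},
		},
	}
	noop := func(*Resource) error { return nil }
	failing := func(*Resource) error { return errDemo }

	// Even though the recipe fails, the sensitive data is nullified.
	err := process(r, failing, noop)
	fmt.Println("error:", err, "redacted:", r.Properties["data"] == nil)
}
```

The sketch surfaces a failed save to the caller. Whether a redaction or save failure should fail the operation, be retried, or only be logged is a design decision it does not settle.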

### Architecture Diagram

Contributor

super helpful diagram


### Option 3: Database Encryption at Rest (Recommended)

**Description**: Recommend customers enable encryption at the infrastructure/database layer to protect sensitive data during the temporary exposure window (T0-T4). Both Azure AKS and AWS EKS provide encryption at rest for their etcd databases by default.

Contributor

Would this only be applicable to Azure/AWS or is there a way to implement it for local deployments as well?

Contributor Author

Line 194 has a link to the reference docs.


### Non goals

1. **Prevent all temporary storage**: The design accepts that sensitive data will temporarily exist in the database during recipe execution (seconds to minutes)

Contributor

Curious why this temporary step is needed. I would move away from storing it in the database altogether.

"properties": {
"environment": "/planes/radius/local/resourceGroups/my-rg/providers/Applications.Core/environments/myenv",
"kind": "generic",
"data": null, // ← Sensitive data removed

Contributor

nit: Do we need the empty data field here or should we just remove it?

Contributor Author

It's a required property in the schema for the Radius.Security/secrets type. We will not be able to work with the resource if it doesn't conform to the schema. Changing the schema may be an option if we feel strongly about it; we will need PM input if we want to make the property optional.


### Architecture Diagram

[mermaid diagram not included in this excerpt]

Contributor

Ah, so it is stored in the db only if the user decides to provide secrets in plaintext? Not ideal, but at least this way users can provide secrets as a reference to remote storage rather than as plaintext?

**Scenario 2: Redaction fails during cleanup**
- Error: Nullification logic encounters unexpected error
- Handling: Log error but do not fail the overall operation since recipe already succeeded
- User Experience: Recipe succeeds, warning logged about redaction issue

Contributor

Is the user expected to redact or delete the sensitive data from the db manually? Should there be retries until the data is successfully redacted?

- User Experience: Operation shows as failed, will be retried by async controller

**Scenario 4: Resource type detection fails**
- Error: Unable to determine if resource requires redaction

Contributor

What does this mean? Why would Radius be unable to determine if resource requires redaction?

**Disadvantages**:
- Not automatic, customers must configure if not available by default.

Document encryption requirement in Radius installation guides

Contributor

It would be good to update the k8s cluster creation scripts here to enable encryption:
https://docs.radapp.io/guides/operations/kubernetes/overview/#supported-kubernetes-clusters


1. **Frontend (sync)**: Stores full resource including sensitive data in database, queues async operation
2. **Backend (async)**: Reads resource from database (with sensitive data), executes recipe using the data
3. **Redaction**: After successful recipe execution, nullifies sensitive data fields

Contributor

What would happen if a user tries to read the resource before redaction has happened? Will we end up exposing sensitive data in plaintext, or do we skip returning data as part of a GET on the secret resource type?

1. **Frontend (sync)**: Stores full resource including sensitive data in database, queues async operation
2. **Backend (async)**: Reads resource from database (with sensitive data), executes recipe using the data
3. **Redaction**: After successful recipe execution, nullifies sensitive data fields
4. **Final save**: Stores redacted resource back to database

Contributor

Could you share details on how retries are handled if the final save fails?


The design introduces a redaction mechanism that operates in the backend async controller after recipe execution. The flow ensures that:

1. **Frontend (sync)**: Stores full resource including sensitive data in database, queues async operation

Contributor

@kachawla kachawla Nov 12, 2025

Payload is not stored in the queue, is that correct?


> **Issue Reference:** Implementation of sensitive data handling for Radius.Security/secrets resource type to prevent long-term storage of plaintext secrets in Radius database. ref: https://github.com/radius-project/radius/issues/10421

### Goals

Contributor

Can you talk a bit about how the work here enables use cases where Radius.Security/secrets can be referenced by other recipes to read and use secret values? For example, a containers recipe connecting to a secrets resource to mount the secrets as a volume, or reading a secret value to inject env vars.

Comment on lines +106 to +107
- **Recipe Controller** (`pkg/portableresources/backend/controller/createorupdateresource.go`): Orchestrates recipe execution and resource updates
- **Redaction Logic**: New functionality to detect `Radius.Security/secrets` type and nullify `properties.data`

Contributor

What happens if recipe execution fails?

Contributor

@zachcasper zachcasper left a comment

Overall thoughts on the secret storage:

  • We cannot store any secret data unencrypted ever. That's a non-negotiable. It doesn't matter if the data at rest is encrypted underneath Radius. We should not even store secret data in memory unencrypted.
  • This conversation is about more than just the Secrets Resource Type. We should abstract this to all properties decorated with @secure, not just secrets. Any @secure property on any resource type should be treated with the method we discussed. The secret Resource Type is just a way to store @secure properties into a secret store.
  • If Radius needs to store the secret data temporarily for async processing, then it needs its own proper secret store for internal use.

I see three paths forward:

  • Option 1: Encrypt the secret data and only store it in memory within the Radius control plane. We would need a central component to mediate the secret data between async processes.
  • Option 2: Store the secret data temporarily in a Kubernetes secret.
  • Option 3: Scrap the recipe-based secret Resource Type and implement the storage and retrieval of secret data directly in Radius like we have today. But we will need to support HashiCorp Vault, Azure Key Vault, AWS Secrets Manager, and Kubernetes Secret. Perhaps we have a SecretsSettings Resource Type which configures where Radius stores and retrieves secrets.

Signed-off-by: lakshmimsft <[email protected]>
- Use OpenAPI extension annotations (e.g., `x-radius-sensitive` / `x-radius-secure`) in resource type definitions
- Dynamic-rp reads schema annotations to determine which resources need encryption and which fields will need redaction
- Type schema declares its sensitivity requirements
- More flexible for future resource types without code changes
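
A rough sketch of what annotation-driven detection could look like, assuming the schema is available as parsed JSON and that the annotation is a boolean named `x-radius-sensitive`. The schema fragment, the `parseSchema`, `sensitivePaths`, and `redactPath` helpers, and the dotted-path convention are all made up for illustration; this is not how dynamic-rp reads schemas today.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// sensitiveAnnotation is one of the example extension names from the design.
const sensitiveAnnotation = "x-radius-sensitive"

// exampleSchema is a made-up schema fragment used only for illustration.
const exampleSchema = `{
  "type": "object",
  "properties": {
    "kind": {"type": "string"},
    "data": {"type": "object", "x-radius-sensitive": true},
    "connection": {
      "type": "object",
      "properties": {
        "host": {"type": "string"},
        "password": {"type": "string", "x-radius-sensitive": true}
      }
    }
  }
}`

// parseSchema decodes a JSON schema document into a generic map.
func parseSchema(s string) (map[string]any, error) {
	var schema map[string]any
	err := json.Unmarshal([]byte(s), &schema)
	return schema, err
}

// sensitivePaths walks the schema's "properties" recursively and returns the
// sorted dotted paths of every property annotated as sensitive.
func sensitivePaths(schema map[string]any, prefix string) []string {
	var paths []string
	props, _ := schema["properties"].(map[string]any)
	for name, raw := range props {
		child, ok := raw.(map[string]any)
		if !ok {
			continue
		}
		path := name
		if prefix != "" {
			path = prefix + "." + name
		}
		if flagged, _ := child[sensitiveAnnotation].(bool); flagged {
			paths = append(paths, path)
			continue
		}
		paths = append(paths, sensitivePaths(child, path)...)
	}
	sort.Strings(paths)
	return paths
}

// redactPath sets the value at a dotted path to nil if that path exists.
func redactPath(doc map[string]any, path string) {
	parts := strings.Split(path, ".")
	cur := doc
	for _, p := range parts[:len(parts)-1] {
		next, ok := cur[p].(map[string]any)
		if !ok {
			return
		}
		cur = next
	}
	last := parts[len(parts)-1]
	if _, ok := cur[last]; ok {
		cur[last] = nil
	}
}

func main() {
	schema, err := parseSchema(exampleSchema)
	if err != nil {
		panic(err)
	}
	props := map[string]any{
		"kind":       "generic",
		"data":       map[string]any{"apiKey": "abc123"},
		"connection": map[string]any{"host": "db", "password": "hunter2"},
	}
	for _, p := range sensitivePaths(schema, "") {
		redactPath(props, p)
	}
	out, _ := json.Marshal(props)
	fmt.Println(string(out))
}
```

With this shape, a new resource type would only need to annotate its schema; the same walk could also drive which fields get encrypted before the first save.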

Contributor

Would this mean that users could author a type in the future that has secrets? Would we want them to?

**Description**: Implement encryption/decryption of sensitive data at the application layer using Go crypto APIs with ChaCha20-Poly1305 cipher and per-request nonce.

**Implementation Details:**
- Store root key in a Kubernetes secret (similar to ucp-cert)

Member

Key Rotation: If you only have one static root key, you cannot rotate it without downtime or a complex migration. If that key is compromised, all historical data is compromised.
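
One common way to leave room for rotation is to tag each ciphertext with the version of the key that produced it, so old records stay readable after a new key becomes current. The sketch below is an illustration of that idea, not part of the proposal: the `Keyring` type and its methods are hypothetical, and AES-256-GCM from the standard library stands in for ChaCha20-Poly1305 (which lives in `golang.org/x/crypto/chacha20poly1305`) so the example has no external dependencies.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Keyring is a hypothetical holder of root keys indexed by version.
type Keyring struct {
	current byte
	keys    map[byte][]byte
}

// NewKeyring creates a keyring whose only key (32 bytes) is version 1.
func NewKeyring(key []byte) *Keyring {
	return &Keyring{current: 1, keys: map[byte][]byte{1: key}}
}

// Rotate adds a new key and makes it current. Older keys are kept so that
// existing ciphertexts can still be decrypted.
func (k *Keyring) Rotate(key []byte) {
	k.current++
	k.keys[k.current] = key
}

// aead builds an AEAD for the given key version.
func (k *Keyring) aead(version byte) (cipher.AEAD, error) {
	key, ok := k.keys[version]
	if !ok {
		return nil, fmt.Errorf("unknown key version %d", version)
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt returns version || nonce || ciphertext using the current key.
func (k *Keyring) Encrypt(plaintext []byte) ([]byte, error) {
	aead, err := k.aead(k.current)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append([]byte{k.current}, nonce...)
	return aead.Seal(out, nonce, plaintext, nil), nil
}

// Decrypt reads the version byte and decrypts with the matching key.
func (k *Keyring) Decrypt(data []byte) ([]byte, error) {
	if len(data) < 1 {
		return nil, errors.New("empty ciphertext")
	}
	aead, err := k.aead(data[0])
	if err != nil {
		return nil, err
	}
	ns := aead.NonceSize()
	if len(data) < 1+ns {
		return nil, errors.New("ciphertext too short")
	}
	return aead.Open(nil, data[1:1+ns], data[1+ns:], nil)
}

// mustKey returns 32 random bytes or panics.
func mustKey() []byte {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	return key
}

func main() {
	kr := NewKeyring(mustKey())
	old, err := kr.Encrypt([]byte("hunter2"))
	if err != nil {
		panic(err)
	}
	kr.Rotate(mustKey())
	plain, err := kr.Decrypt(old)
	fmt.Println(string(plain), err)
}
```

Versioning alone does not address a compromised key: records written under that key would still need to be re-encrypted (or redacted) before the old version is dropped from the keyring.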

- Check resource type string in dynamic-rp backend controller
- Explicitly detect known sensitive resource types

**Option B: Annotation-Based (Extensible)**

Contributor

I like this the most, and doing now ensures we don't have to unpack and re-implement later.

- Check resource type string in dynamic-rp backend controller
- Explicitly detect known sensitive resource types

**Option B: Annotation-Based (Extensible)**

Member

@DariuszPorowski DariuszPorowski Nov 19, 2025

The "Annotation-Based" system must be integrated into logging middleware. Any field marked x-radius-sensitive must be automatically redacted by default (***) when the struct is marshaled for logging.
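
One way to get redaction by default at the type level, sketched under the assumption that annotated fields can be mapped to a wrapper type in Go. The `Sensitive` type, the `Credentials` struct, and the `logLine` helper are hypothetical; this only covers `fmt` formatting and `encoding/json` marshaling, not any particular logging middleware.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sensitive is a hypothetical wrapper for values that must never appear in
// logs. Formatting and JSON marshaling both produce a fixed mask.
type Sensitive string

// String masks the value for %v and %s formatting.
func (Sensitive) String() string { return "***" }

// GoString masks the value for %#v formatting.
func (Sensitive) GoString() string { return "***" }

// MarshalJSON masks the value whenever the struct is marshaled.
func (Sensitive) MarshalJSON() ([]byte, error) { return []byte(`"***"`), nil }

// Reveal returns the underlying value for code paths that really need it.
func (s Sensitive) Reveal() string { return string(s) }

// Credentials is an example struct with one sensitive field.
type Credentials struct {
	Username string    `json:"username"`
	Password Sensitive `json:"password"`
}

// logLine marshals a value the way a JSON logger might.
func logLine(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return fmt.Sprintf("marshal error: %v", err)
	}
	return string(b)
}

func main() {
	c := Credentials{Username: "alice", Password: "hunter2"}
	fmt.Println(logLine(c))
	fmt.Printf("%v\n", c.Password)
}
```

Because `MarshalJSON` always masks, any path that legitimately needs the value (for example passing it to a recipe) would have to call `Reveal` explicitly, which makes such uses easy to audit.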

- Store root key in a Kubernetes secret (similar to ucp-cert)
- Use Go's `crypto/cipher` package with ChaCha20-Poly1305 AEAD cipher
- Generate unique nonce per encryption operation
- Encrypt sensitive data before database save, decrypt when needed for recipe execution

Contributor

How are we detecting sensitive parts from the payload?


Document encryption requirement in Radius installation guides

### Option 4: Application-Level Encryption in Dynamic-RP

Contributor

At what point, and how, is the data deleted from the database in either of the suggested options?


### Option 4: Application-Level Encryption in Dynamic-RP

**Description**: Implement encryption/decryption of sensitive data at the application layer using Go crypto APIs with ChaCha20-Poly1305 cipher and per-request nonce.

Member

You should leverage the "Associated Data" (AD) feature.
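
To make the suggestion concrete, here is a minimal sketch of binding a ciphertext to the resource it belongs to via associated data, so that a value copied onto a different resource fails to decrypt. The `newAEAD`, `seal`, and `open` helpers and the resource IDs are hypothetical. AES-256-GCM from the standard library is used as a stand-in for ChaCha20-Poly1305 (available in `golang.org/x/crypto/chacha20poly1305`) to keep the example dependency-free; both implement `cipher.AEAD`, so the `Seal`/`Open` calls would be the same.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newAEAD builds an AEAD from a 32-byte key (AES-256-GCM as a stand-in).
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext with a fresh random nonce and authenticates the
// resource ID as associated data. The result is nonce || ciphertext.
func seal(aead cipher.AEAD, plaintext []byte, resourceID string) ([]byte, error) {
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plaintext, []byte(resourceID)), nil
}

// open decrypts a sealed value. It fails if the resource ID differs from the
// one used when sealing, or if the ciphertext was modified.
func open(aead cipher.AEAD, sealed []byte, resourceID string) ([]byte, error) {
	ns := aead.NonceSize()
	if len(sealed) < ns {
		return nil, errors.New("sealed value too short")
	}
	return aead.Open(nil, sealed[:ns], sealed[ns:], []byte(resourceID))
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}

	sealed, err := seal(aead, []byte("hunter2"), "resource-a")
	if err != nil {
		panic(err)
	}

	plain, err := open(aead, sealed, "resource-a")
	fmt.Println(string(plain), err)

	// Opening under a different resource ID is rejected.
	_, err = open(aead, sealed, "resource-b")
	fmt.Println(err != nil)
}
```

The associated data is not stored in the ciphertext; the caller supplies it again at decryption time, which is what ties the value to its resource.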

- Managing writes on pod restarts and scaling complexity with multiple pods
- Kubernetes secrets provide better access controls via RBAC

### Option 5: Type-Specific vs Annotation-Based Encryption + Redaction

Contributor

@kachawla kachawla Nov 19, 2025

Can we enforce database TTL (or lease in case of etcd)?

Contributor Author

We need the record in the db for the resource. Only the sensitive data is planned to be encrypted and deleted, so a TTL will not apply to this use case.

9 participants