Conversation

@ytimocin ytimocin commented Oct 13, 2025

Design for Terraform/Bicep Settings

@ytimocin ytimocin requested review from a team as code owners October 13, 2025 21:30
@ytimocin ytimocin force-pushed the ytimocin/terraform-bicep-settings branch 4 times, most recently from ee39224 to 963f6b0 Compare October 16, 2025 17:35

#### User Story 1 - Terraform lifecycle

A platform engineer runs `rad terraform install --version 1.6.4 --wait` to seed the control plane with the organization's pinned Terraform build. The installer async handler downloads from the internal mirror, validates the checksum, writes metadata, and exposes status. A follow-up `rad terraform install --version 1.7.0` automatically queues behind the first job and runs after it completes. The engineer confirms success with `rad terraform status` and re-runs `rad terraform validate --environment prod-east` (Phase 2) before dispatching recipe executions. Result: the control plane holds a single active Terraform version at any time, and sequential installs guard against race conditions or partial upgrades.
@ytimocin (Author):

We can talk about how we want this to be done.

@ytimocin ytimocin force-pushed the ytimocin/terraform-bicep-settings branch from 963f6b0 to eae884e Compare October 16, 2025 17:51
- Introduce an installer async pipeline and CLI-driven Terraform binary lifecycle (`rad terraform install|uninstall|status`) with operator control over version, source URL, and checksum.
- Allow Terraform settings (provider mirrors, credentials, env vars, backend blocks) to flow through unchanged so Radius is unopinionated about Terraform configuration.
- Deliver Phase 1 with the existing Kubernetes backend, and stage AzureRM/S3 backend support as Phase 2 follow-up while keeping Tier-2 backends on the roadmap (for example `oss`, `gcs`, `http`, `oci`, `pg`, `cos`).

@nithyatsu (Contributor), Oct 16, 2025:

Is Terraform still installed into the Applications Core RP and Dynamic RP pods?

@ytimocin (Author):

Today, yes. But once this feature is out, it is no longer going to be installed there.

name: 'corpTerraformSettings'
properties: {
terraformrc: {
provider_installation: {
@ytimocin (Author):

I am not sure if using underscores in property names is the standard in Bicep. cc/ @zachcasper

I will probably need to change them.

#### User Story 1 - Terraform lifecycle

A platform engineer runs `rad terraform install --version 1.6.4 --wait` to seed the control plane with the organization's pinned Terraform build. The installer async handler downloads from the internal mirror, validates the checksum, writes metadata, and exposes status. A follow-up `rad terraform install --version 1.7.0` automatically queues behind the first job and runs after it completes. The engineer confirms success with `rad terraform status` before dispatching recipe executions. Result: the control plane holds a single active Terraform version at any time, and sequential installs guard against race conditions or partial upgrades.

Contributor:

Would these commands need a pod restart?

@ytimocin (Author):

No, the pods wouldn't need to be restarted.


credentials: {
'app.terraform.io': {
secret: '/planes/radius/local/providers/Radius.Security/secrets/tfc-token'
Contributor (suggested change): rename `secret` to `token`, i.e. `token: '/planes/radius/local/providers/Radius.Security/secrets/tfc-token'`.

Contributor:

This is also going to need a key. Not sure how that should be modeled exactly. We should discuss.

@ytimocin (Author):

Wouldn't `app.terraform.io` be the key? Can you explain more?

@ytimocin (Author):

```bicep
credentials: {
  'app.terraform.io': {
    token: {
      secretId: mySecret.id
      key: 'myKey'
    }
  }
}
```
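At runtime, this shape would resolve to a token roughly as follows. This is a minimal Python sketch; the in-memory secret store, the secret's contents, and the key name are hypothetical stand-ins for an actual `Radius.Security/secrets` lookup:

```python
# Sketch of resolving the proposed credential shape at runtime.
# The secret store contents and key names below are hypothetical.

def resolve_token(credential: dict, secret_store: dict) -> str:
    """Resolve a {'token': {'secretId': ..., 'key': ...}} credential to a value."""
    ref = credential["token"]
    secret = secret_store[ref["secretId"]]  # fetch the referenced secret resource
    return secret[ref["key"]]               # pick the named key within it

# Hypothetical data mirroring the Bicep snippet above.
secret_store = {
    "/planes/radius/local/providers/Radius.Security/secrets/tfc-token": {
        "myKey": "tfc-example-token",
    },
}
credential = {
    "token": {
        "secretId": "/planes/radius/local/providers/Radius.Security/secrets/tfc-token",
        "key": "myKey",
    },
}
print(resolve_token(credential, secret_store))  # -> tfc-example-token
```

The point of the `secretId` + `key` pair is that one secret resource can hold several named values, so the credential reference must name both the resource and the key within it.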

### CLI Design (if applicable)

- `rad terraform install [--version|--url|--checksum]` _(required by feature spec)_
- `rad terraform uninstall [--version]` _(required by feature spec)_
Contributor (suggested change): drop the flag, i.e. `rad terraform uninstall` _(required by feature spec)_.

Contributor:

Having a `rad terraform uninstall --version` implies there are multiple versions installed.

@ytimocin (Author):

That is right. I will update the doc.

- New ARM resources `Radius.Core/terraformSettings` and `Radius.Core/bicepSettings` (preview `2025-08-01`).
- Installer REST endpoints:
- `POST /installer/terraform/install` `{ "version": "1.6.4", "source": {...} }`
- `POST /installer/terraform/uninstall` `{ "version": "1.5.7" }`
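For illustration, the install request body could be validated along these lines. This is a hedged Python sketch; the field rules (semantic-version format, optional `source` object) are assumptions, not the spec:

```python
import re

# Assumed rule: versions look like '1.6.4'; the real spec may be more permissive.
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")

def validate_install_request(body: dict) -> list:
    """Return a list of validation errors for a POST /installer/terraform/install body."""
    errors = []
    version = body.get("version")
    if not isinstance(version, str) or not SEMVER.match(version):
        errors.append("version must be a semantic version like '1.6.4'")
    source = body.get("source")
    if source is not None and not isinstance(source, dict):
        errors.append("source, when present, must be an object")
    return errors

print(validate_install_request({"version": "1.6.4"}))   # -> []
print(validate_install_request({"version": "latest"}))  # -> one error
```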
Contributor:

See comment on line 213

}

logging: {
level: 'TRACE'
Contributor:

It's good to see this here. Is there any implementation work needed to enable trace logging? I remember it was not as straightforward as setting `TF_LOG` to trace, but I don't remember the details.

@ytimocin (Author):

Yes, it wasn't straightforward. I will need to do a deeper investigation.


- Environments (`Radius.Core/environments`) reference `terraformSettings` / `bicepSettings` resources.
- `Radius.Core/environments` controller consumes the new settings resources exclusively, while the legacy `Applications.Core/environments` controller continues to serve existing `recipeConfig` callers until that surface is retired.
- Installer REST endpoint stores install/uninstall requests in a dedicated async queue (`pkg/components/queue/queueprovider`) configured with single-flight semantics. The installer async handler (running inside the existing worker service) consumes jobs sequentially, manages binaries on the shared Terraform storage, and updates status metadata. (The Helm chart will drop the old init-container download path in favour of this queue-driven workflow.)
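The single-flight, sequential semantics described above can be sketched as follows. This is an illustrative Python model, not the actual `pkg/components/queue/queueprovider` implementation: jobs drain strictly FIFO with at most one in flight, so a second install queues behind the first and the control plane always ends up with a single active version:

```python
from collections import deque

class InstallerQueue:
    """Illustrative model of single-flight install/uninstall semantics:
    jobs are drained strictly one at a time, in submission order."""

    def __init__(self):
        self._jobs = deque()
        self.active_version = None  # the single active Terraform version
        self.history = []

    def submit(self, op, version):
        self._jobs.append((op, version))  # later jobs wait behind earlier ones

    def drain(self):
        while self._jobs:
            op, version = self._jobs.popleft()  # strictly FIFO, one in flight
            if op == "install":
                self.active_version = version   # replaces any previous binary
            elif op == "uninstall" and self.active_version == version:
                self.active_version = None
            self.history.append((op, version))

q = InstallerQueue()
q.submit("install", "1.6.4")
q.submit("install", "1.7.0")  # queues behind the first job, as in User Story 1
q.drain()
print(q.active_version)  # -> 1.7.0
```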
Contributor:

If resources were deployed with a certain Terraform version, and Terraform is then upgraded or a new version installed, would we still be able to manage the resources deployed with the older version? Do we need to track the version used to deploy each resource?

@ytimocin (Author):

Terraform should be backwards compatible, right? At that point it is up to the platform engineer who decides to do the update; I don't think it is Radius's responsibility to warn the user.


- Secrets stay in `Radius.Security/secrets`; we only fetch them at runtime and never write the values to disk or logs.
- Installer downloads use HTTPS, and operators can supply custom CA bundles when needed.
- Only authenticated callers can hit the installer REST/CLI entry points; no new identities are introduced.
Contributor:

Can you clarify what authentication methods/checks this is referring to?

@ytimocin (Author):

It means you should only be able to access it using the rad CLI. There will be no additional authentication mechanism here, since authentication and authorization for Radius will be handled separately in the future.

I can update this sentence accordingly.


#### Portable Resources / Recipes RP (if applicable)

- Terraform driver consumes `terraformSettings` data for `.terraformrc`, backend config, env vars, and logging. Secret injection for custom providers continues via recipe parameters referencing Radius Secrets; no sensitive values are persisted.
Contributor:

The Terraform driver would also consume `terraformSettings` to authenticate to private Git repos for recipes, correct? I.e., this functionality: https://docs.radapp.io/guides/recipes/terraform/howto-private-registry/

@ytimocin (Author):

Yes

@brooke-hamilton (Member) left a comment:

🚀


- Environments (`Radius.Core/environments`) reference `terraformSettings` / `bicepSettings` resources.
- `Radius.Core/environments` controller consumes the new settings resources exclusively, while the legacy `Applications.Core/environments` controller continues to serve existing `recipeConfig` callers until that surface is retired.
- Installer REST endpoint stores install/uninstall requests in a dedicated async queue (`pkg/components/queue/queueprovider`) configured with single-flight semantics. The installer async handler (running inside the existing worker service) consumes jobs sequentially, manages binaries on the shared Terraform storage, and updates status metadata. (The Helm chart will drop the old init-container download path in favour of this queue-driven workflow.)
Member:

Will this be a breaking change for existing installations that attempt to do a Radius upgrade?

Should we version the helm chart? (I'm not sure.)

@ytimocin (Author):

It will probably be a breaking change. Think about Radius users who are already using the Terraform binary installed via the initContainers; they will be affected.

@ytimocin (Author):

Not sure I understand what you mean by versioning the Helm chart. When we release this feature in an upcoming release, the Helm chart version will already be that release version.

- CLI invokes installer APIs for install/uninstall/status; Phase 2 may add a `validate` call that reuses the same infrastructure when preflight checks are implemented.
- TerraformSettings serializer covers `.terraformrc` (provider mirrors, credentials, env vars) and backend blocks. Phase 1 keeps the existing Kubernetes backend adapter; AzureRM/S3 adapters plug in during Phase 2, with other backends passed through without managed auth until prioritized.
- Migration guidance stays in the Phase 2 backlog: controllers emit warnings while `recipeConfig` remains, and documentation will walk operators through removing the legacy fields.
- Recipe execution resolves the pinned binary path via the stored metadata, preserving multi-tenant isolation so environments can run different Terraform versions without interference.
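As a rough illustration of the serializer's job, settings could be rendered to a `.terraformrc` provider-mirror block along these lines. This is a Python sketch; the input shape (`mirrorUrl`) is a hypothetical field name, not the resource schema:

```python
def render_terraformrc(settings: dict) -> str:
    """Render a minimal .terraformrc with a provider_installation mirror block.
    The 'mirrorUrl' key is an assumed input shape for illustration only."""
    lines = ["provider_installation {"]
    mirror = settings.get("mirrorUrl")
    if mirror:
        lines += [
            "  network_mirror {",
            f'    url = "{mirror}"',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)

rc = render_terraformrc({"mirrorUrl": "https://mirror.corp.example/providers/"})
print(rc)
```

The real serializer would also cover `credentials` blocks, env vars, and backend configuration; the mirror block is just the smallest representative piece.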
Member:

Do we need to pause recipe deployments while Terraform installation operations are running? This would obviously not be a problem on small clusters with few users, but could be a problem at scale.

On the other hand, maybe we should defer synchronization to look at all the scenarios in which we may need to serialize operations. Are there other scenarios?

@ytimocin (Author):

Phase 2?

- Reworking Bicep execution beyond registry authentication parity.
- Adding new capabilities for legacy `Applications.Core/environments`.
- Changing recipe parameter or SecretStore semantics; env secret injection continues via recipe parameters.
- Modifying Bicep runtime behavior or the bundled Bicep CLI; only registry authentication moves into `bicepSettings`.
Member:

Should we implement bicep installation in the same way that Terraform installation is specified in this design? Or plan the work for future?


## Overview

We will externalize Radius Terraform and Bicep recipe configuration into dedicated settings resources, centralize Terraform binary lifecycle, and let platform teams supply Terraform settings exactly as they do today. This keeps Radius orchestration intact while removing opinionated guardrails that block mature Terraform estates. The work is anchored to the feature spec [`2025-08-14-terraform-bicep-settings.md`](../features/2025-08-14-terraform-bicep-settings.md).
@brooke-hamilton (Member), Oct 16, 2025:

Is this design setting the standard for future, pluggable recipe engines? Is there a consideration for how we might make recipe engines pluggable instead of hard-coded?


## Compatibility (Optional)

- `Applications.Core/environments` keep working during migration; we emit warnings when legacy `recipeConfig` is still in use.
Contributor:

> we emit warnings when legacy `recipeConfig` is still in use

good idea

#### User Story 2 - Migrating settings

Another engineer owns an environment that still uses `recipeConfig`. They deploy a new `terraformSettings` resource mirroring their existing `.terraformrc`, backend block, and env vars, plus a `bicepSettings` resource for private registries. After updating the environment to reference the new resources, the controller emits warnings and telemetry while legacy fields remain, confirming the new settings path is active. Recipes keep running with no downtime, and operators can remove the old configuration once their automation is updated.

@lakshmimsft (Contributor), Oct 16, 2025:

How would migration work, and how would the state file be affected? E.g., if state is currently in a Kubernetes secret and the new config indicates it will be AzureRM/S3.

@ytimocin (Author):

`rad terraform migrate` is not in scope.

@kachawla (Contributor) left a comment:

Maybe I missed this, could you point me to how environment references the settings resource? Are there any default settings if no separate resource is defined?


**Terraform**

- Add `Radius.Core/terraformSettings` resources and wire `Radius.Core/environments` (the new environment type) to reference reusable configuration.
Contributor:

Can this resource be referenced from multiple environments?


## Terms and Definitions

- **TerraformSettings**: New `Radius.Core/terraformSettings` resource encapsulating `.terraformrc`, backend, environment, and logging settings. Migrates everything currently in `recipeConfig.terraform` (provider mirrors/credentials, backend blocks, `env` variables, trace logging flags) while shifting env secret injection to recipe parameters.
Contributor:

What is the motivation to separate the settings out in its own resource?

awsIrsa: {
roleArn: 'arn:aws:iam::123456789012:role/RadiusBicepModules'
secret: '/planes/radius/local/providers/Radius.Security/secrets/aws-irsa-token'
}
@nithyatsu (Contributor), Oct 16, 2025:

Tokens are refreshed periodically and mounted as a volume today; not quite sure how we map that to a resource.

**Terraform settings example**

```bicep
resource corpTerraformSettings 'Radius.Core/terraformSettings@2025-08-01-preview' = {
```
@vishwahiremat (Contributor), Oct 16, 2025:

Do we want to list the environments the terraformSettings resource is referenced by, like we do in the recipePacks resource?

@ytimocin (Author):

Good call -- only delete if there are no references...

@ytimocin (Author):

`rad app delete`, `rad env delete`, and `rad group delete` should delete `terraformSettings` and `bicepSettings`.

- TerraformSettings serializer covers `.terraformrc` (provider mirrors, credentials, env vars) and backend blocks. Phase 1 keeps the existing Kubernetes backend adapter; AzureRM/S3 adapters plug in during Phase 2, with other backends passed through without managed auth until prioritized.
- Migration guidance stays in the Phase 2 backlog: controllers emit warnings while `recipeConfig` remains, and documentation will walk operators through removing the legacy fields.
- Recipe execution resolves the pinned binary path via the stored metadata, preserving multi-tenant isolation so environments can run different Terraform versions without interference.

Contributor:

Are there updates to recipe execution that need to be listed here? E.g., we create new folders per recipe execution; where will that occur now, if not in applications-rp?

@ytimocin ytimocin force-pushed the ytimocin/terraform-bicep-settings branch from 15fcac9 to 84c3720 Compare November 20, 2025 20:18
8 participants