Design for Terraform/Bicep Settings #117
ee39224 to
963f6b0
Compare
| #### User Story 1 - Terraform lifecycle | ||
| A platform engineer runs `rad terraform install --version 1.6.4 --wait` to seed the control plane with the organization's pinned Terraform build. The installer async handler downloads from the internal mirror, validates the checksum, writes metadata, and exposes status. A follow-up `rad terraform install --version 1.7.0` automatically queues behind the first job and runs after it completes. The engineer confirms success with `rad terraform status` and re-runs `rad terraform validate --environment prod-east` (Phase 2) before dispatching recipe executions. Result: the control plane holds a single active Terraform version at any time, and sequential installs guard against race conditions or partial upgrades. |
We can talk about how we want this to be done.
963f6b0 to
eae884e
Compare
| - Introduce an installer async pipeline and CLI-driven Terraform binary lifecycle (`rad terraform install|uninstall|status`) with operator control over version, source URL, and checksum. | ||
| - Allow Terraform settings (provider mirrors, credentials, env vars, backend blocks) to flow through unchanged so Radius is unopinionated about Terraform configuration. | ||
| - Deliver Phase 1 with the existing Kubernetes backend, and stage AzureRM/S3 backend support as Phase 2 follow-up while keeping Tier-2 backends on the roadmap (for example `oss`, `gcs`, `http`, `oci`, `pg`, `cos`). | ||
Is Terraform still installed into the app core RP and dynamic RP pods?
Today, yes. Once this feature ships, Terraform will no longer be installed into those pods.
| name: 'corpTerraformSettings' | ||
| properties: { | ||
| terraformrc: { | ||
| provider_installation: { |
I am not sure if using underscores in property names is the standard in Bicep. cc/ @zachcasper
I will probably need to change them.
| #### User Story 1 - Terraform lifecycle | ||
| A platform engineer runs `rad terraform install --version 1.6.4 --wait` to seed the control plane with the organization's pinned Terraform build. The installer async handler downloads from the internal mirror, validates the checksum, writes metadata, and exposes status. A follow-up `rad terraform install --version 1.7.0` automatically queues behind the first job and runs after it completes. The engineer confirms success with `rad terraform status` before dispatching recipe executions. Result: the control plane holds a single active Terraform version at any time, and sequential installs guard against race conditions or partial upgrades. | ||
Would these commands need a pod restart?
No, the pods wouldn't need to be restarted.
| credentials: { | ||
| 'app.terraform.io': { | ||
| secret: '/planes/radius/local/providers/Radius.Security/secrets/tfc-token' |
| secret: '/planes/radius/local/providers/Radius.Security/secrets/tfc-token' | |
| token: '/planes/radius/local/providers/Radius.Security/secrets/tfc-token' |
This is also going to need a key. Not sure how that should be modeled exactly. We should discuss.
Wouldn't app.terraform.io be the key? Can you explain more?
credentials: {
  'app.terraform.io': {
    token: {
      secretId: mySecret.id
      key: 'myKey'
    }
  }
}
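On the consuming side, the shape proposed above (hostname, then `token`, then `secretId` plus `key`) could be decoded and validated roughly as follows. This is only a sketch: the Go type names and `parseCredentials` are hypothetical, and the JSON field names simply mirror the snippet in this thread rather than a finalized schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// secretKeyRef points at one key inside a Radius secret (hypothetical type).
type secretKeyRef struct {
	SecretID string `json:"secretId"`
	Key      string `json:"key"`
}

// hostCredential holds the token reference for one registry host.
type hostCredential struct {
	Token secretKeyRef `json:"token"`
}

// credentials maps a Terraform registry hostname to its credential reference.
type credentials map[string]hostCredential

// parseCredentials decodes the block and rejects entries that omit either
// half of the reference, since a secret ID alone does not identify a value.
func parseCredentials(raw []byte) (credentials, error) {
	var c credentials
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	for host, cred := range c {
		if cred.Token.SecretID == "" || cred.Token.Key == "" {
			return nil, fmt.Errorf("credentials[%q]: token needs both secretId and key", host)
		}
	}
	return c, nil
}

func main() {
	raw := []byte(`{"app.terraform.io":{"token":{"secretId":"/planes/radius/local/providers/Radius.Security/secrets/tfc-token","key":"myKey"}}}`)
	c, err := parseCredentials(raw)
	fmt.Println(c["app.terraform.io"].Token.Key, err)
}
```

Here the hostname is the map key and `key` selects the entry inside the secret, so the two "keys" discussed above play different roles.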
| ### CLI Design (if applicable) | ||
| - `rad terraform install [--version|--url|--checksum]` _(required by feature spec)_ | ||
| - `rad terraform uninstall [--version]` _(required by feature spec)_ |
| - `rad terraform uninstall [--version]` _(required by feature spec)_ | |
| - `rad terraform uninstall` _(required by feature spec)_ |
Having `rad terraform uninstall --version` implies that multiple versions can be installed.
That is right. I will update the doc.
| - New ARM resources `Radius.Core/terraformSettings` and `Radius.Core/bicepSettings` (preview `2025-08-01`). | ||
| - Installer REST endpoints: | ||
| - `POST /installer/terraform/install` `{ "version": "1.6.4", "source": {...} }` | ||
| - `POST /installer/terraform/uninstall` `{ "version": "1.5.7" }` |
See comment on line 213
| } | ||
| logging: { | ||
| level: 'TRACE' |
It's good to see this here. Is there any implementation we need to have to enable trace logging? I remember it was not as straightforward as setting TF_LOG to trace, but I don't remember details.
Yes, it wasn't straightforward. I will need to do a deeper investigation.
| - Environments (`Radius.Core/environments`) reference `terraformSettings` / `bicepSettings` resources. | ||
| - `Radius.Core/environments` controller consumes the new settings resources exclusively, while the legacy `Applications.Core/environments` controller continues to serve existing `recipeConfig` callers until that surface is retired. | ||
| - Installer REST endpoint stores install/uninstall requests in a dedicated async queue (`pkg/components/queue/queueprovider`) configured with single-flight semantics. The installer async handler (running inside the existing worker service) consumes jobs sequentially, manages binaries on the shared Terraform storage, and updates status metadata. (The Helm chart will drop the old init-container download path in favour of this queue-driven workflow.) |
If resources were deployed with one Terraform version and Terraform is then upgraded (or a new version is installed), would we still be able to manage the resources deployed with the older version? Do we need to track the version used to deploy each resource?
TF should be backwards compatible, right? I think at that point it is up to the platform engineer who is deciding to do the update. I don't think it is Radius' responsibility to warn the user.
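The sequential behaviour described for the installer queue (one consumer, jobs handled in submission order, one active version at a time) can be illustrated with a plain channel and a single goroutine. This is a minimal in-memory sketch, not the `queueprovider` API; `installJob`, `installer`, and their methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// installJob is a hypothetical request shape; the real payload is whatever
// the installer REST endpoint accepts.
type installJob struct {
	Version string
}

// installer processes jobs one at a time, in submission order.
type installer struct {
	jobs    chan installJob
	done    sync.WaitGroup
	mu      sync.Mutex
	active  string   // version currently considered active
	history []string // order in which installs completed
}

func newInstaller(buffer int) *installer {
	in := &installer{jobs: make(chan installJob, buffer)}
	in.done.Add(1)
	go in.run()
	return in
}

// run is the only consumer, so two installs can never overlap.
func (in *installer) run() {
	defer in.done.Done()
	for job := range in.jobs {
		// A real handler would download, verify, and write metadata here.
		in.mu.Lock()
		in.active = job.Version
		in.history = append(in.history, job.Version)
		in.mu.Unlock()
	}
}

// submit enqueues a request; it returns as soon as the job is queued.
func (in *installer) submit(version string) {
	in.jobs <- installJob{Version: version}
}

// close stops accepting jobs and waits for the queue to drain.
func (in *installer) close() {
	close(in.jobs)
	in.done.Wait()
}

func main() {
	in := newInstaller(8)
	in.submit("1.6.4")
	in.submit("1.7.0") // queues behind the first job
	in.close()
	fmt.Println("completed:", in.history, "active:", in.active)
}
```

Because only the newest completed install is recorded as active, this sketch does not track which version deployed which resource; that would need extra metadata if the question above is answered with "yes".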
| - Secrets stay in `Radius.Security/secrets`; we only fetch them at runtime and never write the values to disk or logs. | ||
| - Installer downloads use HTTPS, and operators can supply custom CA bundles when needed. | ||
| - Only authenticated callers can hit the installer REST/CLI entry points; no new identities are introduced. |
Can you clarify which authentication methods or checks this refers to?
It means the installer should only be reachable through the rad CLI. No additional authentication mechanism is planned here, since authentication and authorization for Radius will be handled separately in the future.
I can update this sentence accordingly.
| #### Portable Resources / Recipes RP (if applicable) | ||
| - Terraform driver consumes `terraformSettings` data for `.terraformrc`, backend config, env vars, and logging. Secret injection for custom providers continues via recipe parameters referencing Radius Secrets; no sensitive values are persisted. |
Terraform driver would also consume terraformSettings in order to auth into private git repos for recipes, correct? i.e. this functionality: https://docs.radapp.io/guides/recipes/terraform/howto-private-registry/
Yes
brooke-hamilton
left a comment
🚀
| - Environments (`Radius.Core/environments`) reference `terraformSettings` / `bicepSettings` resources. | ||
| - `Radius.Core/environments` controller consumes the new settings resources exclusively, while the legacy `Applications.Core/environments` controller continues to serve existing `recipeConfig` callers until that surface is retired. | ||
| - Installer REST endpoint stores install/uninstall requests in a dedicated async queue (`pkg/components/queue/queueprovider`) configured with single-flight semantics. The installer async handler (running inside the existing worker service) consumes jobs sequentially, manages binaries on the shared Terraform storage, and updates status metadata. (The Helm chart will drop the old init-container download path in favour of this queue-driven workflow.) |
Will this be a breaking change for existing installations that attempt to do a Radius upgrade?
Should we version the helm chart? (I'm not sure.)
It will probably be a breaking change. Think about Radius users who are already using Terraform that is installed with the initContainers. They will be affected.
Not sure if I understand what you mean by versioning the Helm chart. When we release this feature in an upcoming release the Helm chart version will already be that release version.
| - CLI invokes installer APIs for install/uninstall/status; Phase 2 may add a `validate` call that reuses the same infrastructure when preflight checks are implemented. | ||
| - TerraformSettings serializer covers `.terraformrc` (provider mirrors, credentials, env vars) and backend blocks. Phase 1 keeps the existing Kubernetes backend adapter; AzureRM/S3 adapters plug in during Phase 2, with other backends passed through without managed auth until prioritized. | ||
| - Migration guidance stays in the Phase 2 backlog: controllers emit warnings while `recipeConfig` remains, and documentation will walk operators through removing the legacy fields. | ||
| - Recipe execution resolves the pinned binary path via the stored metadata, preserving multi-tenant isolation so environments can run different Terraform versions without interference. |
Do we need to pause recipe deployments while Terraform installation operations are running? This would obviously not be a problem on small clusters with few users, but could be a problem at scale.
On the other hand, maybe we should defer synchronization to look at all the scenarios in which we may need to serialize operations. Are there other scenarios?
Defer this to Phase 2?
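If recipe executions do need to be held back while the binary is swapped, one possible approach (not something the design commits to) is a reader/writer lock: recipes share a read lock and an install takes the write lock. The sketch below is purely illustrative; `binaryStore` and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// binaryStore guards the active Terraform version (hypothetical type).
type binaryStore struct {
	mu      sync.RWMutex
	version string
}

// withBinary runs a recipe under a read lock: any number of recipes may run
// concurrently, but none can overlap a swap.
func (s *binaryStore) withBinary(run func(version string)) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	run(s.version)
}

// swap takes the write lock, so it waits for in-flight recipes to finish and
// blocks new ones until the new version is in place.
func (s *binaryStore) swap(version string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.version = version
}

func main() {
	store := &binaryStore{version: "1.6.4"}

	// Three recipe executions sharing the read lock.
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			store.withBinary(func(v string) { _ = v })
		}()
	}
	wg.Wait()

	store.swap("1.7.0")
	store.withBinary(func(v string) { fmt.Println("recipes now see", v) })
}
```

The trade-off is the one raised above: a long-running recipe delays the install, and a pending install stalls new recipes, which may be unacceptable at scale. An in-process lock also only covers a single process, so a multi-replica deployment would need a shared coordination mechanism instead.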
| - Reworking Bicep execution beyond registry authentication parity. | ||
| - Adding new capabilities for legacy `Applications.Core/environments`. | ||
| - Changing recipe parameter or SecretStore semantics; env secret injection continues via recipe parameters. | ||
| - Modifying Bicep runtime behavior or the bundled Bicep CLI; only registry authentication moves into `bicepSettings`. |
Should we implement bicep installation in the same way that Terraform installation is specified in this design? Or plan the work for future?
| ## Overview | ||
| We will externalize Radius Terraform and Bicep recipe configuration into dedicated settings resources, centralize Terraform binary lifecycle, and let platform teams supply Terraform settings exactly as they do today. This keeps Radius orchestration intact while removing opinionated guardrails that block mature Terraform estates. The work is anchored to the feature spec [`2025-08-14-terraform-bicep-settings.md`](../features/2025-08-14-terraform-bicep-settings.md). |
Is this design setting the standard for future, pluggable recipe engines? Is there a consideration for how we might make recipe engines pluggable instead of hard-coded?
| ## Compatibility (Optional) | ||
| - `Applications.Core/environments` keep working during migration; we emit warnings when legacy `recipeConfig` is still in use. |
> we emit warnings when legacy `recipeConfig` is still in use
Good idea.
| #### User Story 2 - Migrating settings | ||
| Another engineer owns an environment that still uses `recipeConfig`. They deploy a new `terraformSettings` resource mirroring their existing `.terraformrc`, backend block, and env vars, plus a `bicepSettings` resource for private registries. After updating the environment to reference the new resources, the controller emits warnings and telemetry while legacy fields remain, confirming the new settings path is active. Recipes keep running with no downtime, and operators can remove the old configuration once their automation is updated. | ||
How would migration work, and how would the state file be affected? For example, if state currently lives in a Kubernetes secret and the new config points to AzureRM or S3.
`rad terraform migrate` is not in scope.
kachawla
left a comment
Maybe I missed this, but could you point me to how the environment references the settings resource? Are there default settings if no separate resource is defined?
| **Terraform** | ||
| - Add `Radius.Core/terraformSettings` resources and wire `Radius.Core/environments` (the new environment type) to reference reusable configuration. |
Can this resource be referenced from multiple environments?
| ## Terms and Definitions | ||
| - **TerraformSettings**: New `Radius.Core/terraformSettings` resource encapsulating `.terraformrc`, backend, environment, and logging settings. Migrates everything currently in `recipeConfig.terraform` (provider mirrors/credentials, backend blocks, `env` variables, trace logging flags) while shifting env secret injection to recipe parameters. |
What is the motivation for separating the settings out into their own resource?
| awsIrsa: { | ||
| roleArn: 'arn:aws:iam::123456789012:role/RadiusBicepModules' | ||
| secret: '/planes/radius/local/providers/Radius.Security/secrets/aws-irsa-token' | ||
| } |
Tokens are refreshed periodically and mounted as a volume today. I'm not sure how we would map that to a resource.
| **Terraform settings example** | ||
| ```bicep | ||
| resource corpTerraformSettings 'Radius.Core/terraformSettings@2025-08-01-preview' = { |
Do we want to list the environments that reference a terraformSettings resource, like we do for the recipePacks resource?
Good call -- only delete if there are no references...
rad app delete, rad env delete, and rad group delete should delete terraformSettings and bicepSettings
| - TerraformSettings serializer covers `.terraformrc` (provider mirrors, credentials, env vars) and backend blocks. Phase 1 keeps the existing Kubernetes backend adapter; AzureRM/S3 adapters plug in during Phase 2, with other backends passed through without managed auth until prioritized. | ||
| - Migration guidance stays in the Phase 2 backlog: controllers emit warnings while `recipeConfig` remains, and documentation will walk operators through removing the legacy fields. | ||
| - Recipe execution resolves the pinned binary path via the stored metadata, preserving multi-tenant isolation so environments can run different Terraform versions without interference. | ||
Are there updates to recipe execution that need to be listed here? For example, we create a new folder per recipe execution. Where will that happen now, if not in applications-rp?
Signed-off-by: ytimocin <[email protected]>
15fcac9 to
84c3720
Compare
Design for Terraform/Bicep Settings