feat: add opt-in init-less pod layout (beta)#16161
Conversation
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThis PR introduces an opt-in, beta init-less pod layout for Argo Workflows where a ChangesInit-less Pod Implementation
🎯 4 (Complex) | ⏱️ ~60 minutes Possibly Related PRs
Suggested Reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
workflow/controller/operator.go (1)
1813-1825:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate fallback failure messages to be init-less aware.
The fallback text still says
wait container, which is inaccurate in init-less mode and makes debugging harder when onlysupervisorexists.Suggested patch
- // Determine final status based on whether we confirmed main and wait succeeded + // Determine final status based on whether we confirmed main and auxiliary container succeeded // Slightly convulted approach to avoid the exhaustive linter getting upset if mainContainerSucceeded { if waitContainerSucceeded { // Both succeeded - sidecars may have been force-killed (137/143), which is fine return wfv1.NodeSucceeded, "" } - return wfv1.NodeFailed, "pod failed: wait container did not complete successfully" + return wfv1.NodeFailed, "pod failed: auxiliary container (wait/supervisor) did not complete successfully" } if waitContainerSucceeded { return wfv1.NodeFailed, "pod failed: main container did not complete successfully" } - return wfv1.NodeFailed, "pod failed: neither main nor wait container completed successfully" + return wfv1.NodeFailed, "pod failed: neither main nor auxiliary container (wait/supervisor) completed successfully"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workflow/controller/operator.go` around lines 1813 - 1825, The fallback failure messages referencing "wait container" are inaccurate in init-less mode; update the strings returned in the branch handling mainContainerSucceeded and waitContainerSucceeded (the code using mainContainerSucceeded, waitContainerSucceeded, and returning wfv1.NodeFailed) to use init-less aware wording such as "wait/supervisor container" or "supervisor (init-less) container" so they correctly describe which container failed in both normal and init-less modes; keep the same return values (wfv1.NodeFailed and the message) and only change the human-readable text.
🧹 Nitpick comments (3)
workflow/controller/workflowpod_initless_test.go (2)
83-90: ⚡ Quick winAssert the exact
inputPluginsset here.The map-based membership checks still pass if
buildPluginSidecars()returns duplicate input plugin names, so this test can miss the regression it's meant to guard. Add an exact cardinality/set assertion before the membership checks.Example tightening
inputNames := map[wfv1.ArtifactPluginName]bool{} for _, n := range inputPlugins { inputNames[n] = true } + require.Len(t, inputPlugins, 2) + require.Len(t, inputNames, 2) assert.True(t, inputNames["shared-plugin"]) assert.True(t, inputNames["only-input"]) assert.False(t, inputNames["only-output"], "only-output must not appear in input plugin list")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workflow/controller/workflowpod_initless_test.go` around lines 83 - 90, The test currently only checks membership via the inputNames map which won't catch duplicates; update the test around inputPlugins/inputNames to assert the exact set/cardinality of inputPlugins (e.g. assert len(inputPlugins) == 2 and that the set equals {"shared-plugin","only-input"}) before the individual membership checks; reference the inputPlugins variable (and the buildPluginSidecars output that populates it) and then keep the existing assertions against inputNames for clarity.
180-322: ⚡ Quick winAdd a regression case with user-defined
initContainers.These shape tests only cover workflows that start with no user init containers, so they do not lock down one of the core contracts of the feature: init-less mode should remove only the injected
initcontainer, while preserving user-specified init containers. Please add one legacy and one init-less case that start with a user init container and assert the expected final pod shape.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workflow/controller/workflowpod_initless_test.go` around lines 180 - 322, Add two tests mirroring TestCreateWorkflowPod_InitlessShape and TestCreateWorkflowPod_LegacyShape but start the template with a user-defined init container (e.g. add tmpl.InitContainers = []apiv1.Container{{Name: "user-init", Image: "busybox"}} before calling woc.createWorkflowPod). In the init-less variant (similar to TestCreateWorkflowPod_InitlessShape) assert that pod.Spec.InitContainers contains the user init ("user-init") and does NOT contain the injected init named common.InitContainerName, and still verify the other init-less invariants (supervisor, argo-bin volume, ARGO_WAIT_FOR_READY on main). In the legacy variant (similar to TestCreateWorkflowPod_LegacyShape) assert that pod.Spec.InitContainers contains both the user init ("user-init") and the injected common.InitContainerName, and that wait-container behavior and absence of argo-bin volume remain as in the original legacy test. Use createWorkflowPod, createWorkflowPodOpts, tmpl, and common.InitContainerName to locate where to modify and which assertions to add.workflow/util/util.go (1)
1639-1646: ⚡ Quick winPrefer deterministic legacy-first lookup in
FindAuxiliaryCtrIndex.Current logic returns whichever appears first; making
waitprecedence explicit preserves legacy semantics and avoids ambiguous matches.Suggested patch
func FindAuxiliaryCtrIndex(pod *apiv1.Pod) (int, error) { for i, ctr := range pod.Spec.Containers { - if ctr.Name == common.WaitContainerName || ctr.Name == common.SupervisorContainerName { + if ctr.Name == common.WaitContainerName { return i, nil } } + for i, ctr := range pod.Spec.Containers { + if ctr.Name == common.SupervisorContainerName { + return i, nil + } + } return -1, errors.Errorf("-1", "Could not find wait or supervisor container in pod spec") }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workflow/util/util.go` around lines 1639 - 1646, FindAuxiliaryCtrIndex currently returns whichever auxiliary container is encountered first; change it to explicitly prefer common.WaitContainerName (legacy) over common.SupervisorContainerName by first scanning pod.Spec.Containers for a container with name == common.WaitContainerName and returning its index if found, then scanning a second time for common.SupervisorContainerName and returning that index if found; if neither is found, return -1 with a clear error (use errors.Errorf or errors.New with a descriptive message) so the behavior is deterministic and preserves legacy precedence.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.features/pending/initless-pod.md:
- Line 2: The author attribution line "Authors: [Alan
Clucas](https://github.com/Joibel)" mismatches name and profile; update that
single Markdown link so the display name and GitHub URL point to the same person
— either change the display text to "Joibel" to match the existing link or
change the URL to the correct Alan Clucas profile (e.g., github.com/AlanClucas)
so the "Authors" entry is unambiguous.
In `@cmd/argoexec/commands/emissary.go`:
- Around line 366-378: Don't recursively delete pre-existing artifact paths:
replace the os.RemoveAll(dst) call with os.Remove(dst) so we only remove
non-directory entries (matching legacy SubPath shadowing) and return an error if
removal fails (which will preserve real directories/volumes). Update the error
handling around the existing os.Lstat(dst) branch that references dst and
art.Name (in emissary.go) to call os.Remove(dst) and propagate any rmErr instead
of using RemoveAll.
- Around line 411-448: The two watcher goroutines currently swallow
context.Canceled by returning nil when file.WaitForCreate returns a
cancellation; change both handlers (the g.Go closures monitoring readyPath and
failedPath) to propagate the cancellation error instead of returning nil (i.e.,
return err when errors.Is(err, context.Canceled) is true or simply return err
directly), so that g.Wait() receives the parent cancellation (context.Canceled)
and the caller can exit promptly rather than treating cancellation as successful
readiness.
In `@workflow/common/common.go`:
- Around line 355-357: The problem is that user containers can be named "wait"
or "supervisor" and get misclassified as Argo sidecars; update
ContainerSetTemplate.Validate to reject reserved Argo aux container names by
checking container names against WaitContainerName and SupervisorContainerName
(and any artifact plugin sidecar names checked by IsArtifactPluginSidecar) and
return a clear validation error when a user-specified container matches one of
those reserved names; keep IsArgoSidecar as the runtime classifier but enforce
the reservation at ContainerSetTemplate.Validate (use the same symbols:
IsArgoSidecar, WaitContainerName, SupervisorContainerName,
IsArtifactPluginSidecar, and isValidWorkflowFieldName) so users cannot create
containers that will be misclassified.
In `@workflow/controller/operator_test.go`:
- Around line 7839-7849: Tighten TestInferFailedReason_SupervisorPreMainFailure
(and the similar test around lines 7882-7890) to assert that the losing main
container's signal/reason is absent: after the existing assert.Contains checks
for "supervisor" and "connection refused" (on the result of inferFailedReason),
add assert.NotContains assertions to ensure the message does NOT include the
main container's identifier or its failure text (e.g., "exit-65" / "exit code
65" or the main container name used in podWithSupervisorPreMainFailure); this
guarantees supervisor precedence rather than a merged/fallback message.
In `@workflow/controller/workflowpod.go`:
- Around line 1761-1808: The code appends input-only plugin sidecar containers
but doesn't add their corresponding Volume entries, causing pod admission to
fail; update the pod volume population so that volumes needed by init-less input
plugins are created: after calling woc.buildPluginSidecars (and using
woc.initlessPluginSidecarArtifactMounts), ensure you also generate and append
the corresponding apiv1.Volume objects (use the same logic as
createArtifactVolumeMounts or a helper that derives driver volumes from the
plugin drivers/driver.Name.VolumeMount()/driver.Name.Volume()) into
pod.Spec.Volumes before appending sidecars; reference buildPluginSidecars,
initlessPluginSidecarArtifactMounts, and createArtifactVolumeMounts to locate
where to add the volume creation and pod.Spec.Volumes = append(...) call.
---
Outside diff comments:
In `@workflow/controller/operator.go`:
- Around line 1813-1825: The fallback failure messages referencing "wait
container" are inaccurate in init-less mode; update the strings returned in the
branch handling mainContainerSucceeded and waitContainerSucceeded (the code
using mainContainerSucceeded, waitContainerSucceeded, and returning
wfv1.NodeFailed) to use init-less aware wording such as "wait/supervisor
container" or "supervisor (init-less) container" so they correctly describe
which container failed in both normal and init-less modes; keep the same return
values (wfv1.NodeFailed and the message) and only change the human-readable
text.
---
Nitpick comments:
In `@workflow/controller/workflowpod_initless_test.go`:
- Around line 83-90: The test currently only checks membership via the
inputNames map which won't catch duplicates; update the test around
inputPlugins/inputNames to assert the exact set/cardinality of inputPlugins
(e.g. assert len(inputPlugins) == 2 and that the set equals
{"shared-plugin","only-input"}) before the individual membership checks;
reference the inputPlugins variable (and the buildPluginSidecars output that
populates it) and then keep the existing assertions against inputNames for
clarity.
- Around line 180-322: Add two tests mirroring
TestCreateWorkflowPod_InitlessShape and TestCreateWorkflowPod_LegacyShape but
start the template with a user-defined init container (e.g. add
tmpl.InitContainers = []apiv1.Container{{Name: "user-init", Image: "busybox"}}
before calling woc.createWorkflowPod). In the init-less variant (similar to
TestCreateWorkflowPod_InitlessShape) assert that pod.Spec.InitContainers
contains the user init ("user-init") and does NOT contain the injected init
named common.InitContainerName, and still verify the other init-less invariants
(supervisor, argo-bin volume, ARGO_WAIT_FOR_READY on main). In the legacy
variant (similar to TestCreateWorkflowPod_LegacyShape) assert that
pod.Spec.InitContainers contains both the user init ("user-init") and the
injected common.InitContainerName, and that wait-container behavior and absence
of argo-bin volume remain as in the original legacy test. Use createWorkflowPod,
createWorkflowPodOpts, tmpl, and common.InitContainerName to locate where to
modify and which assertions to add.
In `@workflow/util/util.go`:
- Around line 1639-1646: FindAuxiliaryCtrIndex currently returns whichever
auxiliary container is encountered first; change it to explicitly prefer
common.WaitContainerName (legacy) over common.SupervisorContainerName by first
scanning pod.Spec.Containers for a container with name ==
common.WaitContainerName and returning its index if found, then scanning a
second time for common.SupervisorContainerName and returning that index if
found; if neither is found, return -1 with a clear error (use errors.Errorf or
errors.New with a descriptive message) so the behavior is deterministic and
preserves legacy precedence.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1f90c046-feb0-4072-a963-08138bb68d47
📒 Files selected for processing (43)
.features/pending/initless-pod.md.github/workflows/ci-build.yamlMakefilecmd/argoexec/commands/artifact_plugin_init.gocmd/argoexec/commands/artifact_plugin_sidecar.gocmd/argoexec/commands/auxiliary.gocmd/argoexec/commands/emissary.gocmd/argoexec/commands/emissary_link_artifacts_test.gocmd/argoexec/commands/emissary_read_template_test.gocmd/argoexec/commands/emissary_ready_test.gocmd/argoexec/commands/root.gocmd/argoexec/commands/signal.gocmd/argoexec/commands/supervisor.gocmd/argoexec/commands/supervisor_test.gocmd/argoexec/commands/wait.gocmd/argoexec/executor/init.goconfig/config.godocs/architecture.mddocs/initless-pod.mddocs/tracing.mddocs/workflow-controller-configmap.mdhack/k8s-versions.shmkdocs.ymltest/e2e/manifests/components/initless-pod/kustomization.yamltest/e2e/manifests/components/initless-pod/workflow-controller-configmap.yamltest/e2e/manifests/minimal-initless/kustomization.yamltest/e2e/manifests/plugins-initless/kustomization.yamlutil/file/watch.goutil/telemetry/builder/values.yamlutil/telemetry/traces_list.goworkflow/common/common.goworkflow/controller/artifact_gc.goworkflow/controller/config.goworkflow/controller/operator.goworkflow/controller/operator_test.goworkflow/controller/workflowpod.goworkflow/controller/workflowpod_initless_test.goworkflow/executor/emissary/emissary.goworkflow/executor/executor.goworkflow/executor/executor_test.goworkflow/executor/postmain.goworkflow/executor/postmain_test.goworkflow/util/util.go
c6b0e98 to
07af2cc
Compare
New controller-wide `initlessPod` mode that eliminates the `argoexec init` container. The `argoexec` binary is delivered to `main` via a Kubernetes image volume (KEP-4639 — Beta in K8s 1.33 behind a feature gate, GA in 1.36), and a new `supervisor` container subsumes the work of `init` plus `wait`: template write, script staging, input artifact download, readiness signaling, then the post-main responsibilities (observe main, collect outputs/logs/artifacts) previously held by `wait`. Artifact plugins run as a single sidecar per plugin invoked by `supervisor` for both Load and Save, collapsing the legacy split between input-init containers and output sidecars. Pods scheduled under this mode have zero init containers. Beta: off by default and may change in incompatible ways in future minor releases before being promoted to stable. Legacy `wait` + init-container behavior is unchanged when `initlessPod.enabled` is false (default). Controller (workflow/controller) - `createWorkflowPod` branches on `isInitlessPodEnabled()`: skips `pod.Spec.InitContainers` entirely, attaches `supervisor` in place of `wait`, mounts the argoexec-bin image volume on every main-level container, retargets the emissary at `/argo-bin/bin/argoexec`, sets `ARGO_WAIT_FOR_READY=true` on main-level containers, and passes `ARGO_TEMPLATE` directly to main for templates that don't run a supervisor (data / resource-without-logs). - `buildPluginSidecars` / `addArtifactPluginsInitless` emit one sidecar per unique plugin across both Inputs.Artifacts and Outputs.Artifacts, with supervisor mounting each plugin's socket volume and receiving `ARGO_ARTIFACT_PLUGIN_NAMES` (all plugins, for Save) plus `ARGO_INPUT_ARTIFACT_PLUGIN_NAMES` (the Load subset). - Input-artifact mount handling switches from per-artifact SubPath bind mounts (kubelet pre-creates SubPath entries as empty directories before supervisor can write to them) to a whole-volume mount at `/argo/inputs/artifacts` that the emissary symlinks into each artifact's expected path post-ready. - `inferFailedReason` recognises supervisor as an error-reporting auxiliary container alongside `wait`. - New `InitlessPodConfig` sub-struct in `config.Config`; reuses the existing `executor.image` / `executor.imagePullPolicy` for the image volume so the mounted binary always matches the running supervisor. Argoexec (cmd/argoexec, workflow/executor) - New `argoexec supervisor` subcommand: calls `WriteTemplate`, stages files, then loads non-plugin and per-plugin input artifacts in parallel via errgroup, writes `/var/run/argo/ready` (or `/failed` with the error text) atomically write-then-rename, and runs the shared post-main phase. - Shared post-main logic extracted from `argoexec wait` into `WorkflowExecutor.PostMain`; the caller still owns tracing and the defer stack (errHandler stays outermost). - Emissary `writeTemplate` split out from `Init`, exposed as a new `TemplateWriter` interface so the init-less supervisor can write the template without the binary-copy step. Legacy `Init` is unchanged. - Emissary gains an opt-in pre-exec wait gated on `ARGO_WAIT_FOR_READY=true`: block on the ready/failed marker (via `file.WaitForCreate` race) before reading the template, fall back to `ARGO_TEMPLATE` when `/var/run/argo/template` is absent, then symlink each input artifact from `/argo/inputs/artifacts/<name>` to `art.Path`. A failed marker maps to exit code 65 so the controller can attribute pre-main failures to supervisor rather than the user command. - `stageArchiveFile` falls back to the input-artifacts emptyDir when an output path overlaps an input artifact path — under init-less mode there is no mirrored `/mainctrfs/<art.Path>` from a SubPath bind mount. Tracing - New `RunSupervisorContainer` span as a sibling to `RunWaitContainer` under `CreateWorkflowPod`. `LoadArtifacts`, `StageFiles`, `SaveArtifacts`, `SaveLogs`, `CaptureScriptResult`, `CreateTaskResult`, `PatchTaskResult`, `PatchTaskResultLabels`, and `WaitWorkload` all accept it as a valid parent so sub-spans parent correctly in both modes. CI, manifests, docs - New `INITLESS=true|false` Make variable; when true `install` uses a `<profile>-initless` kustomize overlay layering a new `initless-pod` component on top of the profile's manifests. - New `initless: true|false` CI matrix dimension restricted to `k8s_version: max`; re-runs `test-corefunctional`, `test-functional`, `test-artifacts`, and `test-plugins` under the new layout. Bumps max K8s version from 1.35 to 1.36 (image volumes are now GA there). - New `docs/initless-pod.md` plus updates to `docs/architecture.md`, `docs/tracing.md`, `docs/workflow-controller-configmap.md`, `mkdocs.yml`, and a pending feature entry at `.features/pending/initless-pod.md`. Tests - 331-line `workflowpod_initless_test.go` covering pod-spec shape (zero init containers, supervisor present, image volume mounted, emissary entrypoint retargeted), plugin sidecar dedup across input and output references, template-env-var handling for templates without a supervisor, and legacy-mode invariance. - New emissary unit tests for marker waiting (ready-appears, failed-appears, already-ready, ctx-cancelled), template-env fallback, and input-artifact symlink behaviour. - New supervisor unit tests for ready/failed marker writes and `ARGO_INPUT_ARTIFACT_PLUGIN_NAMES` parsing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Alan Clucas <alan@clucas.org>
Fixes #16154
Motivation
Every workflow pod currently runs an
argoexec initcontainer before any regular container starts, adding sequential startup latency to every pod. The init container's responsibilities also overlap conceptually withwait— both are Argo-infrastructure containers runningargoexec— and a plugin used for both inputs and outputs ends up as two containers per pod (an init container for Load and a sidecar for Save).This introduces an opt-in, controller-wide
initlessPodmode (Beta, off by default) that removes the init container entirely, cutting pod startup latency and collapsing the input/output plugin split.Modifications
New controller-wide
initlessPod.enabledsetting (workflow controller ConfigMap). When enabled, every subsequently scheduled workflow pod uses the init-less layout; the legacywait+ init-container behaviour is unchanged when disabled (the default).argoexecbinary is delivered tomainvia a Kubernetes image volume (KEP-4639 — Beta in K8s 1.33 behind a feature gate, GA in 1.36) instead of being copied by an init container.supervisorcontainer subsumes init + wait: template write, script staging, input artifact download, readiness signalling, then the post-main responsibilities (observemain, collect outputs/logs/artifacts). Pods run with zero init containers.supervisorfor both Load and Save.supervisorsignals ready (per-artifact SubPath mounts race kubelet in the init-less layout, wheremainandsupervisorstart concurrently).WorkflowExecutor.PostMain, used by bothwaitandsupervisor.Enable with
initlessPod.enabled: true; roll back by setting it tofalse(in-flight pods keep their original layout).Verification
supervisorpresent, image volume mounted, emissary entrypoint retargeted), plugin sidecar dedup across input/output references, template-env handling for templates without a supervisor, ready/failed marker waiting and writing, input-artifact symlink behaviour, supervisor pre-main orchestration, and output staging when an input artifact path is reused as an output path.initless: trueCI matrix dimension (restricted tok8s_version: max, bumped 1.35 → 1.36 where image volumes are GA) re-runstest-corefunctional,test-functional,test-artifacts, andtest-pluginsunder the new layout.Documentation
docs/initless-pod.md(architecture, pod lifecycle, failure modes, the input-artifact symlink-vs-bind-mount semantics table, and plugin author notes).docs/architecture.md,docs/tracing.md,docs/workflow-controller-configmap.md, andmkdocs.yml.AI
Generative AI (Claude) was used to assist in preparing this PR — including code, tests, documentation, and commit messages — with human review throughout. See the Argo project's Generative AI policy.
Summary by CodeRabbit
Release Notes
New Features
Configuration
InitlessPod.Enabledoption in workflow controller ConfigMap to switch pod layoutsDocumentation
Tests