feat: add opt-in init-less pod layout (beta) #16161

Open
Joibel wants to merge 1 commit into argoproj:main from pipekit:initless-pod
@Joibel Joibel commented Jun 2, 2026

Member

Fixes #16154

⚠️ This PR switches all e2e CI from k3d's Docker backend to containerd.

Motivation

Every workflow pod currently runs an argoexec init container before any regular container starts, adding sequential startup latency to every pod. The init container's responsibilities also overlap conceptually with wait — both are Argo-infrastructure containers running argoexec — and a plugin used for both inputs and outputs ends up as two containers per pod (an init container for Load and a sidecar for Save).

This introduces an opt-in, controller-wide initlessPod mode (Beta, off by default) that removes the init container entirely, cutting pod startup latency and collapsing the input/output plugin split.

Modifications

New controller-wide initlessPod.enabled setting (workflow controller ConfigMap). When enabled, every subsequently scheduled workflow pod uses the init-less layout; the legacy wait + init-container behaviour is unchanged when disabled (the default).

  • The argoexec binary is delivered to main via a Kubernetes image volume (KEP-4639 — Beta in K8s 1.33 behind a feature gate, GA in 1.36) instead of being copied by an init container.
  • A new supervisor container subsumes init + wait: template write, script staging, input artifact download, readiness signalling, then the post-main responsibilities (observe main, collect outputs/logs/artifacts). Pods run with zero init containers.
  • Artifact plugins run as a single sidecar per plugin, invoked by supervisor for both Load and Save.
  • Input artifacts are delivered via a whole-volume mount that the emissary symlinks into place once supervisor signals ready (per-artifact SubPath mounts race kubelet in the init-less layout, where main and supervisor start concurrently).
  • Shared post-main logic is extracted into WorkflowExecutor.PostMain, used by both wait and supervisor.

Enable with initlessPod.enabled: true; roll back by setting it to false (in-flight pods keep their original layout).

Verification

  • New unit tests covering: pod-spec shape (zero init containers, supervisor present, image volume mounted, emissary entrypoint retargeted), plugin sidecar dedup across input/output references, template-env handling for templates without a supervisor, ready/failed marker waiting and writing, input-artifact symlink behaviour, supervisor pre-main orchestration, and output staging when an input artifact path is reused as an output path.
  • New initless: true CI matrix dimension (restricted to k8s_version: max, bumped 1.35 → 1.36 where image volumes are GA) re-runs test-corefunctional, test-functional, test-artifacts, and test-plugins under the new layout.
  • Legacy-mode behaviour is held invariant by existing tests plus explicit init-less-off assertions.

Documentation

  • New docs/initless-pod.md (architecture, pod lifecycle, failure modes, the input-artifact symlink-vs-bind-mount semantics table, and plugin author notes).
  • Updates to docs/architecture.md, docs/tracing.md, docs/workflow-controller-configmap.md, and mkdocs.yml.
  • Users discover the feature through the controller ConfigMap reference and the dedicated docs page.

AI

Generative AI (Claude) was used to assist in preparing this PR — including code, tests, documentation, and commit messages — with human review throughout. See the Argo project's Generative AI policy.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced opt-in "init-less pod" layout (beta): workflow pods can run without init containers when enabled via ConfigMap
    • Added supervisor container to handle input artifact staging and output collection
    • Artifact plugins now run as sidecars instead of init containers
  • Configuration

    • New InitlessPod.Enabled option in workflow controller ConfigMap to switch pod layouts
  • Documentation

    • Added init-less pod architecture and feature documentation
    • Updated configuration guides and tracing documentation
  • Tests

    • Comprehensive test coverage for init-less pod execution, supervisor container behavior, and plugin sidecar functionality

@Joibel Joibel changed the title from "feat: add opt-in init-less pod layout (beta). Fixes #16154" to "feat: add opt-in init-less pod layout (beta)" Jun 2, 2026
Joibel commented Jun 2, 2026

Member Author

@coderabbitai review

coderabbitai Bot commented Jun 2, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot commented Jun 2, 2026

Contributor

Walkthrough

This PR introduces an opt-in, beta init-less pod layout for Argo Workflows where a supervisor container replaces legacy init+wait responsibilities, the argoexec binary is delivered via Kubernetes image volumes, artifact plugins run as regular sidecars, and workflow pods contain zero init containers. The feature is gated by configuration, extensively tested, and fully documented with comprehensive E2E test coverage.

Changes

Init-less Pod Implementation

Layer / File(s) Summary
Configuration and Feature Gating
config/config.go, workflow/controller/config.go, docs/workflow-controller-configmap.md
InitlessPod config struct and IsEnabled() helper added; controller helper isInitlessPodEnabled() introduced to gate init-less behavior and log enablement status.
Common Constants and Helpers
workflow/common/common.go, util/file/watch.go
New container name constant SupervisorContainerName, environment variables for init-less coordination (EnvVarWaitForReady, EnvVarInitlessPod, EnvVarInputArtifactPluginNames), marker file paths, failure exit code, and helper functions (IsInitlessPod(), JoinPluginNames(), SplitPluginNames(), ResolveTemplateEnvValue()) for plugin name codec and template resolution; ModTime().Equal() comparison added for file watch.
Pod Layout Selection and Assembly
workflow/controller/workflowpod.go
Controller conditionally creates argoexec-bin image volume, selects supervisor vs wait auxiliary container, omits init containers in init-less mode, adjusts ARGO_TEMPLATE env placement per layout, validates across all containers, and updates command/mount wiring for argoexec binary path location.
Emissary and Supervisor Coordination
cmd/argoexec/commands/emissary.go, cmd/argoexec/commands/emissary_link_artifacts_test.go, cmd/argoexec/commands/emissary_read_template_test.go, cmd/argoexec/commands/emissary_ready_test.go
Emissary waits for supervisor readiness via marker files, creates symlinks for input artifacts, loads templates from file with env fallback and offload resolution, forwards signals via shared helper, and computes exit codes via unified function; five comprehensive test cases validate readiness/failure marker behavior, template precedence, and artifact linking.
Supervisor Container Implementation
cmd/argoexec/commands/supervisor.go, cmd/argoexec/commands/supervisor_test.go
New supervisor command runs pre-main setup (template writing, file staging, parallel artifact loading with errgroup), writes atomic readiness/failure markers, frees memory on success, and delegates to PostMain phase; testable runSupervisorPreMain core and test suite validate sequencing, error handling, and cancellation propagation.
Shared Auxiliary Container Lifecycle
cmd/argoexec/commands/auxiliary.go, cmd/argoexec/commands/wait.go
New runAuxiliaryContainer helper consolidates tracing, executor initialization, error handling, and stats management for both legacy wait and supervisor flows; wait command refactored to delegate to this framework.
Signal Forwarding and Process Control
cmd/argoexec/commands/signal.go, cmd/argoexec/commands/artifact_plugin_init.go, cmd/argoexec/commands/artifact_plugin_sidecar.go
New forwardSignals() helper replaces inline signal handling across multiple commands; artifact plugin init/sidecar commands refactored to use shared signal forwarding and exit code helpers; sidecar adds startAuxExitWatcher() to monitor aux container and enforce plugin termination via SIGTERM/SIGKILL escalation.
Post-Main Execution Phase
workflow/executor/postmain.go, workflow/executor/postmain_test.go
New PostMain() method on executor implements output handling, script result capture, artifact/log saves conditionally on preMainFailed flag; shared across legacy wait and supervisor layouts with four unit tests validating resource templates, happy paths, pre-main failure handling, and error aggregation.
Executor Framework Updates
cmd/argoexec/executor/init.go, workflow/executor/executor.go, workflow/executor/executor_test.go
Init() refactored for template resolution via file-first fallback and offload helper; NewExecutor returns pointer for concurrency safety; configmap memoization protected with mutex for init-less plugin loading; new TemplateWriter interface and WriteTemplate() method on executor; isBaseImagePath() updated for init-less semantics; unit tests cover init-less base-image detection and staging overlaps.
Input Artifact Delivery
workflow/controller/workflowpod.go (range af3326056807, 55ec2a441810)
Init-less mode uses shared input-artifacts emptyDir volume mounted on both supervisor and main (whole volume without per-artifact SubPath), while legacy uses per-artifact SubPath mounts; supervisor populates symlinks at artifact destinations, avoiding race conditions.
Artifact Plugin Sidecars (Init-less)
workflow/controller/workflowpod.go (ranges 87df..., 63e79..., 25d35...)
Init-less artifact plugin sidecars deduplicated across input+output artifacts, supervisor wired with env vars listing plugin names/sockets for Load and Save phases, plugin sidecars mounted with argoexec-bin and input-artifacts volumes for runtime execution.
Pod Failure Detection and Reporting
workflow/controller/operator.go, workflow/controller/operator_test.go
Auxiliary container cleanup tracking generalized for supervisor+wait; inferFailedReason() detects supervisor pre-main failures and ranks supervisor errors over main placeholder exits; two new test fixtures validate supervisor failure precedence and success scenarios.
Telemetry and Tracing Infrastructure
util/telemetry/builder/values.yaml, util/telemetry/traces_list.go
New RunSupervisorContainer span with workflow name/namespace attributes, parent relationships updated across artifact/output handling spans, generated validation updated for supervisor as acceptable parent.
Emissary Executor WriteTemplate Support
workflow/executor/emissary/emissary.go
WriteTemplate() exported from emissary for supervisor template JSON persistence; SIGKILL phase context handling switched to WithoutCancel() to preserve trace/logger context.
Unit and Integration Tests
cmd/argoexec/commands/*_test.go, workflow/controller/workflowpod_initless_test.go, workflow/executor/*_test.go
Comprehensive test coverage for supervisor command logic, emissary helpers (readiness, template, artifact linking), pod construction invariants, plugin deduplication, input artifact mounting, artifact plugin wiring, and end-to-end init-less pod shapes vs legacy regression tests.
E2E Testing and CI Infrastructure
.github/workflows/ci-build.yaml, Makefile, test/e2e/manifests/**
CI matrix extended with initless test variants for multiple suites, Makefile INITLESS flag controls profile selection ($(PROFILE)-initless vs $(PROFILE)), E2E manifests include kustomization components for init-less ConfigMap patching and fixture variants.
Feature Documentation
.features/pending/initless-pod.md, docs/initless-pod.md, docs/architecture.md, docs/tracing.md, mkdocs.yml
New pending feature file, comprehensive initless-pod.md describing motivation, configuration, pod layout, startup sequencing, failure scenarios, and input artifact delivery differences; architecture.md and tracing.md updated; MkDocs navigation includes new doc.
Supporting Utility Updates
workflow/controller/artifact_gc.go, workflow/util/util.go, hack/k8s-versions.sh
Artifact GC plugin name serialization switched to common.JoinPluginNames() helper; new FindAuxiliaryCtrIndex() utility to locate auxiliary container by name across both layouts; Kubernetes max version bumped to v1.36.0 for image volume support.

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

  • argoproj/argo-workflows#15582: Earlier PR extending OpenTelemetry span definitions to cover the new supervisor container paths (RunSupervisorContainer span generation and parent/child relationships).

Suggested Reviewers

  • isubasinghe
  • terrytangyuan

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
workflow/controller/operator.go (1)

1813-1825: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update fallback failure messages to be init-less aware.

The fallback text still says wait container, which is inaccurate in init-less mode and makes debugging harder when only supervisor exists.

Suggested patch
-	// Determine final status based on whether we confirmed main and wait succeeded
+	// Determine final status based on whether we confirmed main and auxiliary container succeeded
 	// Slightly convulted approach to avoid the exhaustive linter getting upset
 	if mainContainerSucceeded {
 		if waitContainerSucceeded {
 			// Both succeeded - sidecars may have been force-killed (137/143), which is fine
 			return wfv1.NodeSucceeded, ""
 		}
-		return wfv1.NodeFailed, "pod failed: wait container did not complete successfully"
+		return wfv1.NodeFailed, "pod failed: auxiliary container (wait/supervisor) did not complete successfully"
 	}
 	if waitContainerSucceeded {
 		return wfv1.NodeFailed, "pod failed: main container did not complete successfully"
 	}
-	return wfv1.NodeFailed, "pod failed: neither main nor wait container completed successfully"
+	return wfv1.NodeFailed, "pod failed: neither main nor auxiliary container (wait/supervisor) completed successfully"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workflow/controller/operator.go` around lines 1813 - 1825, The fallback
failure messages referencing "wait container" are inaccurate in init-less mode;
update the strings returned in the branch handling mainContainerSucceeded and
waitContainerSucceeded (the code using mainContainerSucceeded,
waitContainerSucceeded, and returning wfv1.NodeFailed) to use init-less aware
wording such as "wait/supervisor container" or "supervisor (init-less)
container" so they correctly describe which container failed in both normal and
init-less modes; keep the same return values (wfv1.NodeFailed and the message)
and only change the human-readable text.
🧹 Nitpick comments (3)
workflow/controller/workflowpod_initless_test.go (2)

83-90: ⚡ Quick win

Assert the exact inputPlugins set here.

The map-based membership checks still pass if buildPluginSidecars() returns duplicate input plugin names, so this test can miss the regression it's meant to guard. Add an exact cardinality/set assertion before the membership checks.

Example tightening
 	inputNames := map[wfv1.ArtifactPluginName]bool{}
 	for _, n := range inputPlugins {
 		inputNames[n] = true
 	}
+	require.Len(t, inputPlugins, 2)
+	require.Len(t, inputNames, 2)
 	assert.True(t, inputNames["shared-plugin"])
 	assert.True(t, inputNames["only-input"])
 	assert.False(t, inputNames["only-output"], "only-output must not appear in input plugin list")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workflow/controller/workflowpod_initless_test.go` around lines 83 - 90, The
test currently only checks membership via the inputNames map which won't catch
duplicates; update the test around inputPlugins/inputNames to assert the exact
set/cardinality of inputPlugins (e.g. assert len(inputPlugins) == 2 and that the
set equals {"shared-plugin","only-input"}) before the individual membership
checks; reference the inputPlugins variable (and the buildPluginSidecars output
that populates it) and then keep the existing assertions against inputNames for
clarity.

180-322: ⚡ Quick win

Add a regression case with user-defined initContainers.

These shape tests only cover workflows that start with no user init containers, so they do not lock down one of the core contracts of the feature: init-less mode should remove only the injected init container, while preserving user-specified init containers. Please add one legacy and one init-less case that start with a user init container and assert the expected final pod shape.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workflow/controller/workflowpod_initless_test.go` around lines 180 - 322, Add
two tests mirroring TestCreateWorkflowPod_InitlessShape and
TestCreateWorkflowPod_LegacyShape but start the template with a user-defined
init container (e.g. add tmpl.InitContainers = []apiv1.Container{{Name:
"user-init", Image: "busybox"}} before calling woc.createWorkflowPod). In the
init-less variant (similar to TestCreateWorkflowPod_InitlessShape) assert that
pod.Spec.InitContainers contains the user init ("user-init") and does NOT
contain the injected init named common.InitContainerName, and still verify the
other init-less invariants (supervisor, argo-bin volume, ARGO_WAIT_FOR_READY on
main). In the legacy variant (similar to TestCreateWorkflowPod_LegacyShape)
assert that pod.Spec.InitContainers contains both the user init ("user-init")
and the injected common.InitContainerName, and that wait-container behavior and
absence of argo-bin volume remain as in the original legacy test. Use
createWorkflowPod, createWorkflowPodOpts, tmpl, and common.InitContainerName to
locate where to modify and which assertions to add.
workflow/util/util.go (1)

1639-1646: ⚡ Quick win

Prefer deterministic legacy-first lookup in FindAuxiliaryCtrIndex.

Current logic returns whichever appears first; making wait precedence explicit preserves legacy semantics and avoids ambiguous matches.

Suggested patch
 func FindAuxiliaryCtrIndex(pod *apiv1.Pod) (int, error) {
 	for i, ctr := range pod.Spec.Containers {
-		if ctr.Name == common.WaitContainerName || ctr.Name == common.SupervisorContainerName {
+		if ctr.Name == common.WaitContainerName {
 			return i, nil
 		}
 	}
+	for i, ctr := range pod.Spec.Containers {
+		if ctr.Name == common.SupervisorContainerName {
+			return i, nil
+		}
+	}
 	return -1, errors.Errorf("-1", "Could not find wait or supervisor container in pod spec")
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workflow/util/util.go` around lines 1639 - 1646, FindAuxiliaryCtrIndex
currently returns whichever auxiliary container is encountered first; change it
to explicitly prefer common.WaitContainerName (legacy) over
common.SupervisorContainerName by first scanning pod.Spec.Containers for a
container with name == common.WaitContainerName and returning its index if
found, then scanning a second time for common.SupervisorContainerName and
returning that index if found; if neither is found, return -1 with a clear error
(use errors.Errorf or errors.New with a descriptive message) so the behavior is
deterministic and preserves legacy precedence.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.features/pending/initless-pod.md:
- Line 2: The author attribution line "Authors: [Alan
Clucas](https://github.com/Joibel)" mismatches name and profile; update that
single Markdown link so the display name and GitHub URL point to the same person
— either change the display text to "Joibel" to match the existing link or
change the URL to the correct Alan Clucas profile (e.g., github.com/AlanClucas)
so the "Authors" entry is unambiguous.

In `@cmd/argoexec/commands/emissary.go`:
- Around line 366-378: Don't recursively delete pre-existing artifact paths:
replace the os.RemoveAll(dst) call with os.Remove(dst) so we only remove
non-directory entries (matching legacy SubPath shadowing) and return an error if
removal fails (which will preserve real directories/volumes). Update the error
handling around the existing os.Lstat(dst) branch that references dst and
art.Name (in emissary.go) to call os.Remove(dst) and propagate any rmErr instead
of using RemoveAll.
- Around line 411-448: The two watcher goroutines currently swallow
context.Canceled by returning nil when file.WaitForCreate returns a
cancellation; change both handlers (the g.Go closures monitoring readyPath and
failedPath) to propagate the cancellation error instead of returning nil (i.e.,
return err when errors.Is(err, context.Canceled) is true or simply return err
directly), so that g.Wait() receives the parent cancellation (context.Canceled)
and the caller can exit promptly rather than treating cancellation as successful
readiness.

In `@workflow/common/common.go`:
- Around line 355-357: The problem is that user containers can be named "wait"
or "supervisor" and get misclassified as Argo sidecars; update
ContainerSetTemplate.Validate to reject reserved Argo aux container names by
checking container names against WaitContainerName and SupervisorContainerName
(and any artifact plugin sidecar names checked by IsArtifactPluginSidecar) and
return a clear validation error when a user-specified container matches one of
those reserved names; keep IsArgoSidecar as the runtime classifier but enforce
the reservation at ContainerSetTemplate.Validate (use the same symbols:
IsArgoSidecar, WaitContainerName, SupervisorContainerName,
IsArtifactPluginSidecar, and isValidWorkflowFieldName) so users cannot create
containers that will be misclassified.

In `@workflow/controller/operator_test.go`:
- Around line 7839-7849: Tighten TestInferFailedReason_SupervisorPreMainFailure
(and the similar test around lines 7882-7890) to assert that the losing main
container's signal/reason is absent: after the existing assert.Contains checks
for "supervisor" and "connection refused" (on the result of inferFailedReason),
add assert.NotContains assertions to ensure the message does NOT include the
main container's identifier or its failure text (e.g., "exit-65" / "exit code
65" or the main container name used in podWithSupervisorPreMainFailure); this
guarantees supervisor precedence rather than a merged/fallback message.

In `@workflow/controller/workflowpod.go`:
- Around line 1761-1808: The code appends input-only plugin sidecar containers
but doesn't add their corresponding Volume entries, causing pod admission to
fail; update the pod volume population so that volumes needed by init-less input
plugins are created: after calling woc.buildPluginSidecars (and using
woc.initlessPluginSidecarArtifactMounts), ensure you also generate and append
the corresponding apiv1.Volume objects (use the same logic as
createArtifactVolumeMounts or a helper that derives driver volumes from the
plugin drivers/driver.Name.VolumeMount()/driver.Name.Volume()) into
pod.Spec.Volumes before appending sidecars; reference buildPluginSidecars,
initlessPluginSidecarArtifactMounts, and createArtifactVolumeMounts to locate
where to add the volume creation and pod.Spec.Volumes = append(...) call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1f90c046-feb0-4072-a963-08138bb68d47

📥 Commits

Reviewing files that changed from the base of the PR and between 134224f and 482a250.

📒 Files selected for processing (43)
  • .features/pending/initless-pod.md
  • .github/workflows/ci-build.yaml
  • Makefile
  • cmd/argoexec/commands/artifact_plugin_init.go
  • cmd/argoexec/commands/artifact_plugin_sidecar.go
  • cmd/argoexec/commands/auxiliary.go
  • cmd/argoexec/commands/emissary.go
  • cmd/argoexec/commands/emissary_link_artifacts_test.go
  • cmd/argoexec/commands/emissary_read_template_test.go
  • cmd/argoexec/commands/emissary_ready_test.go
  • cmd/argoexec/commands/root.go
  • cmd/argoexec/commands/signal.go
  • cmd/argoexec/commands/supervisor.go
  • cmd/argoexec/commands/supervisor_test.go
  • cmd/argoexec/commands/wait.go
  • cmd/argoexec/executor/init.go
  • config/config.go
  • docs/architecture.md
  • docs/initless-pod.md
  • docs/tracing.md
  • docs/workflow-controller-configmap.md
  • hack/k8s-versions.sh
  • mkdocs.yml
  • test/e2e/manifests/components/initless-pod/kustomization.yaml
  • test/e2e/manifests/components/initless-pod/workflow-controller-configmap.yaml
  • test/e2e/manifests/minimal-initless/kustomization.yaml
  • test/e2e/manifests/plugins-initless/kustomization.yaml
  • util/file/watch.go
  • util/telemetry/builder/values.yaml
  • util/telemetry/traces_list.go
  • workflow/common/common.go
  • workflow/controller/artifact_gc.go
  • workflow/controller/config.go
  • workflow/controller/operator.go
  • workflow/controller/operator_test.go
  • workflow/controller/workflowpod.go
  • workflow/controller/workflowpod_initless_test.go
  • workflow/executor/emissary/emissary.go
  • workflow/executor/executor.go
  • workflow/executor/executor_test.go
  • workflow/executor/postmain.go
  • workflow/executor/postmain_test.go
  • workflow/util/util.go

Comment thread .features/pending/initless-pod.md
Comment thread cmd/argoexec/commands/emissary.go Outdated
Comment thread cmd/argoexec/commands/emissary.go Outdated
Comment thread workflow/common/common.go
Comment thread workflow/controller/operator_test.go
Comment thread workflow/controller/workflowpod.go
@Joibel Joibel force-pushed the initless-pod branch 6 times, most recently from c6b0e98 to 07af2cc June 3, 2026 10:49
New controller-wide `initlessPod` mode that eliminates the `argoexec init`
container. The `argoexec` binary is delivered to `main` via a Kubernetes
image volume (KEP-4639 — Beta in K8s 1.33 behind a feature gate, GA in
1.36), and a new `supervisor` container subsumes the work of `init` plus
`wait`: template write, script staging, input artifact download,
readiness signaling, then the post-main responsibilities (observe main,
collect outputs/logs/artifacts) previously held by `wait`. Artifact
plugins run as a single sidecar per plugin invoked by `supervisor` for
both Load and Save, collapsing the legacy split between input-init
containers and output sidecars. Pods scheduled under this mode have
zero init containers.

Beta: off by default and may change in incompatible ways in future minor
releases before being promoted to stable. Legacy `wait` + init-container
behavior is unchanged when `initlessPod.enabled` is false (default).

Controller (workflow/controller)
- `createWorkflowPod` branches on `isInitlessPodEnabled()`: skips
  `pod.Spec.InitContainers` entirely, attaches `supervisor` in place of
  `wait`, mounts the argoexec-bin image volume on every main-level
  container, retargets the emissary at `/argo-bin/bin/argoexec`, sets
  `ARGO_WAIT_FOR_READY=true` on main-level containers, and passes
  `ARGO_TEMPLATE` directly to main for templates that don't run a
  supervisor (data / resource-without-logs).
- `buildPluginSidecars` / `addArtifactPluginsInitless` emit one sidecar
  per unique plugin across both Inputs.Artifacts and Outputs.Artifacts,
  with supervisor mounting each plugin's socket volume and receiving
  `ARGO_ARTIFACT_PLUGIN_NAMES` (all plugins, for Save) plus
  `ARGO_INPUT_ARTIFACT_PLUGIN_NAMES` (the Load subset).
- Input-artifact mount handling switches from per-artifact SubPath bind
  mounts (kubelet pre-creates SubPath entries as empty directories
  before supervisor can write to them) to a whole-volume mount at
  `/argo/inputs/artifacts` that the emissary symlinks into each
  artifact's expected path post-ready.
- `inferFailedReason` recognises supervisor as an error-reporting
  auxiliary container alongside `wait`.
- New `InitlessPodConfig` sub-struct in `config.Config`; reuses the
  existing `executor.image` / `executor.imagePullPolicy` for the image
  volume so the mounted binary always matches the running supervisor.

Argoexec (cmd/argoexec, workflow/executor)
- New `argoexec supervisor` subcommand: calls `WriteTemplate`, stages
  files, loads non-plugin and per-plugin input artifacts in parallel
  via errgroup, atomically writes `/var/run/argo/ready` (or `/failed`
  with the error text) using write-then-rename, and then runs the
  shared post-main phase.
- Shared post-main logic extracted from `argoexec wait` into
  `WorkflowExecutor.PostMain`; the caller still owns tracing and the
  defer stack (errHandler stays outermost).
- Emissary `writeTemplate` split out from `Init`, exposed as a new
  `TemplateWriter` interface so the init-less supervisor can write the
  template without the binary-copy step. Legacy `Init` is unchanged.
- Emissary gains an opt-in pre-exec wait gated on
  `ARGO_WAIT_FOR_READY=true`: block on the ready/failed marker (via
  `file.WaitForCreate` race) before reading the template, fall back to
  `ARGO_TEMPLATE` when `/var/run/argo/template` is absent, then
  symlink each input artifact from `/argo/inputs/artifacts/<name>` to
  `art.Path`. A failed marker maps to exit code 65 so the controller
  can attribute pre-main failures to supervisor rather than the user
  command.
- `stageArchiveFile` falls back to the input-artifacts emptyDir when
  an output path overlaps an input artifact path — under init-less
  mode there is no mirrored `/mainctrfs/<art.Path>` from a SubPath
  bind mount.

Tracing
- New `RunSupervisorContainer` span as a sibling to `RunWaitContainer`
  under `CreateWorkflowPod`. `LoadArtifacts`, `StageFiles`,
  `SaveArtifacts`, `SaveLogs`, `CaptureScriptResult`,
  `CreateTaskResult`, `PatchTaskResult`, `PatchTaskResultLabels`, and
  `WaitWorkload` all accept it as a valid parent so sub-spans parent
  correctly in both modes.

CI, manifests, docs
- New `INITLESS=true|false` Make variable; when true, `install` uses a
  `<profile>-initless` kustomize overlay that layers a new `initless-pod`
  component on top of the profile's manifests.
- New `initless: true|false` CI matrix dimension restricted to
  `k8s_version: max`; re-runs `test-corefunctional`, `test-functional`,
  `test-artifacts`, and `test-plugins` under the new layout. Bumps max
  K8s version from 1.35 to 1.36 (image volumes are now GA there).
- New `docs/initless-pod.md` plus updates to `docs/architecture.md`,
  `docs/tracing.md`, `docs/workflow-controller-configmap.md`,
  `mkdocs.yml`, and a pending feature entry at
  `.features/pending/initless-pod.md`.

Tests
- 331-line `workflowpod_initless_test.go` covering pod-spec shape
  (zero init containers, supervisor present, image volume mounted,
  emissary entrypoint retargeted), plugin sidecar dedup across input
  and output references, template-env-var handling for templates
  without a supervisor, and legacy-mode invariance.
- New emissary unit tests for marker waiting (ready-appears,
  failed-appears, already-ready, ctx-cancelled), template-env
  fallback, and input-artifact symlink behaviour.
- New supervisor unit tests for ready/failed marker writes and
  `ARGO_INPUT_ARTIFACT_PLUGIN_NAMES` parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Alan Clucas <alan@clucas.org>
@Joibel Joibel marked this pull request as ready for review June 4, 2026 08:03
@Joibel Joibel requested review from a team as code owners June 4, 2026 08:03
@isubasinghe isubasinghe self-assigned this Jun 11, 2026

Development

Successfully merging this pull request may close these issues.

Remove our init container (beta)
