fix(cli): preserve soundwave-authored slug across handoff to decepticon#180
fix(cli): preserve soundwave-authored slug across handoff to decepticon#180Lempkey wants to merge 2 commits intoPurpleAILAB:mainfrom
Conversation
On the engagement_ready event, the CLI captured the freshly-authored engagement slug into pendingHandoffRef but then discarded it before the next Decepticon submit could use it. The submit fell back to the DECEPTICON_ENGAGEMENT env var, which is stale or empty for engagements created during the Soundwave interview. Fix: add queuedEngagementSlugRef (set on engagement_ready, cleared in handleStreamComplete) and extract buildSubmitInput() as a pure helper that applies slug precedence — handoff slug wins over env var. The helper is independently unit-tested. Closes PurpleAILAB#159
…e teardown handleStreamComplete() was nulling queuedEngagementSlugRef during the Soundwave stream's teardown — before the auto-submitted Decepticon run could read it. The slug was always null by the time submit() reached buildSubmitInput(), so it fell back to the stale DECEPTICON_ENGAGEMENT env var. Fix: capture the slug into a local before the stream starts, and clear the ref only after processStream() resolves successfully. On failure the ref is left intact, preserving the slug for a safe retry.
|
Thanks for the thorough review. The high-severity finding was correct — `queuedEngagementSlugRef` was being nulled inside `handleStreamComplete()` (during the Soundwave teardown), which ran before the auto-submitted Decepticon `submit()` could consume it. The ref arrived as `null` at `buildSubmitInput`, so it always fell back to the stale env var. Fix (7247f00): Two changes in `useAgent.ts`:
```typescript try { On the lifecycle test gap: Agreed the existing tests don't cover the `engagement_ready → handleStreamComplete → next submit` sequence. Adding those requires `@testing-library/react` + jsdom (the current vitest config uses `environment: "node"`), which is out of scope for this fix. Happy to follow up in a separate PR if the maintainers want to expand the test infrastructure. |
|
After re-examining the root cause of this regression (#159), I concluded that the architecture itself — letting the LLM be the source of truth for the engagement slug — is the actual problem. Replacing this PR with #182 ( Why the redirectThe natural product flow:
Under this flow the slug is launcher-owned. The slug Soundwave's LLM emits via What #182 does instead
What survivesYour regression tests ( Closing this PR. |
…able (#182) * refactor(engagement): launcher-authoritative slug via config.configurable Move engagement_name and workspace_path from input state to runnable config.configurable. The launcher (and web terminal-server env) is the single source of truth; the LLM no longer decides the engagement slug. Backend: - complete_engagement_planning: drop engagement_name arg; pure boolean handoff signal. The emitted engagement_ready event carries no slug. - EngagementContextMiddleware: add before_agent hydration that copies engagement_name + workspace_path from configurable into state on the first agent step. Idempotent — state wins when present. Downstream middleware (OPPLAN, filesystem) read state as before. - filesystem._normalize_engagement_workspace: strict 4-case contract (empty -> None, /workspace accepted as engagement root for launcher mode, /workspace/<slug> normalized, traversal/invalid -> None, no silent coercion). Frontend: - useAgent.ts: send engagement_name/workspace_path via config.configurable on every run; pendingHandoffRef downgraded to boolean. - types.ts: drop engagement? field from SubagentCustomEvent. - soundwave.md: prompt no longer instructs the LLM to pass a slug. Tests: - test_engagement.py: +6 tests covering hydration paths and workspace resolution precedence. - test_filesystem.py: +5 tests for the new contract; flips the old test_root_workspace_fails_closed to test_root_workspace_accepted_as _engagement_root. Closes #152, #159, #173. Addresses the filesystem failure in #175. Supersedes PR #180 (slug queuing) and PR #181 (filesystem normalize). * style(scripts): ruff format render_benchmark_charts.py Pre-existing format drift on main; CI's repo-wide ruff format --check was failing on this file unrelated to the refactor. Whitespace-only. * fix(subagent_streaming): guard None tool_call_id from corrupting active dict LangChain's ToolCall.id is `str | None` per the spec. The previous code keyed `active_tool_calls[tc["id"]] = tc` directly, so multiple id-less tool calls in the same iteration would collide under the same `None` key — the second overwrite wins and the subsequent ToolMessage lookup returns the wrong tool's name and args. Tool-result events were not dropped (they were emitted regardless), but they surfaced under the wrong tool identity. Changes: - Skip the active_tool_calls write when tc.get("id") is None and emit a warning so we can spot subagent/provider configurations that drop the id. Subsequent ToolMessage lookups for those calls fall through to the "unknown" branch instead of mis-attributing. - Tighten the active_tool_calls type from dict[str, dict] to dict[str, ToolCall] (the actual TypedDict) — both signature and call-site annotations. - Add three regression tests: None-id storage skipped + warning emitted, ToolMessage with tc-less id falls through to "unknown", and a valid id keeps its mapping when a concurrent None-id call is present. Supersedes the corresponding fix in PR #178 (which closed without landing) — the diagnosis "results dropped from the streaming pipeline" was inaccurate; the practical effect is wrong-name attribution. The type annotation also lands as `dict[str, ToolCall]` rather than the laxer `dict[str, Any]` from that PR.
Summary
queuedEngagementSlugRefto capture the slug emitted by Soundwave'scomplete_engagement_planningtool via theengagement_readycustom event (useAgent.ts:303)buildSubmitInput()as a pure helper that applies slug precedence: the queued handoff slug wins overDECEPTICON_ENGAGEMENTenv var, which may be stale or empty for engagements created during the Soundwave interviewhandleStreamComplete(alongside the existingpendingHandoffRefclear) so it is not re-injected on subsequent Decepticon submits on the same threadpendingHandoffRefboolean semantics untouched; the handoff still fires even when theengagement_readyevent carries no slugRoot cause
pendingHandoffRefcaptured the slug at L303 but the slug string was discarded at L535 (pendingHandoffRef.current = null) before the next Decepticon submit at L687 could use it. The submit fell back toprocess.env.DECEPTICON_ENGAGEMENT, which is the launcher-set value and is stale when the operator created a new engagement during the interview.Test plan
npm run typecheck— TypeScript cleannpm test— 8 new tests inbuildSubmitInput.test.tspass, 24 total passmake dev+make cli), complete planning, verify the first Decepticon message includes the correctengagement_name(requires full stack — impractical for CI)Note on downstream failures
Filesystem tool failures (#152, #173, #175) are downstream symptoms that still require a follow-up PR fixing
_normalize_engagement_workspaceatdecepticon/middleware/filesystem.py:32— that function currently rejects/workspaceas "no engagement workspace" and will continue to do so until repaired. This PR fixes the slug-drop at the handoff point; the middleware fix unblocks the filesystem tools.Closes #159