Findings catalog — autonomous-readiness: plan→vote→implement→log→tune loop for arbitrary goals
From the 2026-05-31 full-codebase review (epic #3143). Domain health: adequate. This issue is the durable, individually-trackable list of findings for this domain; thematic work is tracked under epic #3143 (related phase: #3151).
Findings
[HIGH][architecture] Plan→Vote→Implement loop is solid; Tune phase decoupled from execution
Evidence: packages/nexus-agents/src/pipeline/dev-pipeline.ts:203-237; consensus-plan.ts:468-495
Fix: Wire improvement_review MCP tool outputs (ImprovementSignal[]) directly into the pipeline task decomposition phase: detected signals (routing floor breaches, fitness drops, failure concentration) should auto-create PipelineTask objects and feed into the next cycle's decompose() stage, not just file GitHub issues. Currently improvement signals only produce issue URLs with no feedback loop to pipeline.
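A minimal sketch of the proposed signal→task wiring, assuming simplified shapes. The `ImprovementSignal` and `PipelineTask` fields, the 0..1 severity scale, and the `signalsToTasks` helper are hypothetical illustrations, not the current nexus-agents types. The output is meant to be appended to the next cycle's decompose() input rather than (or in addition to) being filed as issues.

```typescript
// Hypothetical, simplified shapes; the real ImprovementSignal / PipelineTask types differ.
type SignalKind = "routing_floor_breach" | "fitness_drop" | "failure_concentration";

interface ImprovementSignal {
  kind: SignalKind;
  subject: string; // e.g. a CLI name, tool, or task category
  severity: number; // assumed 0..1 scale
  detail: string;
}

interface PipelineTask {
  id: string;
  goal: string;
  origin: "improvement_review";
  priority: "high" | "normal";
}

// Convert detected signals into next-cycle pipeline tasks.
// Signals below `minSeverity` are dropped; repeats of the same kind + subject
// collapse into the first qualifying task.
function signalsToTasks(signals: ImprovementSignal[], minSeverity = 0.5): PipelineTask[] {
  const seen = new Set<string>();
  const tasks: PipelineTask[] = [];
  for (const s of signals) {
    if (s.severity < minSeverity) continue;
    const key = `${s.kind}:${s.subject}`;
    if (seen.has(key)) continue;
    seen.add(key);
    tasks.push({
      id: `auto-${key}`,
      goal: `Remediate ${s.kind.replace(/_/g, " ")} for ${s.subject}: ${s.detail}`,
      origin: "improvement_review",
      priority: s.severity >= 0.8 ? "high" : "normal",
    });
  }
  return tasks;
}
```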
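[HIGH][architecture] Outcome recording is CLI-specific but tuning hooks are missing for strategic routing changes
Evidence: packages/nexus-agents/src/orchestration/outcomes/outcome-store.ts:85-102; packages/nexus-agents/src/cli-adapters/composite-router.ts (missing file)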
Fix: OutcomeStore records outcomes with family/vendor enrichment, and queryByModelWithFamilyFallback() gives cold-start models a warm start from family-level data. But there is no mechanism to apply recorded learnings back to the routing config (adapting budget constraints, thresholds, or CLI affinity based on observed performance). Add a 'routing tune' stage that reads weather_report, detects pathological patterns (e.g., 'gemini always times out on security category'), and emits routing-policy changes that are applied to the next orchestrate() invocation.
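One way the proposed 'routing tune' stage could detect a pattern such as 'gemini always times out on security', sketched under assumptions: the `OutcomeRecord` shape, `tuneRouting`, the `demote` action, and both thresholds are hypothetical, and the real OutcomeStore / weather_report data carries more fields.

```typescript
// Hypothetical outcome record; real OutcomeStore entries carry more fields.
interface OutcomeRecord {
  cli: string;
  category: string;
  status: "success" | "failure" | "timeout";
}

interface RoutingPolicyChange {
  cli: string;
  category: string;
  action: "demote";
  reason: string;
}

// Flag (cli, category) pairs whose timeout rate exceeds `maxTimeoutRate`,
// ignoring pairs with fewer than `minSamples` runs to avoid reacting to noise.
function tuneRouting(
  outcomes: OutcomeRecord[],
  maxTimeoutRate = 0.5,
  minSamples = 5,
): RoutingPolicyChange[] {
  const stats = new Map<string, { cli: string; category: string; total: number; timeouts: number }>();
  for (const o of outcomes) {
    const key = `${o.cli}|${o.category}`;
    const entry = stats.get(key) ?? { cli: o.cli, category: o.category, total: 0, timeouts: 0 };
    entry.total += 1;
    if (o.status === "timeout") entry.timeouts += 1;
    stats.set(key, entry);
  }
  const changes: RoutingPolicyChange[] = [];
  for (const s of Array.from(stats.values())) {
    const rate = s.timeouts / s.total;
    if (s.total >= minSamples && rate > maxTimeoutRate) {
      changes.push({
        cli: s.cli,
        category: s.category,
        action: "demote",
        reason: `timeout rate ${(rate * 100).toFixed(0)}% over ${s.total} runs`,
      });
    }
  }
  return changes;
}
```

Requiring a minimum sample count keeps a single unlucky run from rewriting routing policy; the emitted changes would then be applied to the next orchestrate() invocation.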
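[MED][modularity] Belief memory (hindsight) is fire-and-forget with no integration to voting or plan refinement
Evidence: packages/nexus-agents/src/pipeline/dev-pipeline.ts:268-334 (… _vendor/ checkout path; see #3695, audit of 2026-05-31 system-review issues for stale _vendor/ checkout paths. The in-tree path is packages/nexus-agents/src/pipeline/.) (Reconciled under #3696, 2026-06-09.)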
Fix: dev-pipeline applies hindsight records to IHindsightBeliefMemory after execution (line 325), but the consensus_vote and planning stages do NOT consume this memory to inform voting weights or plan reasoning. Hindsight should flow backward: before the next vote, the architect should see 'prior plan approach X failed 3 times last week; consider Y instead'. Add an optional 'hindsight context' parameter to the executeConsensusPlan() and plan() stages that retrieves relevant belief updates.
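A sketch of what the proposed hindsight-context retrieval might produce for plan() and vote() prompts. It does not assume IHindsightBeliefMemory's real API: `BeliefRecord`, `buildHindsightContext`, tag matching, and the "more failures than successes" filter are all hypothetical.

```typescript
// Hypothetical belief shape; IHindsightBeliefMemory's real API is not assumed here.
interface BeliefRecord {
  approach: string;
  tags: string[];
  failures: number;
  successes: number;
  alternative?: string;
}

// Build a short context block for plan()/vote() prompts from beliefs relevant to the task.
// Only beliefs that share a tag with the task and have failed more often than they
// succeeded are included, most-failed first, capped at `limit`.
function buildHindsightContext(beliefs: BeliefRecord[], taskTags: string[], limit = 3): string {
  const tagSet = new Set(taskTags);
  return beliefs
    .filter((b) => b.tags.some((t) => tagSet.has(t)) && b.failures > b.successes)
    .sort((a, b) => b.failures - a.failures)
    .slice(0, limit)
    .map((b) => {
      const hint = b.alternative ? `; consider ${b.alternative} instead` : "";
      return `Prior approach "${b.approach}" failed ${b.failures} time(s)${hint}.`;
    })
    .join("\n");
}
```

The returned string would be passed through the proposed optional hindsight-context parameter; an empty string means no relevant prior failures.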
Evidence: packages/nexus-agents/src/pipeline/central-hub-vision.test.ts:14-19 (documents vision but not implemented); agent-executor.ts (research stage calls research_discover but output is not wired to plan prompts)
Fix: The research stage calls research_discover to populate the research context, but the plan() and vote() stages receive only raw research text, not the structured metadata (techniques_extracted, quality_signals, verdict_notes) that would help voters judge research confidence. Pass a ResearchContext object (not just a string) through the pipeline containing technique tags, adoption status, and quality signals so voting can weight recommendations by research maturity.
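A possible ResearchContext shape and a maturity weight that voters could apply. The three metadata names come from the finding above; their types, the `AdoptionStatus` values, the `ADOPTION_WEIGHT` table, and `researchWeight` are assumptions for illustration.

```typescript
// Sketch of a structured ResearchContext; field names follow the metadata named
// above, but their types and the weighting scheme are assumptions.
type AdoptionStatus = "experimental" | "emerging" | "established";

interface ResearchContext {
  summary: string; // the raw text that is passed today
  techniques_extracted: string[];
  quality_signals: { source: string; score: number }[]; // assumed 0..1 per source
  adoption: AdoptionStatus;
  verdict_notes: string[];
}

const ADOPTION_WEIGHT: Record<AdoptionStatus, number> = {
  experimental: 0.5,
  emerging: 0.75,
  established: 1.0,
};

// Weight for a recommendation backed by this research: mean quality score scaled by
// adoption status. With no quality signals, the mean falls back to a neutral 0.5.
function researchWeight(ctx: ResearchContext): number {
  const scores = ctx.quality_signals.map((q) => q.score);
  const meanQuality =
    scores.length === 0 ? 0.5 : scores.reduce((a, b) => a + b, 0) / scores.length;
  return meanQuality * ADOPTION_WEIGHT[ctx.adoption];
}
```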
[MED][mission-gap] No explicit feedback loop between vote rejection and improvement discovery
Evidence: packages/nexus-agents/src/mcp/tools/consensus-vote.ts (records vote outcomes); improvement-review.ts (detects fitness/routing signals); no consumer links the two
Fix: When consensus_vote rejects a plan, the rejection reason (DRY_VIOLATION, OVER_ENGINEERING, etc. per ADR 0016) should seed the next improvement_review cycle as domain-specific signals ('DRY violations are common for this task type'; 'OVER_ENGINEERING detected; simplify scope'). Currently rejection reasons are local to the proposal. Add a rejection-signal analyzer that feeds vote feedback into the observability layer.
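A minimal rejection-signal analyzer, assuming each rejection carries a task type and an ADR 0016 reason code such as DRY_VIOLATION or OVER_ENGINEERING. `VoteRejection`, `RejectionSignal`, `analyzeRejections`, and the `minCount` threshold are hypothetical names, not existing consensus_vote outputs.

```typescript
// Hypothetical vote-rejection record; reason codes come from ADR 0016,
// the rest of the shape is assumed.
interface VoteRejection {
  taskType: string;
  reason: string;
}

interface RejectionSignal {
  taskType: string;
  reason: string;
  count: number;
  message: string;
}

// Aggregate rejections by (taskType, reason) and surface recurring ones, most
// frequent first, as domain-specific signals for the next improvement_review cycle.
function analyzeRejections(rejections: VoteRejection[], minCount = 2): RejectionSignal[] {
  const counts = new Map<string, { taskType: string; reason: string; count: number }>();
  for (const r of rejections) {
    const key = `${r.taskType}|${r.reason}`;
    const entry = counts.get(key) ?? { taskType: r.taskType, reason: r.reason, count: 0 };
    entry.count += 1;
    counts.set(key, entry);
  }
  return Array.from(counts.values())
    .filter((e) => e.count >= minCount)
    .sort((a, b) => b.count - a.count)
    .map((e) => ({
      ...e,
      message: `${e.reason} rejected ${e.count} plan(s) for task type "${e.taskType}"`,
    }));
}
```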
Evidence: docs/v2/04-v2-architecture-pipeline-os.md:40-52; PolicyGateSpec type defined but no consumer in run_pipeline/run_graph_workflow that enforces learned policies
Fix: The V2 architecture declares PolicyGateSpec between stages, but the actual policy-decision enforcement is structural (gates exist as node types), not learned (gates do not learn from outcomes or fitness signals). Implement adaptive gates: a policy-gate stage should read the OutcomeStore and FitnessAudit, apply a learned policy (e.g., 'if a prior similar task failed on security, add a security expert to decomposition'), and conditionally proceed or route to remediation. Gates should be data-driven.
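A sketch of an adaptive policy gate following the example rule above. The inputs stand in for OutcomeStore and FitnessAudit reads; `PriorOutcome`, `GateDecision`, `evaluatePolicyGate`, the 0..1 fitness scale, and both thresholds are assumptions.

```typescript
// Hypothetical inputs: prior outcomes for similar tasks, plus a fitness score
// (assumed 0..1) standing in for a FitnessAudit read.
interface PriorOutcome {
  success: boolean;
  failedDomain?: string; // e.g. "security"
}

type GateDecision =
  | { kind: "proceed" }
  | { kind: "add_expert"; domain: string }
  | { kind: "remediate"; reason: string };

// Data-driven gate: remediate when fitness is below the floor; otherwise add an
// expert for the domain with the most prior failures (if it reached `minFailures`);
// otherwise proceed.
function evaluatePolicyGate(
  priors: PriorOutcome[],
  fitness: number,
  fitnessFloor = 0.6,
  minFailures = 2,
): GateDecision {
  if (fitness < fitnessFloor) {
    return { kind: "remediate", reason: `fitness ${fitness.toFixed(2)} below floor ${fitnessFloor}` };
  }
  const failuresByDomain = new Map<string, number>();
  for (const p of priors) {
    if (!p.success && p.failedDomain) {
      failuresByDomain.set(p.failedDomain, (failuresByDomain.get(p.failedDomain) ?? 0) + 1);
    }
  }
  let worst: { domain: string; n: number } | undefined;
  for (const [domain, n] of Array.from(failuresByDomain.entries())) {
    if (n >= minFailures && (worst === undefined || n > worst.n)) worst = { domain, n };
  }
  return worst ? { kind: "add_expert", domain: worst.domain } : { kind: "proceed" };
}
```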
[MED][modularity] Task routing (CompositeRouter) learns from outcomes but has no persistence or distributed sync
Evidence: packages/nexus-agents/src/orchestration/outcomes/outcome-store.ts:1-40 (in-memory, max 10k entries); getOutcomeStore() is process singleton (no distributed state); CLI-adapters routing uses computeQualityReward() on every call (O(N) scan per executeTask per orchestrate invocation)
Fix: OutcomeStore is in-memory and process-local, so for autonomous multi-agent swarms or remote orchestrate() calls, routing decisions cannot be cached or shared. Add an optional persistent outcome-store backend (SQLite, Redis, or append-only JSONL) with a cache layer in CompositeRouter so routing decisions do not rescan the N most recent outcomes on every task. This gap blocks distributed autonomy and scaling.
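A sketch of a persistent, cached outcome store under stated assumptions: an in-memory array of JSONL lines stands in for a real append-only file (a SQLite or Redis adapter would implement the same `OutcomeBackend` interface), and the success-rate reward is a placeholder for computeQualityReward(), whose formula is not assumed here. All names are hypothetical.

```typescript
// Hypothetical outcome entry and storage interface.
interface OutcomeEntry {
  model: string;
  success: boolean;
}

interface OutcomeBackend {
  append(line: string): void;
  readAll(): string[];
}

// Stand-in for an append-only JSONL file: one JSON document per line.
class MemoryJsonlBackend implements OutcomeBackend {
  private lines: string[] = [];
  append(line: string): void {
    this.lines.push(line);
  }
  readAll(): string[] {
    return this.lines.slice();
  }
}

// Persists every outcome and keeps running per-model totals, so a router reads a
// reward in O(1) instead of rescanning recent outcomes on each task.
// The success-rate reward is a placeholder, not computeQualityReward()'s formula.
class CachedOutcomeStore {
  private readonly backend: OutcomeBackend;
  private totals = new Map<string, { n: number; ok: number }>();

  constructor(backend: OutcomeBackend) {
    this.backend = backend;
    // Rebuild the cache from the durable log (e.g. after a restart or on another node).
    for (const line of backend.readAll()) this.apply(JSON.parse(line) as OutcomeEntry);
  }

  record(entry: OutcomeEntry): void {
    this.backend.append(JSON.stringify(entry));
    this.apply(entry);
  }

  reward(model: string): number | undefined {
    const t = this.totals.get(model);
    return t ? t.ok / t.n : undefined;
  }

  private apply(entry: OutcomeEntry): void {
    const t = this.totals.get(entry.model) ?? { n: 0, ok: 0 };
    t.n += 1;
    if (entry.success) t.ok += 1;
    this.totals.set(entry.model, t);
  }
}
```

Because the cache is rebuilt from the durable log, a second process reading the same backend recovers the same routing statistics, which is the property distributed orchestrate() callers need.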
[MED][correctness] No bounded-iteration safeguard or cost-control loop back from execution to plan approval
Evidence: packages/nexus-agents/src/pipeline/dev-pipeline.ts:143-145 (MAX_VOTE_ITERATIONS=3, MAX_QA_ITERATIONS=3 hardcoded); pipeline-tool.ts:87 (dryRun stops after vote but cost/token tracking is not enforced)
Fix: Loops have max iterations (vote ≤3, QA ≤3) but no per-task cost accounting or global budget enforcement. If a task is estimated to cost $50 (buildDryRunReport) and actual execution is tracking at $200, the pipeline should interrupt and route to escalation. Add a cost-enforcement stage after each execute/validate that checks actual spend vs. plan estimate and decides proceed/refine/reject based on budget constraints from CompositeRouter.
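A minimal cost-enforcement check, assuming the pipeline tracks spend in USD per task. `CostState`, `enforceCost`, and the 1.5x / 3x ratios are hypothetical; a "reject" result is where routing to escalation would hook in.

```typescript
// Hypothetical cost check run after each execute/validate step.
// Thresholds are illustrative: refine above 1.5x the estimate, reject above 3x
// or whenever the global budget is exceeded.
interface CostState {
  estimatedUsd: number; // e.g. from the dry-run report
  actualUsd: number; // spend tracked so far for this task
  budgetUsd: number; // global ceiling, e.g. from router budget constraints
}

type CostDecision = "proceed" | "refine" | "reject";

function enforceCost(s: CostState, refineRatio = 1.5, rejectRatio = 3): CostDecision {
  if (s.actualUsd > s.budgetUsd) return "reject";
  if (s.estimatedUsd <= 0) return "proceed"; // no estimate: only the global budget applies
  const ratio = s.actualUsd / s.estimatedUsd;
  if (ratio > rejectRatio) return "reject";
  if (ratio > refineRatio) return "refine";
  return "proceed";
}
```

With the finding's example ($50 estimated, $200 actual, a ratio of 4), `enforceCost({ estimatedUsd: 50, actualUsd: 200, budgetUsd: 1000 })` returns "reject".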
Composability notes
The primitives (OutcomeStore, executeConsensusPlan, runDevPipeline, improvement_review, weather_report) are individually well-designed and modular. However, the COMPOSITION of these into a closed-loop autonomous cycle is incomplete. Specifically: (1) Improvement signals are produced (improvement_review surfaces issues) but not consumed by the pipeline to auto-generate next-cycle tasks. (2) Hindsight/belief memory flows one direction (outcomes → beliefs) but not backward (beliefs → voting). (3) Research outputs are string-only, not structured metadata, so research quality signals cannot inform voting weights. (4) Policy gates are architectural placeholders in the V2 spec but not wired to actual learned policies from outcomes. (5) The routing learner (CompositeRouter) is ephemeral and cannot be distributed or persisted. To achieve true reusable building-block status, each primitive must declare its dependencies (e.g., executeConsensusPlan requires weather_report context, runDevPipeline optionally consumes improvement signals) and the pipeline orchestrator must wire these dependencies before execution. Currently each tool is callable standalone, but their integration into a feedback loop is manual/implicit.
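A sketch of declared dependencies and orchestrator wiring, using the examples from the note above. The `PrimitiveSpec` manifest and `wireOrder` are hypothetical (not an existing nexus-agents API); ordering uses Kahn's topological sort, with optional dependencies honored only when that primitive is present.

```typescript
// Hypothetical dependency manifest; the declaration format is an assumption.
interface PrimitiveSpec {
  name: string;
  requires: string[]; // must be present and run first
  optional?: string[]; // run first only if present
}

// Order primitives so every dependency runs before its consumer (Kahn's algorithm).
// Throws on a missing required dependency or a dependency cycle.
function wireOrder(specs: PrimitiveSpec[]): string[] {
  const names = new Set(specs.map((s) => s.name));
  const indegree = new Map<string, number>();
  const consumers = new Map<string, string[]>();
  for (const s of specs) indegree.set(s.name, 0);
  for (const s of specs) {
    for (const dep of s.requires) {
      if (!names.has(dep)) throw new Error(`${s.name} requires missing primitive ${dep}`);
    }
    const deps = s.requires.concat((s.optional ?? []).filter((d) => names.has(d)));
    for (const dep of deps) {
      indegree.set(s.name, (indegree.get(s.name) ?? 0) + 1);
      consumers.set(dep, (consumers.get(dep) ?? []).concat(s.name));
    }
  }
  const ready = specs.filter((s) => indegree.get(s.name) === 0).map((s) => s.name);
  const order: string[] = [];
  while (ready.length > 0) {
    const next = ready.shift() as string;
    order.push(next);
    for (const c of consumers.get(next) ?? []) {
      const d = (indegree.get(c) ?? 0) - 1;
      indegree.set(c, d);
      if (d === 0) ready.push(c);
    }
  }
  if (order.length !== specs.length) throw new Error("dependency cycle detected");
  return order;
}
```

Failing fast on a missing required dependency makes the implicit wiring explicit: an orchestrator cannot start runDevPipeline without executeConsensusPlan, while improvement_review is wired in only when it is available.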
Mission gaps
Autonomous loop fails to close: improvement signals (bugs, routing failures, fitness drops) are detected but do NOT auto-create tasks for the next cycle. Signals are filed as GitHub issues (human-driven) but the system cannot independently self-improve by decomposing them.
Tuning phase is missing: outcomes are logged and aggregated (weather_report, fitness_score) but there is NO automatic adjustment of orchestration parameters (routing thresholds, budget constraints, policy gates, CLI affinity) in response to observed performance.
Distributed autonomy is blocked: OutcomeStore, composite router, and all learner state is in-memory and process-local. Swarms of remote agents cannot share routing decisions or outcome history.
Arbitrary goal scope narrowing: Pipeline is task-driven (task → plan → implement) but has no automated scope-tightening when plans are rejected or over-budget. Vote rejection feedback does not automatically trigger scope-analysis stage.
Part of epic #3143. Full review record: docs/archive/system-review-2026-05-31.md.