Skip to content

feat: add autonomous PRD-TDD-review workflow extension#5

Open
ponderingdemocritus wants to merge 1 commit intomainfrom
ponderingdemocritus/npm-fork
Open

feat: add autonomous PRD-TDD-review workflow extension#5
ponderingdemocritus wants to merge 1 commit intomainfrom
ponderingdemocritus/npm-fork

Conversation

@ponderingdemocritus
Copy link
Copy Markdown

@ponderingdemocritus ponderingdemocritus commented Mar 2, 2026

This PR adds a new project-local autonomous-loop extension that enforces a PRD -> TDD -> review -> fix -> ready_to_push workflow with phase gates and persisted state. It includes parser/state/review modules, hook wiring, and a /workflow command surface for status, transitions, findings, reset, and PRD loading. It also adds targeted tests for workflow state transitions, PRD parsing, review finding lifecycle, and repo slug parsing, plus a README smoke test script. Documentation was updated in the repo README and a full TDD PRD board was added for operating the loop.

Summary by CodeRabbit

Release Notes

  • New Features

    • Autonomous Loop Extension for PRD-driven development with enforced workflow phases
    • Workflow commands for state management: status, findings, reset, advance, and load PRDs
    • Automated PRD parsing and task extraction with acceptance criteria support
    • Review and findings system with severity-based blocking and fix tracking
    • Git push gating based on workflow completion state
  • Tests

    • Added test suites for PRD parsing, review workflows, and state management
  • Documentation

    • Autonomous Loop documentation with phase sequence and command reference
    • Workflow board specification for development loop

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 2, 2026

📝 Walkthrough

Walkthrough

This PR introduces a comprehensive autonomous loop extension enabling hook-enforced PRD-driven development workflows with state management, phase transitions, PRD parsing, review/fix protocols, and repository slug parsing. The system manages workflow progression from PRD loading through TDD red/green/refactor cycles to review and push gates.

Changes

Cohort / File(s) Summary
Workflow Extension Core
.pi/extensions/autonomous-loop/index.ts, .pi/extensions/autonomous-loop/hooks.ts, .pi/extensions/autonomous-loop/README.md
Implements extension entry point with command registration (status, advance, reset, findings, load), comprehensive event hook handlers (session_start, before_agent_start, tool_call, tool_result, turn_end, agent_end, session_shutdown) for state management, phase transitions, PRD auto-loading/generation, and UI notifications via WorkflowController API.
State Machine & Phase Management
.pi/extensions/autonomous-loop/workflow-state.ts
Defines WorkflowState, WorkflowPhase union, TaskEvidence structures, and state lifecycle functions (createInitialWorkflowState, transitionPhase, canTransition, loadPrdTasks, recordTaskEvidence, markTaskCompletedIfReady, canPush) to validate and persist multi-phase autonomous workflow execution.
Data Parsing & Validation
.pi/extensions/autonomous-loop/prd-parser.ts, .pi/extensions/autonomous-loop/review-engine.ts, .pi/extensions/autonomous-loop/repo-slug.ts
Provides PRD Markdown parser (parsePrdMarkdown) extracting tasks with acceptance criteria and evidence requirements; review engine parsing and managing findings with severity/blocking logic; repository slug parser supporting GitHub, GitLab, Bitbucket URLs and shorthand formats.
Documentation & Design Specification
README.md, TDD-PRD-BOARD.md, .pi/extensions/autonomous-loop/README.md
Adds Autonomous Loop documentation detailing phase sequence, workflow commands, Single-Turn Autonomy behavior, review/fix protocol, and comprehensive design board with MVP scope, state machine definition, hook policy matrix, and operator commands.
Test Suite
packages/coding-agent/test/autonomous-loop/*, test-readme-autonomous-loop.sh
Includes unit tests for PRD parser (task/criteria extraction, error handling), review engine (finding/fix task lifecycle), workflow state (phase transitions, task completion logic), repository slug parsing (multi-format URL validation), and integration test validating README documentation completeness.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/Agent
    participant PI as Extension<br/>(index.ts)
    participant Hook as Hooks<br/>(hooks.ts)
    participant State as State<br/>(workflow-state.ts)
    participant Parser as PRD Parser<br/>(prd-parser.ts)
    participant UI as UI Notification

    User->>PI: /workflow load <path>
    PI->>Hook: workflow.loadPrdFromPath(path)
    Hook->>Parser: parsePrdMarkdown(content)
    Parser->>State: tasks parsed
    Hook->>State: loadPrdTasks()
    State->>State: transition prd_loaded
    Hook->>UI: notify "PRD loaded"
    
    Note over User,UI: When agent starts, auto-advance from prd_loaded to tdd_red
    User->>Hook: before_agent_start event
    Hook->>State: transitionPhase(tdd_red)
    Hook->>UI: provide phase instruction
    
    Note over User,UI: Agent runs TDD cycle with guarded mutations
    User->>Hook: tool_call event (test command)
    Hook->>State: gate check (phase tdd_red allows tests)
    Hook->>User: permit or block based on phase
    
    User->>Hook: tool_result event (test result)
    Hook->>State: recordTaskEvidence("red"|"green")
    State->>State: evaluate task completion
    Hook->>State: canTransition to tdd_refactor?
    Hook->>UI: notify "ready to refactor"
Loading
sequenceDiagram
    participant Agent as Agent
    participant Hook as Hooks<br/>(hooks.ts)
    participant Engine as Review<br/>Engine
    participant State as State<br/>(workflow-state.ts)
    participant UI as UI

    Agent->>Hook: turn_end event (review_1 phase)
    Hook->>Engine: parseFindingsFromText(conclusion)
    Engine->>Engine: extract FINDING lines<br/>with severity/blockers
    Engine->>State: setFindingsAndFixTasks()
    Hook->>State: transition fix_findings
    Hook->>UI: notify findings parsed
    
    Note over Agent,UI: Fix cycle: agent addresses findings
    Agent->>Hook: turn_end event (fix_findings phase)
    Hook->>Engine: parseFixedMarkers(output)
    Engine->>State: applyFixes(findings, fixTasks)
    State->>State: mark findings resolved
    Hook->>State: transitionPhase(review_2)
    
    Note over Agent,UI: Final review validation
    Hook->>State: allTasksCompleted() +<br/>no blocking findings?
    alt All requirements met
        Hook->>State: transitionPhase(ready_to_push)
        Hook->>UI: "ready for git push"
    else Incomplete
        Hook->>UI: "gate violation: push blocked"
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 A loop most autonomous takes flight,
PRD blooms, then tests run red to green so bright,
Review and findings dance, then refactored clean,
Phases marching forward—the best workflow seen!
With gates and evidence, no push without care, 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 1.20% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding a new autonomous PRD-TDD-review workflow extension. It is concise, clear, and directly related to the primary purpose of the PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ponderingdemocritus/npm-fork

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (7)
test-readme-autonomous-loop.sh (1)

7-28: Consider using consistent grep flags and improving robustness.

The pattern on line 19 uses extended regex (\|) which requires -E flag on some systems for portability. Also, the script could benefit from set -e for early failure.

♻️ Suggested improvements
 #!/usr/bin/env bash
+set -euo pipefail
 # Test: README must contain Autonomous Loop documentation
 
-README="/Users/os/conductor/workspaces/pi-mono-v4/hangzhou-v1/README.md"
+README="README.md"
+
+if [[ ! -f "$README" ]]; then
+    echo "FAIL: README.md not found"
+    exit 1
+fi
 
 # Check for Autonomous Loop section heading
 if ! grep -q "## Autonomous Loop" "$README"; then
@@ -16,7 +20,7 @@
 fi
 
 # Check for phase states
-if ! grep -q "prd_loaded\|tdd_red\|tdd_green\|review_1" "$README"; then
+if ! grep -Eq "prd_loaded|tdd_red|tdd_green|review_1" "$README"; then
     echo "FAIL: README missing phase state documentation"
     exit 1
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-readme-autonomous-loop.sh` around lines 7 - 28, The grep invocations in
test-readme-autonomous-loop.sh use a non-portable alternation ("\|") and lack
strict failure handling; update the script to enable early exit (add set -e near
the top) and replace occurrences of grep -q
"prd_loaded\|tdd_red\|tdd_green\|review_1" "$README" with a portable
extended-regex call (e.g., use grep -qE or grep -E -q) or multiple grep checks;
similarly ensure all grep calls consistently use appropriate flags and quoted
variables (README) to improve robustness and portability.
README.md (1)

46-79: Consider adding language specifiers to fenced code blocks.

The static analysis tool flagged missing language specifiers on lines 46, 67, 72, and 77. For documentation consistency, consider using text for plain text blocks:

📝 Suggested fix
-```
+```text
 PRD -> TDD (red/green/refactor) -> review_1 -> fix_findings -> review_2 -> ready_to_push -> done
-```
+```

Similarly for the protocol format blocks (lines 67, 72, 77).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 46 - 79, The fenced code blocks like the workflow
diagram string "PRD -> TDD (red/green/refactor) -> review_1 -> fix_findings ->
review_2 -> ready_to_push -> done" and the protocol examples
"FINDING|severity|file1,file2|summary", "FIXED|finding-id|evidence", and
"REVIEW2|PASS" are missing language specifiers; update each triple-backtick
block to use ```text (e.g., change ``` to ```text) so the README's fenced code
blocks explicitly declare the language for the diagram and protocol format
examples.
.pi/extensions/autonomous-loop/workflow-state.ts (2)

241-265: updateFindingsAndFixTasks appears to be a duplicate of setFindingsAndFixTasks.

Both functions have identical implementations. Consider removing the duplicate or clarifying if there's intended semantic difference (e.g., one for replacement vs. one for updates).

♻️ Suggested fix - remove duplicate
-export function updateFindingsAndFixTasks(
-	state: WorkflowState,
-	findings: ReviewFinding[],
-	fixTasks: FixTask[],
-): WorkflowState {
-	return {
-		...state,
-		findings,
-		fixTasks,
-		lastUpdatedAt: now(),
-	};
-}

If both are needed for semantic clarity, consider having one delegate to the other:

export const updateFindingsAndFixTasks = setFindingsAndFixTasks;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pi/extensions/autonomous-loop/workflow-state.ts around lines 241 - 265, The
functions setFindingsAndFixTasks and updateFindingsAndFixTasks are identical;
remove the duplicate by either deleting updateFindingsAndFixTasks or make it
explicitly delegate to setFindingsAndFixTasks to preserve any semantic naming.
Update references to updateFindingsAndFixTasks elsewhere if you remove it, or if
you keep it as a wrapper, implement delegation so only setFindingsAndFixTasks
contains the implementation and updateFindingsAndFixTasks simply forwards to it.

267-279: Minor: now() called twice in addGateViolation.

The timestamp in the violation object and lastUpdatedAt could differ by milliseconds. Consider capturing the timestamp once.

📝 Suggested improvement
 export function addGateViolation(state: WorkflowState, reason: string): WorkflowState {
+	const timestamp = now();
 	const violation: WorkflowGateViolation = {
-		id: `gate-${state.gateViolations.length + 1}-${now()}`,
+		id: `gate-${state.gateViolations.length + 1}-${timestamp}`,
 		phase: state.phase,
 		reason,
-		timestamp: now(),
+		timestamp,
 	};
 	return {
 		...state,
 		gateViolations: [...state.gateViolations, violation],
-		lastUpdatedAt: now(),
+		lastUpdatedAt: timestamp,
 	};
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pi/extensions/autonomous-loop/workflow-state.ts around lines 267 - 279, In
addGateViolation, avoid calling now() twice so the violation timestamp and
lastUpdatedAt are consistent: capture a single const ts = now() at the top of
the function and use ts when building the id (`gate-${...}-${ts}`), setting
violation.timestamp, and assigning lastUpdatedAt; update references to
state.gateViolations and state.phase remain the same (function name
addGateViolation, type WorkflowGateViolation) so the rest of the logic is
unchanged.
TDD-PRD-BOARD.md (1)

83-92: Add blank line before the table for markdown compliance.

The table at line 84 should be preceded by a blank line per markdown best practices (MD058).

📝 Proposed fix
 ## Hook Policy Matrix
+
 | Hook | Responsibility | Hard Fail Condition |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TDD-PRD-BOARD.md` around lines 83 - 92, The markdown header "## Hook Policy
Matrix" is immediately followed by a table causing MD058; insert a single blank
line between the "## Hook Policy Matrix" heading and the table (the
pipe-delimited block starting with "| Hook | Responsibility | Hard Fail
Condition |") so the table is preceded by an empty line for Markdown compliance.
.pi/extensions/autonomous-loop/hooks.ts (1)

536-538: Silent catch block swallows transition errors.

The empty catch block silently ignores transition failures. While the comment explains the intent, this could mask unexpected errors. Consider logging at debug level.

📝 Suggested improvement
 			} catch {
-				// Keep current phase if transition is invalid.
+				// Keep current phase if transition is invalid.
+				// Intentionally silent - invalid transitions are expected when PRD has no tasks.
 			}

Or if a logging mechanism is available:

 			} catch (error) {
+				// Expected when transition is invalid; log for debugging.
+				console.debug?.("Transition to tdd_red skipped:", error);
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pi/extensions/autonomous-loop/hooks.ts around lines 536 - 538, The empty
catch in hooks.ts that currently swallows transition errors should be changed to
capture the thrown error (e.g., catch (err)) and emit a debug-level log while
preserving the current behavior of keeping the phase on invalid transitions;
replace the silent block with a call to the existing logger (or console.debug if
no logger exists) including context like "phase transition failed" and the error
object so unexpected issues are visible for debugging.
packages/coding-agent/test/autonomous-loop/workflow-state.test.ts (1)

12-26: Consider expanding transition matrix test coverage.

The test validates the main happy path transitions but misses some allowed transitions from TRANSITION_MAP:

  • review_2 -> fix_findings (regression path)
  • ready_to_push -> fix_findings (regression path)
  • tdd_refactor -> tdd_red (next task cycle)

These regression/loop paths are important for the workflow's flexibility.

🧪 Additional test cases to consider
 		expect(canTransition("ready_to_push", "done")).toBe(true);

+		// Regression/loop transitions
+		expect(canTransition("tdd_refactor", "tdd_red")).toBe(true);
+		expect(canTransition("review_2", "fix_findings")).toBe(true);
+		expect(canTransition("ready_to_push", "fix_findings")).toBe(true);
+
 		expect(canTransition("idle", "tdd_red")).toBe(false);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/coding-agent/test/autonomous-loop/workflow-state.test.ts` around
lines 12 - 26, The test for phase transitions is missing several allowed
regression/loop paths defined in TRANSITION_MAP; update the test in
packages/coding-agent/test/autonomous-loop/workflow-state.test.ts to include
assertions that canTransition returns true for the additional allowed
transitions: review_2 -> fix_findings, ready_to_push -> fix_findings, and
tdd_refactor -> tdd_red, using the existing canTransition helper so the test
fully covers the TRANSITION_MAP allowed paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.pi/extensions/autonomous-loop/hooks.ts:
- Around line 377-378: The Map bashCommandsByCallId is prone to unbounded growth
because entries are only removed in the tool_result path; update the handlers to
avoid leaks by (1) removing the entry when a tool_call returns { block: true }
(or any early-return/exception path) and (2) adding a bounded-eviction policy
such as a maximum size or time-to-live cleanup for bashCommandsByCallId (e.g.,
implement LRU eviction or a periodic setInterval sweep that removes entries
older than N milliseconds). Locate usages by the symbol bashCommandsByCallId and
the event handlers for tool_call and tool_result to add the immediate removals
and the global eviction/cleanup logic.

In @.pi/extensions/autonomous-loop/review-engine.ts:
- Around line 52-58: The current ID helpers findingId and fixTaskId generate IDs
using per-call indices which reset across parsing passes and can cause
collisions; update these functions (findingId and fixTaskId) to incorporate a
stable unique component (e.g., a UUID, a run/session token, or Date.now()
timestamp) or accept a base offset parameter so IDs remain globally unique
across multiple calls; ensure the chosen unique value is generated once per
review/session and propagated into
parseFindingsFromText/createFixTasksFromFindings so each generated ID becomes
finding-<index>-<sessionOrUuid>-<escapedSummary> and
fix-<index>-<sessionOrUuid>-<escapedFindingId> (or equivalent) to avoid
collisions.

In @.pi/extensions/autonomous-loop/workflow-state.ts:
- Around line 290-311: sanitizePersistedState currently only checks that tasks,
findings, fixTasks, and gateViolations are arrays but not the shapes of their
elements, which lets malformed persisted items cause crashes when callers (e.g.,
unresolvedBlockingFindings in review-engine.ts or hooks.ts where code accesses
finding.fileRefs.length) assume properties exist; update sanitizePersistedState
to validate element shapes (ensure each task/finding/fixTask/gateViolation is an
object and has the minimal required fields and types you expect—e.g., findings
have id and fileRefs as an array, tasks have id/status, fixTasks have
id/status—and either coerce missing fields to safe defaults or filter out
invalid entries) and return a sanitized array, or alternatively ensure all
strong call sites also guard (wrap accesses like finding.fileRefs.length with
existence checks), referencing sanitizePersistedState,
unresolvedBlockingFindings, and the hooks usage to guide where to apply the
guards.

In `@test-readme-autonomous-loop.sh`:
- Line 4: The README variable is set to a hardcoded absolute path which breaks
portability; update the assignment of README in test-readme-autonomous-loop.sh
to compute the path dynamically (either relative to the script location using
the script's directory resolution logic such as dirname "$0" or relative to the
repository root determined via git rev-parse --show-toplevel) so the script
works on other machines/CI; locate the README variable assignment and replace
the literal path with a computed path using those methods and keep the rest of
the script unchanged.

---

Nitpick comments:
In @.pi/extensions/autonomous-loop/hooks.ts:
- Around line 536-538: The empty catch in hooks.ts that currently swallows
transition errors should be changed to capture the thrown error (e.g., catch
(err)) and emit a debug-level log while preserving the current behavior of
keeping the phase on invalid transitions; replace the silent block with a call
to the existing logger (or console.debug if no logger exists) including context
like "phase transition failed" and the error object so unexpected issues are
visible for debugging.

In @.pi/extensions/autonomous-loop/workflow-state.ts:
- Around line 241-265: The functions setFindingsAndFixTasks and
updateFindingsAndFixTasks are identical; remove the duplicate by either deleting
updateFindingsAndFixTasks or make it explicitly delegate to
setFindingsAndFixTasks to preserve any semantic naming. Update references to
updateFindingsAndFixTasks elsewhere if you remove it, or if you keep it as a
wrapper, implement delegation so only setFindingsAndFixTasks contains the
implementation and updateFindingsAndFixTasks simply forwards to it.
- Around line 267-279: In addGateViolation, avoid calling now() twice so the
violation timestamp and lastUpdatedAt are consistent: capture a single const ts
= now() at the top of the function and use ts when building the id
(`gate-${...}-${ts}`), setting violation.timestamp, and assigning lastUpdatedAt;
update references to state.gateViolations and state.phase remain the same
(function name addGateViolation, type WorkflowGateViolation) so the rest of the
logic is unchanged.

In `@packages/coding-agent/test/autonomous-loop/workflow-state.test.ts`:
- Around line 12-26: The test for phase transitions is missing several allowed
regression/loop paths defined in TRANSITION_MAP; update the test in
packages/coding-agent/test/autonomous-loop/workflow-state.test.ts to include
assertions that canTransition returns true for the additional allowed
transitions: review_2 -> fix_findings, ready_to_push -> fix_findings, and
tdd_refactor -> tdd_red, using the existing canTransition helper so the test
fully covers the TRANSITION_MAP allowed paths.

In `@README.md`:
- Around line 46-79: The fenced code blocks like the workflow diagram string
"PRD -> TDD (red/green/refactor) -> review_1 -> fix_findings -> review_2 ->
ready_to_push -> done" and the protocol examples
"FINDING|severity|file1,file2|summary", "FIXED|finding-id|evidence", and
"REVIEW2|PASS" are missing language specifiers; update each triple-backtick
block to use ```text (e.g., change ``` to ```text) so the README's fenced code
blocks explicitly declare the language for the diagram and protocol format
examples.

In `@TDD-PRD-BOARD.md`:
- Around line 83-92: The markdown header "## Hook Policy Matrix" is immediately
followed by a table causing MD058; insert a single blank line between the "##
Hook Policy Matrix" heading and the table (the pipe-delimited block starting
with "| Hook | Responsibility | Hard Fail Condition |") so the table is preceded
by an empty line for Markdown compliance.

In `@test-readme-autonomous-loop.sh`:
- Around line 7-28: The grep invocations in test-readme-autonomous-loop.sh use a
non-portable alternation ("\|") and lack strict failure handling; update the
script to enable early exit (add set -e near the top) and replace occurrences of
grep -q "prd_loaded\|tdd_red\|tdd_green\|review_1" "$README" with a portable
extended-regex call (e.g., use grep -qE or grep -E -q) or multiple grep checks;
similarly ensure all grep calls consistently use appropriate flags and quoted
variables (README) to improve robustness and portability.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16556c9 and 1915bd4.

📒 Files selected for processing (14)
  • .pi/extensions/autonomous-loop/README.md
  • .pi/extensions/autonomous-loop/hooks.ts
  • .pi/extensions/autonomous-loop/index.ts
  • .pi/extensions/autonomous-loop/prd-parser.ts
  • .pi/extensions/autonomous-loop/repo-slug.test.ts
  • .pi/extensions/autonomous-loop/repo-slug.ts
  • .pi/extensions/autonomous-loop/review-engine.ts
  • .pi/extensions/autonomous-loop/workflow-state.ts
  • README.md
  • TDD-PRD-BOARD.md
  • packages/coding-agent/test/autonomous-loop/prd-parser.test.ts
  • packages/coding-agent/test/autonomous-loop/review-engine.test.ts
  • packages/coding-agent/test/autonomous-loop/workflow-state.test.ts
  • test-readme-autonomous-loop.sh

Comment on lines +377 to +378
const bashCommandsByCallId = new Map<string, string>();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential memory leak in bashCommandsByCallId map.

The map stores commands on tool_call events but only deletes them in tool_result handlers. If a tool call is blocked (returns { block: true }) or the tool_result event is never fired for some reason, entries will accumulate.

Consider adding periodic cleanup or a maximum size limit.

💡 Suggested mitigation
 	const bashCommandsByCallId = new Map<string, string>();
+	const MAX_PENDING_COMMANDS = 100;

 	// In tool_call handler, before adding:
+	if (bashCommandsByCallId.size >= MAX_PENDING_COMMANDS) {
+		// Clear oldest entries or log warning
+		const firstKey = bashCommandsByCallId.keys().next().value;
+		if (firstKey) bashCommandsByCallId.delete(firstKey);
+	}
 	bashCommandsByCallId.set(event.toolCallId, command);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pi/extensions/autonomous-loop/hooks.ts around lines 377 - 378, The Map
bashCommandsByCallId is prone to unbounded growth because entries are only
removed in the tool_result path; update the handlers to avoid leaks by (1)
removing the entry when a tool_call returns { block: true } (or any
early-return/exception path) and (2) adding a bounded-eviction policy such as a
maximum size or time-to-live cleanup for bashCommandsByCallId (e.g., implement
LRU eviction or a periodic setInterval sweep that removes entries older than N
milliseconds). Locate usages by the symbol bashCommandsByCallId and the event
handlers for tool_call and tool_result to add the immediate removals and the
global eviction/cleanup logic.

Comment on lines +52 to +58
function findingId(index: number, summary: string): string {
return `finding-${index + 1}-${escapeId(summary) || "item"}`;
}

function fixTaskId(index: number, finding: ReviewFinding): string {
return `fix-${index + 1}-${escapeId(finding.id)}`;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

ID generation may produce collisions across multiple parsing passes.

The findingId and fixTaskId functions use array indices (findings.length, fixTasks.length) which restart at 0 for each call to parseFindingsFromText or createFixTasksFromFindings. If these functions are called multiple times (e.g., in different review passes), IDs like finding-1-... could collide.

Consider using a UUID or incorporating a timestamp/session identifier to ensure uniqueness across passes.

💡 Suggested improvement
+let findingCounter = 0;
+let fixTaskCounter = 0;
+
 function findingId(index: number, summary: string): string {
-	return `finding-${index + 1}-${escapeId(summary) || "item"}`;
+	return `finding-${++findingCounter}-${escapeId(summary) || "item"}`;
 }

 function fixTaskId(index: number, finding: ReviewFinding): string {
-	return `fix-${index + 1}-${escapeId(finding.id)}`;
+	return `fix-${++fixTaskCounter}-${escapeId(finding.id)}`;
 }

Alternatively, pass a base offset or use Date.now() as part of the ID.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pi/extensions/autonomous-loop/review-engine.ts around lines 52 - 58, The
current ID helpers findingId and fixTaskId generate IDs using per-call indices
which reset across parsing passes and can cause collisions; update these
functions (findingId and fixTaskId) to incorporate a stable unique component
(e.g., a UUID, a run/session token, or Date.now() timestamp) or accept a base
offset parameter so IDs remain globally unique across multiple calls; ensure the
chosen unique value is generated once per review/session and propagated into
parseFindingsFromText/createFixTasksFromFindings so each generated ID becomes
finding-<index>-<sessionOrUuid>-<escapedSummary> and
fix-<index>-<sessionOrUuid>-<escapedFindingId> (or equivalent) to avoid
collisions.

Comment on lines +290 to +311
export function sanitizePersistedState(raw: unknown): WorkflowState | undefined {
if (!raw || typeof raw !== "object") return undefined;
const candidate = raw as Partial<WorkflowState>;
if (candidate.version !== 1) return undefined;
if (!candidate.phase || !PHASE_SEQUENCE.includes(candidate.phase)) return undefined;
if (!Array.isArray(candidate.tasks)) return undefined;
if (!Array.isArray(candidate.findings) || !Array.isArray(candidate.fixTasks)) return undefined;
if (!Array.isArray(candidate.gateViolations)) return undefined;

return {
version: 1,
phase: candidate.phase,
prdPath: candidate.prdPath,
tasks: candidate.tasks,
activeTaskId: candidate.activeTaskId,
findings: candidate.findings,
fixTasks: candidate.fixTasks,
latestCheckPassed: Boolean(candidate.latestCheckPassed),
gateViolations: candidate.gateViolations,
lastUpdatedAt: typeof candidate.lastUpdatedAt === "number" ? candidate.lastUpdatedAt : now(),
};
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all usages of sanitizePersistedState
rg -n 'sanitizePersistedState' --type=ts -B2 -A5

Repository: daydreamsai/pi-mono

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Find the workflow-state file
fd 'workflow-state' -type f

# Also check the .pi/extensions directory structure
echo "=== Directory structure ==="
find .pi/extensions -type f -name "*.ts" 2>/dev/null | head -20

# Search for the function in all files
echo "=== Searching for sanitizePersistedState in all files ==="
rg 'sanitizePersistedState' -i

Repository: daydreamsai/pi-mono

Length of output: 782


🏁 Script executed:

#!/bin/bash
# Read the workflow-state.ts file
cat -n .pi/extensions/autonomous-loop/workflow-state.ts

Repository: daydreamsai/pi-mono

Length of output: 10053


🏁 Script executed:

#!/bin/bash
# Search for sanitizePersistedState usage
rg 'sanitizePersistedState' -B3 -A8

Repository: daydreamsai/pi-mono

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Search for sanitizePersistedState in all files with broader patterns
grep -r 'sanitizePersistedState' --include="*.ts" --include="*.js" --include="*.tsx" --include="*.jsx" -B2 -A5

# Also check the index file to see what's exported
echo "=== Checking index.ts ==="
cat -n .pi/extensions/autonomous-loop/index.ts

# Check hooks.ts
echo "=== Checking hooks.ts ==="
cat -n .pi/extensions/autonomous-loop/hooks.ts

Repository: daydreamsai/pi-mono

Length of output: 32709


🏁 Script executed:

#!/bin/bash
# Read review-engine.ts to check validation of findings and fixTasks
cat -n .pi/extensions/autonomous-loop/review-engine.ts | head -150

Repository: daydreamsai/pi-mono

Length of output: 5130


🏁 Script executed:

#!/bin/bash
# Check for any other usage patterns or validation
grep -n "unresolvedBlockingFindings" .pi/extensions/autonomous-loop/*.ts

# Also look at the function definition
grep -A10 "export function unresolvedBlockingFindings" .pi/extensions/autonomous-loop/review-engine.ts

Repository: daydreamsai/pi-mono

Length of output: 1113


sanitizePersistedState performs shallow validation on nested arrays; callers access properties without full validation.

The function validates that tasks, findings, fixTasks, and gateViolations are arrays but doesn't validate their element shapes. Callers like unresolvedBlockingFindings() (line 165 in review-engine.ts) safely use && operators for property access, but code like line 407 in hooks.ts (finding.fileRefs.length) assumes properties exist, risking runtime errors if persisted data contains malformed elements.

Consider adding basic element shape validation or adding null/undefined checks before accessing nested properties.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pi/extensions/autonomous-loop/workflow-state.ts around lines 290 - 311,
sanitizePersistedState currently only checks that tasks, findings, fixTasks, and
gateViolations are arrays but not the shapes of their elements, which lets
malformed persisted items cause crashes when callers (e.g.,
unresolvedBlockingFindings in review-engine.ts or hooks.ts where code accesses
finding.fileRefs.length) assume properties exist; update sanitizePersistedState
to validate element shapes (ensure each task/finding/fixTask/gateViolation is an
object and has the minimal required fields and types you expect—e.g., findings
have id and fileRefs as an array, tasks have id/status, fixTasks have
id/status—and either coerce missing fields to safe defaults or filter out
invalid entries) and return a sanitized array, or alternatively ensure all
strong call sites also guard (wrap accesses like finding.fileRefs.length with
existence checks), referencing sanitizePersistedState,
unresolvedBlockingFindings, and the hooks usage to guide where to apply the
guards.

#!/usr/bin/env bash
# Test: README must contain Autonomous Loop documentation

README="/Users/os/conductor/workspaces/pi-mono-v4/hangzhou-v1/README.md"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Hardcoded absolute path breaks portability.

The script uses an absolute path specific to a developer's machine, which will fail in CI/CD environments and on other contributors' machines.

🐛 Proposed fix using script-relative path
-README="/Users/os/conductor/workspaces/pi-mono-v4/hangzhou-v1/README.md"
+# Get the directory where this script is located and find README relative to repo root
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+README="${SCRIPT_DIR}/README.md"

Or if the script should be run from the repository root:

-README="/Users/os/conductor/workspaces/pi-mono-v4/hangzhou-v1/README.md"
+README="README.md"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
README="/Users/os/conductor/workspaces/pi-mono-v4/hangzhou-v1/README.md"
#!/bin/bash
# Get the directory where this script is located and find README relative to repo root
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
README="${SCRIPT_DIR}/README.md"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-readme-autonomous-loop.sh` at line 4, The README variable is set to a
hardcoded absolute path which breaks portability; update the assignment of
README in test-readme-autonomous-loop.sh to compute the path dynamically (either
relative to the script location using the script's directory resolution logic
such as dirname "$0" or relative to the repository root determined via git
rev-parse --show-toplevel) so the script works on other machines/CI; locate the
README variable assignment and replace the literal path with a computed path
using those methods and keep the rest of the script unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant