Ralph/prd 141 platform reliability#394
Conversation
…S-024)
risk>=4 prescriptions already queue as board tasks; now they also notify a
human (best-effort) via the existing send_workspace_notification, with
/approve {rx_id} / /reject {rx_id} instructions, and record an 'escalated'
changelog entry. Queued board tasks gain an rx:{rx_id} tag so US-025 can
resolve a task back to its prescription. Escalation is gated on the workspace
having a channel_connections row — detected by row presence (matching how
channels.sender resolves a channel) rather than the ambiguous status column.
…authz (US-025) handle_harness_command applies or rejects a queued high-risk HARNESS prescription from a chat command. Gated behind HARNESS_SELF_MANAGEMENT_ENABLED and a non-negotiable workspace-ADMIN check (active owner/admin member) enforced BEFORE any state read or write — an unauthorized caller changes nothing. - /approve: tag-lookup the rx board task, apply now via the existing _auto_apply_prescription, mark the task done, and record it in the shared US-021 ledger (idempotent — never double-applies or re-applies on a tick). - /reject: flip the board task to 'rejected' via the ORM so _get_rejected_signatures suppresses the same prescription on future ticks. - Fail-closed identity coercion (_caller_user_id): rejects non-dict, missing, bool, non-int, and non-positive user ids deterministically rather than relying on DB-driver coercion of caller input. - Audit logging on both success paths + approver user_id in the ledger entry. 7 unit tests cover approve, reject, unknown rx, idempotent re-approve, unauthorized caller (no mutation), malformed identity shapes, and flag-off no-op. Security-auditor reviewed the authz boundary: sound, no CRITICAL/HIGH. Live channel wiring is deferred to the US-026 gate.
…cription (PRD-141 US-026 pre-gate)
Phase 5 task-consumption was built and tested entirely against in-memory fakes,
which hid four real-executor integration bugs that would only bite once the
HARNESS_SELF_MANAGEMENT_ENABLED canary turns the path on:
- C1/H1: harness writes routed through executor.execute("workspace_write_file",
...), but that is an agent tool primitive, NOT a registered platform action —
execute() returned an "unknown action" error with no raise, so the idempotency
ledger (applied_tasks.json) and US-022 baseline never persisted, and the
try/except recorded the silent failure as "ok". Route writes directly to the
workspace volume via a shared _write_workspace_file helper (mirrors the proven
_write_last_run), so reads and writes hit one store and a real I/O error raises
and is recorded. _write_applied_tasks is now sync (no executor param).
- C2: list_board_tasks (the platform_list_tasks handler) dropped `tags` and
`description` from its projection and ignored the `tags` filter param. rx_id
lives only in tags (rx:{id}) and the prescription body lives in description,
so _find_task_by_rx and _parse_harness_task both broke against the real handler.
Add the JSONB `tags` containment filter and surface both fields.
- H2: a /reject keeps the board task's harness/rx tags, so it still surfaced in
the tag-filtered list and a later /approve could resurrect it. Refuse /approve
on a task whose status is already "rejected".
Tests: ledger now asserts a real on-disk round-trip under tmp_path; new
test_board_task_handlers.py proves the projection surfaces tags/description; new
guard test covers the rejected-then-approve path. 34 passing. The JSONB tags
FILTER and the live channel→handler wiring remain for the US-026 canary gate.
📝 WalkthroughWalkthroughThis PR implements HARNESS self-management command handling ( ChangesHARNESS Self-Management Commands
Sequence Diagram(s)sequenceDiagram
participant Caller as Workspace User
participant Handler as handle_harness_command
participant AuthZ as _caller_is_workspace_admin
participant Lookup as _find_task_by_rx
participant Apply as _auto_apply_prescription
participant DB as BoardTask ORM
Caller->>Handler: /approve or /reject + rx_id
Handler->>Handler: check HARNESS_SELF_MANAGEMENT_ENABLED
Handler->>AuthZ: validate caller is owner/admin
AuthZ-->>Handler: authorization result
Handler->>Lookup: search queued tasks by harness + rx_id
Lookup-->>Handler: task or not_found
alt /approve
Handler->>Apply: resolve and apply prescription
Apply->>DB: set status=done after success
else /reject
Handler->>DB: set status=rejected + block metadata
end
DB-->>Handler: commit result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
orchestrator/api/harness_commands.py (1)
162-162: ⚡ Quick winNarrow the task lookup at the DB level by passing both tags.
list_board_tasksappliesmin(limit, 50)with a default of20, ordered bycreated_at desc. Filtering by onlyharnessand then matchingrx:{rx_id}in Python means that in a workspace with more than 20 harness-tagged tasks, the target prescription can fall outside the returned window and produce a spurious "No pending HARNESS change found". Passing both tags leverages the multi-tag containment filter added in this PR so the query returns exactly the matching task.♻️ Pass the rx tag into the query
- list_result = await executor.execute("platform_list_tasks", {"tags": ["harness"]}) + list_result = await executor.execute( + "platform_list_tasks", {"tags": ["harness", f"rx:{rx_id}"]} + )
_find_task_by_rxcan stay as a defensive second check.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@orchestrator/api/harness_commands.py` at line 162, The DB query currently calls executor.execute("platform_list_tasks", {"tags": ["harness"]}) which only filters by the harness tag; change the payload to include both tags (e.g. {"tags": ["harness", f"rx:{rx_id}"]}) so the server-side multi-tag containment filter narrows results to the exact prescription; keep the existing _find_task_by_rx defensive check afterwards but update the call site where list_result is assigned in harness_commands.py to pass the extra rx tag.orchestrator/services/harness_service.py (1)
835-841: 💤 Low valueType hint
Sessionis undefined — use the imported path or import the type.The
db: "Session"annotation references a type that isn't imported. While this works at runtime (the string is never resolved), static analysis tools flag it as undefined. Either import the type for annotation purposes or use the full module path.+from typing import TYPE_CHECKING + +if TYPE_CHECKING: + from sqlalchemy.orm import SessionOr change the annotation to the fully-qualified string:
- def _maybe_escalate( - self, - db: "Session", + def _maybe_escalate( + self, + db: "sqlalchemy.orm.Session",Also applies to: 879-879
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@orchestrator/services/harness_service.py` around lines 835 - 841, The type annotation db: "Session" in _maybe_escalate is using an undefined string type which static checkers flag; fix by replacing the quoted annotation with the actual imported type (e.g., import Session from sqlalchemy.orm or the correct module used elsewhere) or change it to the fully-qualified module path in the annotation (e.g., sqlalchemy.orm.Session) and apply the same fix to the other occurrence noted (the other function using "Session"); update imports accordingly so static analyzers recognize the type.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@orchestrator/api/harness_commands.py`:
- Line 162: The DB query currently calls executor.execute("platform_list_tasks",
{"tags": ["harness"]}) which only filters by the harness tag; change the payload
to include both tags (e.g. {"tags": ["harness", f"rx:{rx_id}"]}) so the
server-side multi-tag containment filter narrows results to the exact
prescription; keep the existing _find_task_by_rx defensive check afterwards but
update the call site where list_result is assigned in harness_commands.py to
pass the extra rx tag.
In `@orchestrator/services/harness_service.py`:
- Around line 835-841: The type annotation db: "Session" in _maybe_escalate is
using an undefined string type which static checkers flag; fix by replacing the
quoted annotation with the actual imported type (e.g., import Session from
sqlalchemy.orm or the correct module used elsewhere) or change it to the
fully-qualified module path in the annotation (e.g., sqlalchemy.orm.Session) and
apply the same fix to the other occurrence noted (the other function using
"Session"); update imports accordingly so static analyzers recognize the type.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 62a7ac0e-0969-4414-b852-d539b04374ba
📒 Files selected for processing (6)
orchestrator/api/harness_commands.pyorchestrator/modules/tools/discovery/handlers_board_tasks.pyorchestrator/services/harness_service.pyorchestrator/tests/test_board_task_handlers.pyorchestrator/tests/test_harness_commands.pyorchestrator/tests/test_harness_self_management.py
Summary by CodeRabbit
New Features
Tests