Skip to content

Ralph/prd 141 platform reliability#394

Merged
AutomatosAI merged 3 commits into
mainfrom
ralph/prd-141-platform-reliability
May 29, 2026
Merged

Ralph/prd 141 platform reliability#394
AutomatosAI merged 3 commits into
mainfrom
ralph/prd-141-platform-reliability

Conversation

@AutomatosAI
Copy link
Copy Markdown
Owner

@AutomatosAI AutomatosAI commented May 29, 2026

Summary by CodeRabbit

  • New Features

    • Added command handler to approve or reject HARNESS prescriptions with admin-level authorization requirements
    • Added tag-based filtering for board task listings
    • High-risk HARNESS prescriptions now escalate and trigger workspace notifications with approval/rejection options
  • Tests

    • Added comprehensive test coverage for HARNESS command handling and prescription management workflows

Review Change Stack

…S-024)

risk>=4 prescriptions already queue as board tasks; now they also notify a
human (best-effort) via the existing send_workspace_notification, with
/approve {rx_id} / /reject {rx_id} instructions, and record an 'escalated'
changelog entry. Queued board tasks gain an rx:{rx_id} tag so US-025 can
resolve a task back to its prescription. Escalation is gated on the workspace
having a channel_connections row — detected by row presence (matching how
channels.sender resolves a channel) rather than the ambiguous status column.
…authz (US-025)

handle_harness_command applies or rejects a queued high-risk HARNESS
prescription from a chat command. Gated behind HARNESS_SELF_MANAGEMENT_ENABLED
and a non-negotiable workspace-ADMIN check (active owner/admin member) enforced
BEFORE any state read or write — an unauthorized caller changes nothing.

- /approve: tag-lookup the rx board task, apply now via the existing
  _auto_apply_prescription, mark the task done, and record it in the shared
  US-021 ledger (idempotent — never double-applies or re-applies on a tick).
- /reject: flip the board task to 'rejected' via the ORM so
  _get_rejected_signatures suppresses the same prescription on future ticks.
- Fail-closed identity coercion (_caller_user_id): rejects non-dict, missing,
  bool, non-int, and non-positive user ids deterministically rather than
  relying on DB-driver coercion of caller input.
- Audit logging on both success paths + approver user_id in the ledger entry.

7 unit tests cover approve, reject, unknown rx, idempotent re-approve,
unauthorized caller (no mutation), malformed identity shapes, and flag-off
no-op. Security-auditor reviewed the authz boundary: sound, no CRITICAL/HIGH.
Live channel wiring is deferred to the US-026 gate.
…cription (PRD-141 US-026 pre-gate)

Phase 5 task-consumption was built and tested entirely against in-memory fakes,
which hid four real-executor integration bugs that would only bite once the
HARNESS_SELF_MANAGEMENT_ENABLED canary turns the path on:

- C1/H1: harness writes routed through executor.execute("workspace_write_file",
  ...), but that is an agent tool primitive, NOT a registered platform action —
  execute() returned an "unknown action" error with no raise, so the idempotency
  ledger (applied_tasks.json) and US-022 baseline never persisted, and the
  try/except recorded the silent failure as "ok". Route writes directly to the
  workspace volume via a shared _write_workspace_file helper (mirrors the proven
  _write_last_run), so reads and writes hit one store and a real I/O error raises
  and is recorded. _write_applied_tasks is now sync (no executor param).

- C2: list_board_tasks (the platform_list_tasks handler) dropped `tags` and
  `description` from its projection and ignored the `tags` filter param. rx_id
  lives only in tags (rx:{id}) and the prescription body lives in description,
  so _find_task_by_rx and _parse_harness_task both broke against the real handler.
  Add the JSONB `tags` containment filter and surface both fields.

- H2: a /reject keeps the board task's harness/rx tags, so it still surfaced in
  the tag-filtered list and a later /approve could resurrect it. Refuse /approve
  on a task whose status is already "rejected".

Tests: ledger now asserts a real on-disk round-trip under tmp_path; new
test_board_task_handlers.py proves the projection surfaces tags/description; new
guard test covers the rejected-then-approve path. 34 passing. The JSONB tags
FILTER and the live channel→handler wiring remain for the US-026 canary gate.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

📝 Walkthrough

Walkthrough

This PR implements HARNESS self-management command handling (/approve and /reject) for prescriptions queued for human review. It adds workspace-admin authorization boundary, board task discovery by rx_id tags, high-risk escalation notifications, direct filesystem artifact persistence, and comprehensive idempotency via an on-disk ledger.

Changes

HARNESS Self-Management Commands

Layer / File(s) Summary
Board Task Discovery & Tags Projection
orchestrator/modules/tools/discovery/handlers_board_tasks.py, orchestrator/tests/test_board_task_handlers.py
list_board_tasks query now filters by HARNESS tags (containing both harness and rx:{rx_id}) and returns tags and description fields in the task projection, enabling command handlers to locate queued tasks for approval/rejection.
HARNESS Service Escalation & File Persistence
orchestrator/services/harness_service.py, orchestrator/tests/test_harness_self_management.py
Phase 4 apply now tags board tasks with rx:{rx_id}, records an escalated changelog entry, and sends workspace notifications for high-risk queued prescriptions. Phase 5 baseline artifacts and applied_tasks.json ledger are written directly to the workspace volume filesystem instead of through the executor, with direct-write helper _write_workspace_file.
Command Handler: Authorization & Core Logic
orchestrator/api/harness_commands.py
New handle_harness_command entry point enforces workspace-admin authorization, normalizes /approve and /reject commands, locates tasks by rx_id tag, applies prescriptions via _auto_apply_prescription with idempotency ledger updates, or rejects with ORM status and block metadata changes.
Command Handler Tests: Authorization & Behaviors
orchestrator/tests/test_harness_commands.py
8 test cases validate /approve success (executor heartbeat call, status update, ledger write, idempotency), /reject success (status change, no executor calls), unknown task failure, already-applied idempotency, rejected-task refusal, non-admin authorization rejection, malformed caller identity rejection, and feature-flag disablement with zero executor calls.
Approved-Task Ledger & Idempotency
orchestrator/tests/test_harness_self_management.py
Existing self-management tests refactored to use tmp_path-backed WORKSPACE_VOLUME_PATH for on-disk ledger assertions. New helper _ledger_path computes ledger location; tests now assert filesystem persistence, _read_applied_tasks recognition of prior application, and that ledger is not written when nothing is applied. Added US-024 escalation tests for telegram notification gating by channel presence, non-escalation for low-risk, and escalation message content validation.

Sequence Diagram(s)

sequenceDiagram
  participant Caller as Workspace User
  participant Handler as handle_harness_command
  participant AuthZ as _caller_is_workspace_admin
  participant Lookup as _find_task_by_rx
  participant Apply as _auto_apply_prescription
  participant DB as BoardTask ORM
  Caller->>Handler: /approve or /reject + rx_id
  Handler->>Handler: check HARNESS_SELF_MANAGEMENT_ENABLED
  Handler->>AuthZ: validate caller is owner/admin
  AuthZ-->>Handler: authorization result
  Handler->>Lookup: search queued tasks by harness + rx_id
  Lookup-->>Handler: task or not_found
  alt /approve
    Handler->>Apply: resolve and apply prescription
    Apply->>DB: set status=done after success
  else /reject
    Handler->>DB: set status=rejected + block metadata
  end
  DB-->>Handler: commit result
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • AutomatosAI/automatos-ai#193: Both PRs modify HARNESS's HarnessService apply behavior around when prescriptions are auto-applied vs queued/reviewable (via harness enable/mode logic in #193 and apply/escalation + applied_tasks/queue handling in the main PR), which directly underpins the main PR's /approve//reject flow for queued tasks.

Poem

🐰 Hop, skip, and approve!
A rabbit's prescription now moves
Through admin's command,
By /approve or /reject hand,
While escalation notifications groove! 🎯

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 63.16% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Ralph/prd 141 platform reliability' is vague and does not clearly convey the main changes. The PR implements HARNESS command handling, board task filtering, and high-risk escalation features, but the title uses generic phrasing ('platform reliability') and includes branch naming artifacts. Revise the title to be more descriptive and specific to the actual implementation, such as 'Add HARNESS self-management command handler with approval/rejection flow' or 'Implement HARNESS high-risk escalation and admin command processing'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ralph/prd-141-platform-reliability

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
orchestrator/api/harness_commands.py (1)

162-162: ⚡ Quick win

Narrow the task lookup at the DB level by passing both tags.

list_board_tasks applies min(limit, 50) with a default of 20, ordered by created_at desc. Filtering by only harness and then matching rx:{rx_id} in Python means that in a workspace with more than 20 harness-tagged tasks, the target prescription can fall outside the returned window and produce a spurious "No pending HARNESS change found". Passing both tags leverages the multi-tag containment filter added in this PR so the query returns exactly the matching task.

♻️ Pass the rx tag into the query
-    list_result = await executor.execute("platform_list_tasks", {"tags": ["harness"]})
+    list_result = await executor.execute(
+        "platform_list_tasks", {"tags": ["harness", f"rx:{rx_id}"]}
+    )

_find_task_by_rx can stay as a defensive second check.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@orchestrator/api/harness_commands.py` at line 162, The DB query currently
calls executor.execute("platform_list_tasks", {"tags": ["harness"]}) which only
filters by the harness tag; change the payload to include both tags (e.g.
{"tags": ["harness", f"rx:{rx_id}"]}) so the server-side multi-tag containment
filter narrows results to the exact prescription; keep the existing
_find_task_by_rx defensive check afterwards but update the call site where
list_result is assigned in harness_commands.py to pass the extra rx tag.
orchestrator/services/harness_service.py (1)

835-841: 💤 Low value

Type hint Session is undefined — use the imported path or import the type.

The db: "Session" annotation references a type that isn't imported. While this works at runtime (the string is never resolved), static analysis tools flag it as undefined. Either import the type for annotation purposes or use the full module path.

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from sqlalchemy.orm import Session

Or change the annotation to the fully-qualified string:

-    def _maybe_escalate(
-        self,
-        db: "Session",
+    def _maybe_escalate(
+        self,
+        db: "sqlalchemy.orm.Session",

Also applies to: 879-879

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@orchestrator/services/harness_service.py` around lines 835 - 841, The type
annotation db: "Session" in _maybe_escalate is using an undefined string type
which static checkers flag; fix by replacing the quoted annotation with the
actual imported type (e.g., import Session from sqlalchemy.orm or the correct
module used elsewhere) or change it to the fully-qualified module path in the
annotation (e.g., sqlalchemy.orm.Session) and apply the same fix to the other
occurrence noted (the other function using "Session"); update imports
accordingly so static analyzers recognize the type.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@orchestrator/api/harness_commands.py`:
- Line 162: The DB query currently calls executor.execute("platform_list_tasks",
{"tags": ["harness"]}) which only filters by the harness tag; change the payload
to include both tags (e.g. {"tags": ["harness", f"rx:{rx_id}"]}) so the
server-side multi-tag containment filter narrows results to the exact
prescription; keep the existing _find_task_by_rx defensive check afterwards but
update the call site where list_result is assigned in harness_commands.py to
pass the extra rx tag.

In `@orchestrator/services/harness_service.py`:
- Around line 835-841: The type annotation db: "Session" in _maybe_escalate is
using an undefined string type which static checkers flag; fix by replacing the
quoted annotation with the actual imported type (e.g., import Session from
sqlalchemy.orm or the correct module used elsewhere) or change it to the
fully-qualified module path in the annotation (e.g., sqlalchemy.orm.Session) and
apply the same fix to the other occurrence noted (the other function using
"Session"); update imports accordingly so static analyzers recognize the type.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62a7ac0e-0969-4414-b852-d539b04374ba

📥 Commits

Reviewing files that changed from the base of the PR and between f7dc94c and 9266e02.

📒 Files selected for processing (6)
  • orchestrator/api/harness_commands.py
  • orchestrator/modules/tools/discovery/handlers_board_tasks.py
  • orchestrator/services/harness_service.py
  • orchestrator/tests/test_board_task_handlers.py
  • orchestrator/tests/test_harness_commands.py
  • orchestrator/tests/test_harness_self_management.py

@AutomatosAI AutomatosAI merged commit 51f2f41 into main May 29, 2026
4 checks passed
@AutomatosAI AutomatosAI deleted the ralph/prd-141-platform-reliability branch May 29, 2026 10:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants