Ralph/prd 141 platform reliability by AutomatosAI · Pull Request #394 · AutomatosAI/automatos-ai

AutomatosAI · 2026-05-29T08:27:23Z

Summary by CodeRabbit

New Features
- Added command handler to approve or reject HARNESS prescriptions with admin-level authorization requirements
- Added tag-based filtering for board task listings
- High-risk HARNESS prescriptions now escalate and trigger workspace notifications with approval/rejection options

Tests
- Added comprehensive test coverage for HARNESS command handling and prescription management workflows

…S-024)

risk>=4 prescriptions already queue as board tasks; now they also notify a human (best-effort) via the existing send_workspace_notification, with /approve {rx_id} and /reject {rx_id} instructions, and record an 'escalated' changelog entry. Queued board tasks gain an rx:{rx_id} tag so US-025 can resolve a task back to its prescription. Escalation is gated on the workspace having a channel_connections row, detected by row presence (matching how channels.sender resolves a channel) rather than by the ambiguous status column.
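
The escalation gate described in this commit message can be sketched as follows. This is a minimal illustration, not the code from `harness_service.py`: the function names, the message wording, and the shape of `channel_rows` are assumptions; only the `risk >= 4` threshold, the `rx:{rx_id}` tag format, and the row-presence rule come from the text above.

```python
HIGH_RISK_THRESHOLD = 4  # from the commit message: risk >= 4 is escalated


def rx_tag(rx_id: str) -> str:
    """Tag that lets a board task be resolved back to its prescription."""
    return f"rx:{rx_id}"


def build_escalation_message(rx_id: str, risk: int, summary: str) -> str:
    """Hypothetical message format carrying the /approve and /reject instructions."""
    return (
        f"HARNESS prescription {rx_id} (risk {risk}) needs review: {summary}\n"
        f"Reply /approve {rx_id} to apply it or /reject {rx_id} to dismiss it."
    )


def should_escalate(risk: int, channel_rows: list) -> bool:
    """Escalate only high-risk prescriptions in workspaces that have a channel row.

    The presence of any row is the signal; the row's status column is
    deliberately not inspected.
    """
    return risk >= HIGH_RISK_THRESHOLD and len(channel_rows) > 0
```

Keying on row presence keeps the gate aligned with how the sender resolves a channel, so a notification is attempted exactly when a channel could be resolved.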

…authz (US-025)

handle_harness_command applies or rejects a queued high-risk HARNESS prescription from a chat command. It is gated behind HARNESS_SELF_MANAGEMENT_ENABLED and a non-negotiable workspace-ADMIN check (active owner/admin member) enforced BEFORE any state read or write, so an unauthorized caller changes nothing.

- /approve: tag-lookup the rx board task, apply now via the existing _auto_apply_prescription, mark the task done, and record it in the shared US-021 ledger (idempotent: never double-applies or re-applies on a tick).
- /reject: flip the board task to 'rejected' via the ORM so _get_rejected_signatures suppresses the same prescription on future ticks.
- Fail-closed identity coercion (_caller_user_id): rejects non-dict, missing, bool, non-int, and non-positive user ids deterministically rather than relying on DB-driver coercion of caller input.
- Audit logging on both success paths, plus the approver user_id in the ledger entry.

7 unit tests cover approve, reject, unknown rx, idempotent re-approve, unauthorized caller (no mutation), malformed identity shapes, and flag-off no-op. Security-auditor reviewed the authz boundary: sound, no CRITICAL/HIGH. Live channel wiring is deferred to the US-026 gate.
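
The fail-closed identity coercion listed above could look roughly like this. It is a sketch of the rejection rules named in the commit message, not the actual `_caller_user_id`: the `"user_id"` key and the `None` return for rejected input are assumptions.

```python
from typing import Any, Optional


def caller_user_id(caller: Any) -> Optional[int]:
    """Return a positive int user id, or None if the identity is malformed."""
    if not isinstance(caller, dict):
        return None
    user_id = caller.get("user_id")  # assumed key name
    # bool is a subclass of int in Python, so it has to be rejected explicitly.
    if isinstance(user_id, bool) or not isinstance(user_id, int):
        return None
    if user_id <= 0:
        return None
    return user_id
```

Rejecting strings such as `"7"` instead of converting them means the decision never depends on how a database driver would coerce caller-supplied input.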

…cription (PRD-141 US-026 pre-gate)

Phase 5 task-consumption was built and tested entirely against in-memory fakes, which hid four real-executor integration bugs that would only bite once the HARNESS_SELF_MANAGEMENT_ENABLED canary turns the path on:

- C1/H1: harness writes routed through executor.execute("workspace_write_file", ...), but that is an agent tool primitive, NOT a registered platform action. execute() returned an "unknown action" error with no raise, so the idempotency ledger (applied_tasks.json) and US-022 baseline never persisted, and the try/except recorded the silent failure as "ok". Route writes directly to the workspace volume via a shared _write_workspace_file helper (mirrors the proven _write_last_run), so reads and writes hit one store and a real I/O error raises and is recorded. _write_applied_tasks is now sync (no executor param).
- C2: list_board_tasks (the platform_list_tasks handler) dropped `tags` and `description` from its projection and ignored the `tags` filter param. rx_id lives only in tags (rx:{id}) and the prescription body lives in description, so _find_task_by_rx and _parse_harness_task both broke against the real handler. Add the JSONB `tags` containment filter and surface both fields.
- H2: a /reject keeps the board task's harness/rx tags, so it still surfaced in the tag-filtered list and a later /approve could resurrect it. Refuse /approve on a task whose status is already "rejected".

Tests: ledger now asserts a real on-disk round-trip under tmp_path; new test_board_task_handlers.py proves the projection surfaces tags/description; new guard test covers the rejected-then-approve path. 34 passing.

The JSONB tags FILTER and the live channel→handler wiring remain for the US-026 canary gate.
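
A direct-write helper of the kind described for C1/H1 might be sketched as below. The function names mirror the ones in the commit message, but the signatures, the `harness/applied_tasks.json` relative path, and the ledger shape are assumptions made for illustration.

```python
import json
from pathlib import Path

LEDGER_RELATIVE_PATH = "harness/applied_tasks.json"  # hypothetical location


def write_workspace_file(root: Path, relative_path: str, content: str) -> Path:
    """Write content under the workspace root.

    No try/except here: a real I/O error propagates to the caller instead of
    being swallowed and recorded as success.
    """
    target = root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target


def write_applied_tasks(root: Path, applied: dict) -> None:
    """Persist the idempotency ledger synchronously, with no executor involved."""
    write_workspace_file(root, LEDGER_RELATIVE_PATH, json.dumps(applied, sort_keys=True))


def read_applied_tasks(root: Path) -> dict:
    """Read the ledger from the same store the writer uses; empty if absent."""
    path = root / LEDGER_RELATIVE_PATH
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

Because the reader and the writer resolve the same path on the same volume, a round-trip test under a temporary directory exercises the real persistence path rather than a fake.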

coderabbitai · 2026-05-29T08:27:36Z

📝 Walkthrough

This PR implements HARNESS self-management command handling (/approve and /reject) for prescriptions queued for human review. It adds a workspace-admin authorization boundary, board task discovery by rx_id tags, high-risk escalation notifications, direct filesystem artifact persistence, and idempotency via an on-disk ledger.

Changes

HARNESS Self-Management Commands

Layer / File(s)	Summary
Board Task Discovery & Tags Projection `orchestrator/modules/tools/discovery/handlers_board_tasks.py`, `orchestrator/tests/test_board_task_handlers.py`	`list_board_tasks` query now filters by HARNESS tags (containing both `harness` and `rx:{rx_id}`) and returns `tags` and `description` fields in the task projection, enabling command handlers to locate queued tasks for approval/rejection.
HARNESS Service Escalation & File Persistence `orchestrator/services/harness_service.py`, `orchestrator/tests/test_harness_self_management.py`	Phase 4 apply now tags board tasks with `rx:{rx_id}`, records an `escalated` changelog entry, and sends workspace notifications for high-risk queued prescriptions. Phase 5 baseline artifacts and `applied_tasks.json` ledger are written directly to the workspace volume filesystem instead of through the executor, with direct-write helper `_write_workspace_file`.
Command Handler: Authorization & Core Logic `orchestrator/api/harness_commands.py`	New `handle_harness_command` entry point enforces workspace-admin authorization, normalizes `/approve` and `/reject` commands, locates tasks by `rx_id` tag, applies prescriptions via `_auto_apply_prescription` with idempotency ledger updates, or rejects with ORM status and block metadata changes.
Command Handler Tests: Authorization & Behaviors `orchestrator/tests/test_harness_commands.py`	8 test cases validate `/approve` success (executor heartbeat call, status update, ledger write, idempotency), `/reject` success (status change, no executor calls), unknown task failure, already-applied idempotency, rejected-task refusal, non-admin authorization rejection, malformed caller identity rejection, and feature-flag disablement with zero executor calls.
Approved-Task Ledger & Idempotency `orchestrator/tests/test_harness_self_management.py`	Existing self-management tests refactored to use `tmp_path`-backed `WORKSPACE_VOLUME_PATH` for on-disk ledger assertions. New helper `_ledger_path` computes the ledger location; tests now assert filesystem persistence, `_read_applied_tasks` recognition of prior application, and that the ledger is not written when nothing is applied. Added US-024 escalation tests for Telegram notification gating by channel presence, non-escalation for low-risk prescriptions, and escalation message content validation.

Sequence Diagram(s)

sequenceDiagram
  participant Caller as Workspace User
  participant Handler as handle_harness_command
  participant AuthZ as _caller_is_workspace_admin
  participant Lookup as _find_task_by_rx
  participant Apply as _auto_apply_prescription
  participant DB as BoardTask ORM
  Caller->>Handler: /approve or /reject + rx_id
  Handler->>Handler: check HARNESS_SELF_MANAGEMENT_ENABLED
  Handler->>AuthZ: validate caller is owner/admin
  AuthZ-->>Handler: authorization result
  Handler->>Lookup: search queued tasks by harness + rx_id
  Lookup-->>Handler: task or not_found
  alt /approve
    Handler->>Apply: resolve and apply prescription
    Apply->>DB: set status=done after success
  else /reject
    Handler->>DB: set status=rejected + block metadata
  end
  DB-->>Handler: commit result
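
The ordering in the diagram (feature flag, then authorization, then any state access) can be illustrated with a small in-memory sketch. Everything here is hypothetical: `FakeStore`, `handle_command`, and the returned strings are stand-ins for illustration and do not reflect the real `handle_harness_command` signature.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """In-memory stand-in for the board task table and the applied ledger."""
    tasks: dict = field(default_factory=dict)   # rx_id -> status
    applied: set = field(default_factory=set)   # rx_ids already applied
    reads: int = 0                              # counts lookups, to show authz runs first

    def find(self, rx_id: str):
        self.reads += 1
        return self.tasks.get(rx_id)


def handle_command(store: FakeStore, *, enabled: bool, is_admin: bool,
                   command: str, rx_id: str) -> str:
    if not enabled:
        return "disabled"
    if not is_admin:
        return "unauthorized"        # returned before any store access
    status = store.find(rx_id)
    if status is None:
        return "not_found"
    if command == "/approve":
        if status == "rejected":
            return "refused"         # a rejected task cannot be resurrected
        if rx_id in store.applied:
            return "already_applied"  # idempotent re-approve
        store.applied.add(rx_id)
        store.tasks[rx_id] = "done"
        return "applied"
    if command == "/reject":
        store.tasks[rx_id] = "rejected"
        return "rejected"
    return "unknown_command"
```

With this ordering, an unauthorized call leaves `store.reads` at zero, which is the property the "no mutation" test in the PR is described as checking.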

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

AutomatosAI/automatos-ai#193: Both PRs modify HarnessService apply behavior around when prescriptions are auto-applied versus queued for review (harness enable/mode logic in #193; apply/escalation and applied_tasks/queue handling in this PR), which directly underpins this PR's /approve and /reject flow for queued tasks.

Poem

🐰 Hop, skip, and approve!
A rabbit's prescription now moves
Through admin's command,
By /approve or /reject hand,
While escalation notifications groove! 🎯

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 63.16% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Ralph/prd 141 platform reliability' is vague and does not clearly convey the main changes. The PR implements HARNESS command handling, board task filtering, and high-risk escalation features, but the title uses generic phrasing ('platform reliability') and includes branch naming artifacts.	Revise the title to be more descriptive and specific to the actual implementation, such as 'Add HARNESS self-management command handler with approval/rejection flow' or 'Implement HARNESS high-risk escalation and admin command processing'.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

🧹 Nitpick comments (2)

orchestrator/api/harness_commands.py (1)
162-162: ⚡ Quick win

Narrow the task lookup at the DB level by passing both tags.

list_board_tasks applies min(limit, 50) with a default of 20, ordered by created_at desc. Filtering by only harness and then matching rx:{rx_id} in Python means that in a workspace with more than 20 harness-tagged tasks, the target prescription can fall outside the returned window and produce a spurious "No pending HARNESS change found". Passing both tags leverages the multi-tag containment filter added in this PR so the query returns exactly the matching task.
♻️ Pass the rx tag into the query
-    list_result = await executor.execute("platform_list_tasks", {"tags": ["harness"]})
+    list_result = await executor.execute(
+        "platform_list_tasks", {"tags": ["harness", f"rx:{rx_id}"]}
+    )
_find_task_by_rx can stay as a defensive second check.
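
The window problem described in this comment can be reproduced with a small simulation. The `list_tasks` function below is a simplified stand-in that assumes only the behavior stated above (containment of all requested tags, newest first, `min(limit, 50)` with a default of 20); it is not the real `list_board_tasks` handler.

```python
def list_tasks(tasks, tags, limit=20):
    """Return newest-first tasks containing every requested tag, capped at min(limit, 50)."""
    wanted = set(tags)
    matching = [t for t in tasks if wanted <= set(t["tags"])]
    matching.sort(key=lambda t: t["created_at"], reverse=True)
    return matching[: min(limit, 50)]


def find_by_rx(tasks, rx_id):
    """Defensive second check: exact match on the rx tag."""
    tag = f"rx:{rx_id}"
    return next((t for t in tasks if tag in t["tags"]), None)


# 30 harness-tagged tasks; the oldest one (created_at=0) carries rx:0.
tasks = [{"id": i, "created_at": i, "tags": ["harness", f"rx:{i}"]} for i in range(30)]

broad = list_tasks(tasks, ["harness"])            # only the 20 newest come back
narrow = list_tasks(tasks, ["harness", "rx:0"])   # the filter finds the exact task

print(find_by_rx(broad, "0"))    # None: the target fell outside the window
print(find_by_rx(narrow, "0"))   # the task with id 0
```

Filtering on both tags before the limit is applied makes the lookup independent of how many harness tasks exist in the workspace.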
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@orchestrator/api/harness_commands.py` at line 162, The DB query currently
calls executor.execute("platform_list_tasks", {"tags": ["harness"]}) which only
filters by the harness tag; change the payload to include both tags (e.g.
{"tags": ["harness", f"rx:{rx_id}"]}) so the server-side multi-tag containment
filter narrows results to the exact prescription; keep the existing
_find_task_by_rx defensive check afterwards but update the call site where
list_result is assigned in harness_commands.py to pass the extra rx tag.
orchestrator/services/harness_service.py (1)
835-841: 💤 Low value

Type hint Session is undefined — use the imported path or import the type.

The db: "Session" annotation references a type that isn't imported. While this works at runtime (the string is never resolved), static analysis tools flag it as undefined. Either import the type for annotation purposes or use the full module path.
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from sqlalchemy.orm import Session
Or change the annotation to the fully-qualified string:
-    def _maybe_escalate(
-        self,
-        db: "Session",
+    def _maybe_escalate(
+        self,
+        db: "sqlalchemy.orm.Session",
Also applies to: 879-879
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@orchestrator/services/harness_service.py` around lines 835 - 841, The type
annotation db: "Session" in _maybe_escalate is using an undefined string type
which static checkers flag; fix by replacing the quoted annotation with the
actual imported type (e.g., import Session from sqlalchemy.orm or the correct
module used elsewhere) or change it to the fully-qualified module path in the
annotation (e.g., sqlalchemy.orm.Session) and apply the same fix to the other
occurrence noted (the other function using "Session"); update imports
accordingly so static analyzers recognize the type.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62a7ac0e-0969-4414-b852-d539b04374ba

📥 Commits

Reviewing files that changed from the base of the PR and between f7dc94c and 9266e02.

📒 Files selected for processing (6)

orchestrator/api/harness_commands.py
orchestrator/modules/tools/discovery/handlers_board_tasks.py
orchestrator/services/harness_service.py
orchestrator/tests/test_board_task_handlers.py
orchestrator/tests/test_harness_commands.py
orchestrator/tests/test_harness_self_management.py

Gerard161-Site added 3 commits May 29, 2026 01:20

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

AutomatosAI merged commit 51f2f41 into main May 29, 2026
4 checks passed

AutomatosAI deleted the ralph/prd-141-platform-reliability branch May 29, 2026 10:27
