Ralph/prd 141 platform reliability by AutomatosAI · Pull Request #391 · AutomatosAI/automatos-ai

AutomatosAI · 2026-05-28T23:12:13Z

Summary by CodeRabbit

New Features
- Added tool execution signal recording to track success/failure patterns and enable batch processing of routing signals.
- Implemented failure-tracking edges to detect when tools fail after other tools are used.
- Enhanced tool routing to separately apply positive and negative signals in recommendation scoring.
Chores
- Added comprehensive test coverage for new signal recorder and edge builder functionality.

…017) Split _query_affinities() to return explicit (positive_boosts, negative_penalties) dicts instead of a single netted value: succeeds_for_intent and agent_prefers feed positive_boosts; fails_for_intent feeds negative_penalties as a positive magnitude. _expand_with_graph now scores edge chains as cosine*edge_confidence + boost - penalty, so a tool that historically fails for an intent ranks lower. Behaviour was previously netted implicitly; this makes the negative signal explicit and independently testable.
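The scoring rule in this commit message can be illustrated with a small sketch. It assumes each candidate is a `(tool, cosine, edge_confidence)` tuple and that boosts and penalties are plain dicts keyed by tool name; `score_chain` and `rank_tools` are hypothetical names, not the functions in `graph_router.py`.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, float, float]  # (tool, cosine, edge_confidence); assumed shape


def score_chain(cosine: float, edge_confidence: float,
                boost: float = 0.0, penalty: float = 0.0) -> float:
    """cosine * edge_confidence + boost - penalty, as stated in the commit message."""
    return cosine * edge_confidence + boost - penalty


def rank_tools(candidates: List[Candidate],
               positive_boosts: Dict[str, float],
               negative_penalties: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score every candidate and sort best-first.

    Penalties are stored as positive magnitudes and subtracted here,
    so the two signals stay separate until the final sum.
    """
    scored = [
        (tool, score_chain(cos, conf,
                           positive_boosts.get(tool, 0.0),
                           negative_penalties.get(tool, 0.0)))
        for tool, cos, conf in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With identical cosine and edge confidence, a tool carrying a failure penalty sorts below one without it, which is the ranking effect the commit describes.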

…141 US-018) Record risky tool transitions in the nightly edge build: when tool A succeeds and a tool B within the next 2 steps of the same session errors, emit a failed_after(A, B) edge. The build tracks both the failure count and the total (A-succeeded, B-within-2) co-occurrences, so confidence is the Wilson lower bound of the failure RATE (failed/total), reusing the existing wilson_lower_bound() rather than a raw count.

- _compute_failed_after_edges(): same session grouping/windowing as used_after; A must have succeeded; self-edges skipped; returns {(from,to,ws,agent): (failed, total)} only for pairs with >=1 failure.
- _upsert_failed_after_edges(): writes edge_type='failed_after' to the same tool_routing_edges table. uq_tre_full_key includes edge_type, so these coexist with used_after rows (no migration needed; edge_type is String(50)).
- Extracted _upsert_edge_row() shared by used_after and failed_after upserts.
- EdgeBuildSummary.failed_edges_built + build_edges() wiring.

GraphRouter._query_edges only follows edge_type=='used_after', so failed_after edges are recorded for analysis/de-ranking but NEVER expanded into chains; test_failed_after_edge_not_expanded guards this at the DB-filter layer.

Also removes a dead, harmful MagicMock of core.database.base from test_prd139_edge_builder.py: it built Mock-based ToolRoutingEdge/Affinity classes that corrupted sibling tests sharing the process (the real cause of the only cross-file failures). The mock never helped: that file already requires DB creds to collect, and the real Base imports cleanly.

Tests: 86 passed across the routing/graph unit corpus (test_prd139_edge_builder + test_graph_router{,_negative} + us014/us015 + tool_routing_models). py_compile OK.
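The failed_after rule above can be sketched roughly as follows. This is a simplified illustration, not the code in `edge_builder.py`: it keys edges by `(from, to)` only (the commit also keys by workspace and agent), assumes events arrive as `(session_id, tool, succeeded)` tuples already in step order, and uses a textbook Wilson lower bound whose signature may differ from the repository's `wilson_lower_bound()`.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, bool]  # (session_id, tool, succeeded); assumed shape


def wilson_lower_bound(successes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for successes/total."""
    if total <= 0:
        return 0.0
    p = successes / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


def compute_failed_after(events: Iterable[Event],
                         window: int = 2) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Return {(A, B): (failed, total)} for pairs with at least one failure."""
    # Group events per session, preserving step order.
    sessions: Dict[str, List[Tuple[str, bool]]] = defaultdict(list)
    for session_id, tool, ok in events:
        sessions[session_id].append((tool, ok))

    failed: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for steps in sessions.values():
        for i, (tool_a, a_ok) in enumerate(steps):
            if not a_ok:
                continue  # A must have succeeded
            for tool_b, b_ok in steps[i + 1 : i + 1 + window]:
                if tool_b == tool_a:
                    continue  # skip self-edges
                total[(tool_a, tool_b)] += 1
                if not b_ok:
                    failed[(tool_a, tool_b)] += 1
    # Only pairs that failed at least once ever appear in `failed`.
    return {pair: (count, total[pair]) for pair, count in failed.items()}
```

Passing `(failed, total)` to `wilson_lower_bound` gives a conservative failure-rate estimate, so a pair seen failing once in two co-occurrences gets a much lower confidence than its raw 50% rate.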

…D-141 US-019) Fold tool success/failure outcomes into the tool-routing graph in real time via an in-process asyncio.Queue drained by a single long-lived background task. record() is a non-blocking put_nowait on the hot path; DB access happens only inside the drain loop, ONE session per flushed batch. This replaces the original draft's per-call asyncio.ensure_future, which opened a new DB session on every tool call and exhausted the connection pool under load.

- config: 4 opt-in settings (recorder off by default, batch size, flush interval, queue maxsize)
- signal_recorder: aggregate batch -> used_after/failed_after edges + agent_prefers/fails_for_intent affinities. NULL-safe upsert (UPDATE ... IS NOT DISTINCT FROM -> INSERT-if-rowcount-0) because the unique constraints lack NULLS NOT DISTINCT and intent_cluster_id is always NULL. Recorder increments sample_count/weight on edges but never overwrites confidence; nightly edge_builder remains the authoritative Wilson recompute.
- tool_router: non-blocking enqueue at both the success path and the except block; failures in the recorder can never break a tool call.

7 new tests (12 in file, 72 across the routing corpus) green.
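The queue-and-drain pattern described in this commit might look roughly like the sketch below. `Signal`, `BatchRecorder`, and the injected `flush` coroutine are hypothetical stand-ins: in the real recorder the flush step would open one DB session per batch, whereas here it is a callback so the example stays self-contained.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass(frozen=True)
class Signal:  # hypothetical stand-in for the PR's ToolSignal
    tool: str
    succeeded: bool


class BatchRecorder:
    def __init__(self, flush: Callable[[List[Signal]], Awaitable[None]],
                 batch_size: int = 50, flush_interval: float = 1.0,
                 maxsize: int = 1000) -> None:
        self._queue: "asyncio.Queue[Signal]" = asyncio.Queue(maxsize=maxsize)
        self._flush = flush
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._task: Optional["asyncio.Task[None]"] = None
        self.dropped = 0

    def record(self, signal: Signal) -> bool:
        """Hot path: never blocks and never raises; drops when the queue is full."""
        try:
            self._queue.put_nowait(signal)
            return True
        except asyncio.QueueFull:
            self.dropped += 1
            return False

    def start(self) -> None:
        self._task = asyncio.create_task(self._drain())

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass

    async def _drain(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]  # wait for the first item
            deadline = loop.time() + self._flush_interval
            while len(batch) < self._batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                await self._flush(batch)  # one flush (one session) per batch
            except Exception:
                pass  # a failed flush must not kill the drain task
```

Because `record()` only touches the in-memory queue, pool pressure is bounded by the single drain task rather than by the tool-call rate; the trade-off is that signals are dropped once the queue is full.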

coderabbitai · 2026-05-28T23:12:26Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 65448c88-6b87-4840-af38-e17b56028321

📥 Commits

Reviewing files that changed from the base of the PR and between 25152a3 and 310c398.

📒 Files selected for processing (7)

orchestrator/config.py
orchestrator/core/services/edge_builder.py
orchestrator/modules/tools/discovery/graph_router.py
orchestrator/modules/tools/discovery/signal_recorder.py
orchestrator/modules/tools/tool_router.py
orchestrator/tests/test_graph_router_negative.py
orchestrator/tests/test_prd139_edge_builder.py

📝 Walkthrough

This PR introduces an opt-in batched tool execution signal recorder that tracks routing behavior and updates edge/affinity telemetry incrementally. It adds a new failed_after edge type to complement used_after, refactors affinity scoring to separately apply positive boosts and negative penalties, and integrates signal recording into the tool execution path with comprehensive test coverage.

Changes

Tool Execution Signal Recorder & Routing Feedback

| Layer / File(s) | Summary |
| --- | --- |
| Configuration and Signal Definitions `orchestrator/config.py`, `orchestrator/modules/tools/discovery/signal_recorder.py` | Environment variables for recorder enable/disable, batch size, flush interval, and queue size. `ToolSignal` dataclass and `_wilson` helper for confidence scoring. |
| Failed-After Edge Detection `orchestrator/core/services/edge_builder.py` | `_compute_failed_after_edges` scans telemetry windows for action pairs where a predecessor succeeds and successor fails within 2 steps, tracking failure and co-occurrence counts per scope. |
| Failed-After Edge Persistence & Reporting `orchestrator/core/services/edge_builder.py` | New `_upsert_edge_row` shared ON-CONFLICT upsert primitive for multi-type edges; refactored `_upsert_edges` and new `_upsert_failed_after_edges` using Wilson-based confidence. Updated `EdgeBuildSummary.failed_edges_built` counter and completion logging. |
| Affinity Scoring Refactor: Positive vs. Negative `orchestrator/modules/tools/discovery/graph_router.py` | `_query_affinities` returns `(positive_boosts, negative_penalties)` separately; chain scoring applies `cosine * edge_confidence + boost - penalty` to allow independent weighting of success-based affinity boosts vs. failure-based penalties. |
| Signal Recorder: Batching, Aggregation & Upserts `orchestrator/modules/tools/discovery/signal_recorder.py` | Non-blocking `record()` enqueue into bounded `asyncio.Queue`, background drain loop collecting batches by size and flush interval, batch aggregation collapsing duplicates, and null-safe incremental DB upserts via single session per flush. |
| Tool Router: Signal Recording Integration `orchestrator/modules/tools/tool_router.py` | `execute_and_format` emits `ToolSignal` telemetry on both success and failure paths; `_record_tool_signal` extracts prior action from context and enqueues via recorder singleton, swallowing exceptions to isolate from tool hot path. |
| Graph Router Regression Tests: Affinity & Edge Filtering `orchestrator/tests/test_graph_router_negative.py` | Synthetic module loading with stubbed dependencies; validates affinity positive/negative separation, penalty-based ranking reduction, edge expansion filtering to exclude `failed_after` edges, and signal recorder SQL behavior with DB mocking. |
| Signal Recorder Tests: Batching, Aggregation & Persistence `orchestrator/tests/test_graph_router_negative.py` | SQL-capturing DB fakes; validates incremental edge/affinity INSERTs, UPDATE upserts collapsing duplicates, single-session enforcement, batch aggregation correctness, and feature-flag and event-loop conditional behavior. |
| Edge Builder Tests: Failed-After & Wilson Scoring `orchestrator/tests/test_prd139_edge_builder.py` | Test fixture cleanup to use real model definitions; Wilson lower-bound validation; `TestFailedAfterEdges` suite covering edge detection: success-before-failure requirement, 2-step window, self-edge exclusion, session isolation, and co-occurrence accumulation. |
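The null-safe upsert mentioned for the signal recorder (UPDATE first, INSERT only when no row matched) can be demonstrated with an in-memory SQLite database. This is an analogy rather than the PR's SQL: the PR targets `IS NOT DISTINCT FROM`, while the sketch uses SQLite's `IS` operator, which has the same null-safe equality semantics. The table `affinity_demo` and its columns are hypothetical.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Create a demo table whose UNIQUE constraint treats NULLs as distinct."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE affinity_demo (
            tool TEXT NOT NULL,
            kind TEXT NOT NULL,
            intent_cluster_id INTEGER,
            sample_count INTEGER NOT NULL,
            UNIQUE (tool, kind, intent_cluster_id)
        )
        """
    )
    return conn


def upsert_affinity(conn: sqlite3.Connection, tool: str, kind: str,
                    intent_cluster_id: Optional[int], delta: int = 1) -> None:
    """Increment an existing row, or insert one if the UPDATE matched nothing.

    `IS ?` matches NULL against NULL, which plain `=` (and therefore an
    ON CONFLICT on the unique constraint) would not.
    """
    cur = conn.execute(
        "UPDATE affinity_demo SET sample_count = sample_count + ? "
        "WHERE tool = ? AND kind = ? AND intent_cluster_id IS ?",
        (delta, tool, kind, intent_cluster_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO affinity_demo (tool, kind, intent_cluster_id, sample_count) "
            "VALUES (?, ?, ?, ?)",
            (tool, kind, intent_cluster_id, delta),
        )
```

Calling `upsert_affinity` twice with `intent_cluster_id=None` leaves a single row with an incremented count, whereas two plain INSERTs would both succeed because the unique constraint does not consider two NULLs equal.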

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A signal soars through bounded queues,
failed_after edges, positive news!
Affinity splits with grace and care—
boosts and penalties, balanced fair.
Telemetry batched, confidence bright,
routing feedback shines light.


Gerard161-Site added 3 commits May 28, 2026 23:29

AutomatosAI merged commit b0a9b85 into main May 28, 2026
1 of 2 checks passed

coderabbitai Bot mentioned this pull request Jun 9, 2026

PRD-142 Wave 4 — Self-Learning / HARNESS (backend) #427

Merged

3 tasks

AutomatosAI merged 3 commits into main from ralph/prd-141-platform-reliability
