
Ralph/prd 141 platform reliability #391

Merged: AutomatosAI merged 3 commits into main from ralph/prd-141-platform-reliability on May 28, 2026.

AutomatosAI (Owner) commented May 28, 2026

Summary by CodeRabbit

  • New Features

    • Added tool execution signal recording to track success/failure patterns and enable batch processing of routing signals.
    • Implemented failure-tracking edges to detect when tools fail after other tools are used.
    • Enhanced tool routing to separately apply positive and negative signals in recommendation scoring.
  • Chores

    • Added comprehensive test coverage for new signal recorder and edge builder functionality.


…017)

Split _query_affinities() to return explicit (positive_boosts,
negative_penalties) dicts instead of a single netted value: succeeds_for_intent
and agent_prefers feed positive_boosts; fails_for_intent feeds
negative_penalties as a positive magnitude. _expand_with_graph now scores edge
chains as cosine*edge_confidence + boost - penalty, so a tool that historically
fails for an intent ranks lower. Behaviour was previously netted implicitly;
this makes the negative signal explicit and independently testable.
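A minimal sketch of the split and the resulting chain score, in plain Python rather than the project's DB-backed code. AffinityRow, split_affinities() and chain_score() are hypothetical names; only the three affinity types and the cosine*edge_confidence + boost - penalty formula come from the commit message above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AffinityRow:
    # Hypothetical row shape; the real affinity table/columns may differ.
    tool: str
    affinity_type: str  # "succeeds_for_intent" | "agent_prefers" | "fails_for_intent"
    weight: float

POSITIVE_TYPES = {"succeeds_for_intent", "agent_prefers"}
NEGATIVE_TYPES = {"fails_for_intent"}

def split_affinities(rows):
    """Return (positive_boosts, negative_penalties); penalties are positive magnitudes."""
    boosts: dict = {}
    penalties: dict = {}
    for row in rows:
        if row.affinity_type in POSITIVE_TYPES:
            boosts[row.tool] = boosts.get(row.tool, 0.0) + row.weight
        elif row.affinity_type in NEGATIVE_TYPES:
            # Stored as a magnitude so the scorer subtracts it explicitly.
            penalties[row.tool] = penalties.get(row.tool, 0.0) + abs(row.weight)
    return boosts, penalties

def chain_score(tool, cosine, edge_confidence, boosts, penalties):
    # cosine * edge_confidence + boost - penalty
    return cosine * edge_confidence + boosts.get(tool, 0.0) - penalties.get(tool, 0.0)
```

Because penalties are kept as positive magnitudes in their own dict, a tool with the same cosine and edge confidence as a competitor but a fails_for_intent history ranks below it, and each dict can be asserted on separately in tests.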
…141 US-018)

Record risky tool transitions in the nightly edge build: when tool A
succeeds and a tool B within the next 2 steps of the same session errors,
emit a failed_after(A, B) edge. Tracks both the failure count and the total
(A-succeeded, B-within-2) co-occurrences so confidence is the Wilson lower
bound of the failure RATE (failed/total), reusing the existing
wilson_lower_bound() rather than a raw count.
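For reference, a sketch of the standard Wilson score lower bound applied to the failure rate. The name mirrors wilson_lower_bound() from the commit message, but the signature and the default z = 1.96 (a 95% interval) are assumptions; the project's helper may differ. The first argument here is the failure count, so the result is a conservative estimate of how often B fails after A.

```python
import math

def wilson_lower_bound(failures: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the proportion failures/total."""
    if total <= 0:
        return 0.0  # no co-occurrences, no evidence
    p = failures / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)
```

Unlike a raw count, the bound rewards evidence: 30 failures in 40 co-occurrences scores higher than 3 in 4 even though both rates are 0.75, and a single failure out of a single co-occurrence scores only about 0.21.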

- _compute_failed_after_edges(): same session grouping/windowing as
  used_after; A must have succeeded; self-edges skipped; returns
  {(from,to,ws,agent): (failed, total)} only for pairs with >=1 failure.
- _upsert_failed_after_edges(): writes edge_type='failed_after' to the same
  tool_routing_edges table. uq_tre_full_key includes edge_type, so these
  coexist with used_after rows (no migration needed; edge_type is String(50)).
- Extracted _upsert_edge_row() shared by used_after and failed_after upserts.
- EdgeBuildSummary.failed_edges_built + build_edges() wiring.
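The windowing rule above can be sketched in memory as follows. The Step record and compute_failed_after_edges() are hypothetical stand-ins for the telemetry rows and _compute_failed_after_edges(); the sketch assumes each session's steps arrive in chronological order and keys edges on A's workspace and agent.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    # Hypothetical telemetry row: one tool call within a session.
    session_id: str
    workspace_id: str
    agent_id: str
    tool: str
    succeeded: bool

WINDOW = 2  # B must fall within the next 2 steps after A

def compute_failed_after_edges(steps):
    """Return {(from, to, ws, agent): (failed, total)} for pairs with >= 1 failure."""
    sessions = defaultdict(list)
    for step in steps:  # assumes chronological order within each session
        sessions[step.session_id].append(step)

    failed = defaultdict(int)
    total = defaultdict(int)
    for seq in sessions.values():
        for i, a in enumerate(seq):
            if not a.succeeded:
                continue  # A must have succeeded
            for b in seq[i + 1 : i + 1 + WINDOW]:
                if b.tool == a.tool:
                    continue  # self-edges are skipped
                key = (a.tool, b.tool, a.workspace_id, a.agent_id)
                total[key] += 1  # every (A-succeeded, B-within-2) co-occurrence
                if not b.succeeded:
                    failed[key] += 1
    # Only keys that were incremented in `failed` have at least one failure.
    return {key: (failed[key], total[key]) for key in failed}
```

Keeping the total alongside the failure count is what lets the confidence be computed as a rate rather than a raw count.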

GraphRouter._query_edges only follows edge_type=='used_after', so failed_after
edges are recorded for analysis/de-ranking but NEVER expanded into chains;
test_failed_after_edge_not_expanded guards this at the DB-filter layer.
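In the actual code the filter is applied in the DB query inside GraphRouter._query_edges. The in-memory equivalent below (build_expansion_graph() and the row tuple layout are hypothetical) illustrates the intent: failed_after rows may sit in the same table, but they never become hops in an expanded chain.

```python
from collections import defaultdict

# Only used_after edges may be followed; failed_after is recorded for analysis only.
EXPANDABLE_EDGE_TYPES = {"used_after"}

def build_expansion_graph(edge_rows):
    """Map from_tool -> [(to_tool, confidence)] using only expandable edge types."""
    graph = defaultdict(list)
    for from_tool, to_tool, edge_type, confidence in edge_rows:
        if edge_type not in EXPANDABLE_EDGE_TYPES:
            continue  # e.g. failed_after: stored, never expanded
        graph[from_tool].append((to_tool, confidence))
    return dict(graph)
```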

Also removes a dead, harmful MagicMock of core.database.base from
test_prd139_edge_builder.py: it built Mock-based ToolRoutingEdge/Affinity
classes that corrupted sibling tests sharing the process (the real cause of
the only cross-file failures). The mock never helped — that file already
requires DB creds to collect, and real Base imports cleanly.

Tests: 86 passed across the routing/graph unit corpus
(test_prd139_edge_builder + test_graph_router{,_negative} + us014/us015 +
tool_routing_models). py_compile OK.
…D-141 US-019)

Fold tool success/failure outcomes into the tool-routing graph in real time
via an in-process asyncio.Queue drained by a single long-lived background
task. record() is a non-blocking put_nowait on the hot path; DB access happens
only inside the drain loop, ONE session per flushed batch.

This replaces the original draft's per-call asyncio.ensure_future, which opened
a new DB session on every tool call and exhausted the connection pool under load.
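A sketch of the queue-plus-single-drain-task pattern, assuming a flush_batch coroutine that opens one DB session per batch. SignalRecorder, its parameter names and the ToolSignal fields shown are illustrative rather than the module's actual API, and the defaults are placeholders, not the values in config. record() uses put_nowait so the hot path never awaits, and a full queue drops the signal instead of blocking.

```python
import asyncio
import contextlib
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional

@dataclass(frozen=True)
class ToolSignal:
    # Illustrative fields only; the real ToolSignal may carry more context.
    prev_tool: Optional[str]
    tool: str
    succeeded: bool

class SignalRecorder:
    def __init__(
        self,
        flush_batch: Callable[[List[ToolSignal]], Awaitable[None]],
        batch_size: int = 50,        # placeholder default
        flush_interval: float = 1.0,  # seconds, placeholder default
        maxsize: int = 1000,          # placeholder default
    ):
        self._queue = asyncio.Queue(maxsize=maxsize)
        self._flush_batch = flush_batch  # opens ONE session per flushed batch
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._task = None

    def start(self):
        # A single long-lived background task drains the queue.
        self._task = asyncio.create_task(self._drain())

    def record(self, signal: ToolSignal) -> None:
        """Hot path: never awaits, never raises; drops the signal if the queue is full."""
        try:
            self._queue.put_nowait(signal)
        except asyncio.QueueFull:
            pass

    async def _drain(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]  # block until at least one signal
            deadline = loop.time() + self._flush_interval
            while len(batch) < self._batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break  # flush interval elapsed with a partial batch
            # Recorder errors are swallowed so they never reach tool callers.
            with contextlib.suppress(Exception):
                await self._flush_batch(batch)

    async def stop(self):
        if self._task is not None:
            self._task.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await self._task
```

With batch_size=3, five quick record() calls flush as one batch of three and then, once the flush interval expires, one batch of two. stop() in this sketch simply cancels the drain task, so anything still queued is discarded.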

- config: 4 opt-in settings (recorder off by default, batch size, flush
  interval, queue maxsize)
- signal_recorder: aggregate batch -> used_after/failed_after edges +
  agent_prefers/fails_for_intent affinities. NULL-safe upsert
  (UPDATE ... IS NOT DISTINCT FROM -> INSERT-if-rowcount-0) because the unique
  constraints lack NULLS NOT DISTINCT and intent_cluster_id is always NULL.
  Recorder increments sample_count/weight on edges but never overwrites
  confidence; nightly edge_builder remains the authoritative Wilson recompute.
- tool_router: non-blocking enqueue at both the success path and the except
  block; failures in the recorder can never break a tool call.
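A sketch of batch aggregation followed by the NULL-safe update-then-insert, using the standard-library sqlite3 module so it runs standalone; SQLite's IS operator plays the role of PostgreSQL's IS NOT DISTINCT FROM. The edges table, aggregate() and upsert_edges() are simplified, hypothetical stand-ins covering edges only (affinities would follow the same pattern), and the (prev_tool, tool, succeeded) signal shape is an assumption. As in the commit, only sample_count is incremented; confidence is left for the nightly Wilson recompute.

```python
import sqlite3
from collections import Counter

# Simplified, hypothetical stand-in for tool_routing_edges.
SCHEMA = """
CREATE TABLE edges (
    from_tool TEXT NOT NULL,
    to_tool TEXT NOT NULL,
    edge_type TEXT NOT NULL,
    intent_cluster_id INTEGER,          -- always NULL for recorder rows
    sample_count INTEGER NOT NULL DEFAULT 0,
    confidence REAL,                    -- owned by the nightly builder
    UNIQUE (from_tool, to_tool, edge_type, intent_cluster_id)
)
"""

def aggregate(signals):
    """Collapse a batch of (prev_tool, tool, succeeded) into per-edge increments."""
    counts = Counter()
    for prev_tool, tool, succeeded in signals:
        if prev_tool is None or prev_tool == tool:
            continue  # assumption: no predecessor or a self-transition yields no edge
        counts[(prev_tool, tool, "used_after")] += 1
        if not succeeded:
            counts[(prev_tool, tool, "failed_after")] += 1
    return counts

def upsert_edges(conn, counts, intent_cluster_id=None):
    """NULL-safe upsert: UPDATE with null-safe equality, INSERT if no row matched."""
    with conn:  # one transaction for the whole flushed batch
        for (from_tool, to_tool, edge_type), n in counts.items():
            cur = conn.execute(
                "UPDATE edges SET sample_count = sample_count + ? "
                "WHERE from_tool = ? AND to_tool = ? AND edge_type = ? "
                "AND intent_cluster_id IS ?",  # SQLite's null-safe equality
                (n, from_tool, to_tool, edge_type, intent_cluster_id),
            )
            if cur.rowcount == 0:
                conn.execute(
                    "INSERT INTO edges "
                    "(from_tool, to_tool, edge_type, intent_cluster_id, sample_count) "
                    "VALUES (?, ?, ?, ?, ?)",
                    (from_tool, to_tool, edge_type, intent_cluster_id, n),
                )

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
upsert_edges(conn, aggregate([("search", "scrape", True), ("search", "scrape", False)]))
upsert_edges(conn, aggregate([("search", "scrape", True)]))
```

The UPDATE-first shape matters because a UNIQUE constraint treats two NULL intent_cluster_id values as distinct, so an ON CONFLICT clause would never fire and every flush would add a duplicate row.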

7 new tests (12 in file, 72 across the routing corpus) green.
AutomatosAI merged commit b0a9b85 into main on May 28, 2026
1 of 2 checks passed
coderabbitai (Bot, Contributor) commented May 28, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 65448c88-6b87-4840-af38-e17b56028321

📥 Commits

Reviewing files that changed from the base of the PR and between 25152a3 and 310c398.

📒 Files selected for processing (7)
  • orchestrator/config.py
  • orchestrator/core/services/edge_builder.py
  • orchestrator/modules/tools/discovery/graph_router.py
  • orchestrator/modules/tools/discovery/signal_recorder.py
  • orchestrator/modules/tools/tool_router.py
  • orchestrator/tests/test_graph_router_negative.py
  • orchestrator/tests/test_prd139_edge_builder.py

📝 Walkthrough


This PR introduces an opt-in batched tool execution signal recorder that tracks routing behavior and updates edge/affinity telemetry incrementally. It adds a new failed_after edge type to complement used_after, refactors affinity scoring to separately apply positive boosts and negative penalties, and integrates signal recording into the tool execution path with comprehensive test coverage.

Changes

Tool Execution Signal Recorder & Routing Feedback

| Layer | File(s) | Summary |
| --- | --- | --- |
| Configuration and Signal Definitions | orchestrator/config.py, orchestrator/modules/tools/discovery/signal_recorder.py | Environment variables for recorder enable/disable, batch size, flush interval, and queue size. ToolSignal dataclass and _wilson helper for confidence scoring. |
| Failed-After Edge Detection | orchestrator/core/services/edge_builder.py | _compute_failed_after_edges scans telemetry windows for action pairs where a predecessor succeeds and a successor fails within 2 steps, tracking failure and co-occurrence counts per scope. |
| Failed-After Edge Persistence & Reporting | orchestrator/core/services/edge_builder.py | New _upsert_edge_row, a shared ON CONFLICT upsert primitive for multi-type edges; refactored _upsert_edges and new _upsert_failed_after_edges using Wilson-based confidence. Added the EdgeBuildSummary.failed_edges_built counter and updated completion logging. |
| Affinity Scoring Refactor: Positive vs. Negative | orchestrator/modules/tools/discovery/graph_router.py | _query_affinities returns (positive_boosts, negative_penalties) separately; chain scoring applies cosine * edge_confidence + boost - penalty, so success-based boosts and failure-based penalties can be weighted independently. |
| Signal Recorder: Batching, Aggregation & Upserts | orchestrator/modules/tools/discovery/signal_recorder.py | Non-blocking record() enqueue into a bounded asyncio.Queue; a background drain loop collecting batches by size and flush interval; batch aggregation collapsing duplicates; and NULL-safe incremental DB upserts using a single session per flush. |
| Tool Router: Signal Recording Integration | orchestrator/modules/tools/tool_router.py | execute_and_format emits ToolSignal telemetry on both success and failure paths; _record_tool_signal extracts the prior action from context and enqueues via the recorder singleton, swallowing exceptions to isolate the tool hot path. |
| Graph Router Regression Tests: Affinity & Edge Filtering | orchestrator/tests/test_graph_router_negative.py | Synthetic module loading with stubbed dependencies; validates positive/negative affinity separation, penalty-based ranking reduction, exclusion of failed_after edges from edge expansion, and signal recorder SQL behavior with DB mocking. |
| Signal Recorder Tests: Batching, Aggregation & Persistence | orchestrator/tests/test_graph_router_negative.py | SQL-capturing DB fakes; validates incremental edge/affinity INSERTs, UPDATE upserts collapsing duplicates, single-session enforcement, batch aggregation correctness, and feature-flag and event-loop conditional behavior. |
| Edge Builder Tests: Failed-After & Wilson Scoring | orchestrator/tests/test_prd139_edge_builder.py | Test fixture cleanup to use real model definitions; Wilson lower-bound validation; TestFailedAfterEdges suite covering the success-before-failure requirement, the 2-step window, self-edge exclusion, session isolation, and co-occurrence accumulation. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A signal soars through bounded queues,
failed_after edges, positive news!
Affinity splits with grace and care—
boosts and penalties, balanced fair.
Telemetry batched, confidence bright,
routing feedback shines light.
