Ralph/prd 141 platform reliability #391
…017) Split _query_affinities() to return explicit (positive_boosts, negative_penalties) dicts instead of a single netted value: succeeds_for_intent and agent_prefers feed positive_boosts; fails_for_intent feeds negative_penalties as a positive magnitude. _expand_with_graph now scores edge chains as cosine*edge_confidence + boost - penalty, so a tool that historically fails for an intent ranks lower. Behaviour was previously netted implicitly; this makes the negative signal explicit and independently testable.
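The split-and-score rule described above can be sketched as follows. This is a minimal illustration that makes two assumptions: affinity rows arrive as (tool, relation, weight) records, and multiple rows for the same tool are summed. `AffinityRow`, `split_affinities` and `score_candidate` are hypothetical stand-ins, not the project's `_query_affinities()` or `_expand_with_graph()`.

```python
from dataclasses import dataclass

POSITIVE_RELATIONS = ("succeeds_for_intent", "agent_prefers")
NEGATIVE_RELATIONS = ("fails_for_intent",)


@dataclass(frozen=True)
class AffinityRow:
    # Hypothetical row shape; the real affinity model is not shown in the PR.
    tool: str
    relation: str
    weight: float


def split_affinities(rows):
    """Return explicit (positive_boosts, negative_penalties) dicts keyed by tool."""
    positive_boosts = {}
    negative_penalties = {}
    for row in rows:
        if row.relation in POSITIVE_RELATIONS:
            positive_boosts[row.tool] = positive_boosts.get(row.tool, 0.0) + row.weight
        elif row.relation in NEGATIVE_RELATIONS:
            # Penalties are stored as a positive magnitude and subtracted later.
            negative_penalties[row.tool] = negative_penalties.get(row.tool, 0.0) + abs(row.weight)
    return positive_boosts, negative_penalties


def score_candidate(tool, cosine, edge_confidence, positive_boosts, negative_penalties):
    """cosine * edge_confidence + boost - penalty, as stated in the commit message."""
    return (
        cosine * edge_confidence
        + positive_boosts.get(tool, 0.0)
        - negative_penalties.get(tool, 0.0)
    )


rows = [
    AffinityRow("search", "succeeds_for_intent", 0.2),
    AffinityRow("search", "agent_prefers", 0.1),
    AffinityRow("scrape", "fails_for_intent", 0.3),
]
boosts, penalties = split_affinities(rows)
print(score_candidate("search", 0.8, 0.5, boosts, penalties))  # about 0.7
print(score_candidate("scrape", 0.8, 0.5, boosts, penalties))  # about 0.1
```

Keeping the two dicts separate lets a test assert on the penalty path alone. With a single netted value, a strong boost could hide a failing-intent signal.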
…141 US-018)
Record risky tool transitions in the nightly edge build: when tool A
succeeds and a tool B within the next 2 steps of the same session errors,
emit a failed_after(A, B) edge. Tracks both the failure count and the total
(A-succeeded, B-within-2) co-occurrences so confidence is the Wilson lower
bound of the failure RATE (failed/total), reusing the existing
wilson_lower_bound() rather than a raw count.
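To see why the Wilson lower bound of the failure rate is preferable to a raw failure count, here is the standard Wilson score lower bound applied to (failed, total). The project's own `wilson_lower_bound()` signature is not shown in the PR. `wilson_lower` is a hypothetical re-implementation of the textbook formula, with z = 1.96 assumed as the default.

```python
import math


def wilson_lower(failed: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the rate failed/total."""
    if total <= 0:
        return 0.0
    p = failed / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    # Clamp tiny negative float error when failed == 0.
    return max(0.0, (centre - margin) / (1 + z2 / total))


# One failure in one co-occurrence is weak evidence; 10 of 20 is stronger.
print(round(wilson_lower(1, 1), 2))    # 0.21
print(round(wilson_lower(10, 20), 2))  # 0.3
```

A pair that failed once in its only co-occurrence has a raw rate of 100% but a confidence of only about 0.21. A pair that failed 10 times in 20 co-occurrences scores about 0.30. This is why the commit tracks `total` alongside `failed`.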
- _compute_failed_after_edges(): same session grouping/windowing as
used_after; A must have succeeded; self-edges skipped; returns
{(from,to,ws,agent): (failed, total)} only for pairs with >=1 failure.
- _upsert_failed_after_edges(): writes edge_type='failed_after' to the same
tool_routing_edges table. uq_tre_full_key includes edge_type, so these
coexist with used_after rows (no migration needed; edge_type is String(50)).
- Extracted _upsert_edge_row() shared by used_after and failed_after upserts.
- EdgeBuildSummary.failed_edges_built + build_edges() wiring.
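The windowing rules in the list above (group by session, A must succeed, B within the next 2 steps, skip self-edges, keep only pairs with at least one failure) can be sketched in memory. This is a sketch under stated assumptions, not `_compute_failed_after_edges()` itself. `ToolEvent` and `compute_failed_after` are hypothetical. "Within the next 2 steps" is read as the next two events by step order. The `ws`/`agent` key parts are taken from event A.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW = 2  # B must fall within the next 2 steps after A (assumed index-based)


@dataclass(frozen=True)
class ToolEvent:
    # Hypothetical event shape for illustration only.
    session_id: str
    step: int
    tool: str
    ok: bool
    ws: str
    agent: str


def compute_failed_after(events):
    """Return {(from, to, ws, agent): (failed, total)} for pairs with >= 1 failure."""
    by_session = defaultdict(list)
    for ev in events:
        by_session[ev.session_id].append(ev)

    counts = defaultdict(lambda: [0, 0])  # key -> [failed, total]
    for session_events in by_session.values():
        session_events.sort(key=lambda e: e.step)
        for i, a in enumerate(session_events):
            if not a.ok:
                continue  # A must have succeeded
            for b in session_events[i + 1 : i + 1 + WINDOW]:
                if b.tool == a.tool:
                    continue  # self-edges are skipped
                key = (a.tool, b.tool, a.ws, a.agent)
                counts[key][1] += 1  # every (A ok, B within window) co-occurrence
                if not b.ok:
                    counts[key][0] += 1  # B errored after A succeeded
    return {k: (f, t) for k, (f, t) in counts.items() if f >= 1}


events = [
    ToolEvent("s1", 0, "search", True, "ws1", "agent1"),
    ToolEvent("s1", 1, "scrape", False, "ws1", "agent1"),
    ToolEvent("s1", 2, "parse", True, "ws1", "agent1"),
    ToolEvent("s2", 0, "search", True, "ws1", "agent1"),
    ToolEvent("s2", 1, "scrape", True, "ws1", "agent1"),
]
print(compute_failed_after(events))
```

Counting `total` for every in-window pair, including successful ones, provides the denominator for the Wilson failure-rate bound. `(search, parse)` is dropped because it never failed.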
GraphRouter._query_edges only follows edge_type=='used_after', so failed_after
edges are recorded for analysis/de-ranking but NEVER expanded into chains;
test_failed_after_edge_not_expanded guards this at the DB-filter layer.
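The invariant that failed_after edges are stored but never expanded can be illustrated with an in-memory filter. In the PR, the filter lives in the DB query inside `GraphRouter._query_edges`. The breadth-first expansion below, and the names `Edge`, `query_edges` and `expand`, are hypothetical and only show the edge-type guard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    # Hypothetical edge shape; the real table is tool_routing_edges.
    from_tool: str
    to_tool: str
    edge_type: str
    confidence: float


def query_edges(edges, from_tool):
    # Mirrors the DB-level filter: only used_after edges are eligible for chains.
    return [e for e in edges if e.from_tool == from_tool and e.edge_type == "used_after"]


def expand(seed, edges, max_depth=2):
    """Breadth-first chain expansion (assumed strategy) from a seed tool."""
    chain, frontier, seen = [], [seed], {seed}
    for _ in range(max_depth):
        next_frontier = []
        for tool in frontier:
            for edge in query_edges(edges, tool):
                if edge.to_tool not in seen:
                    seen.add(edge.to_tool)
                    chain.append(edge.to_tool)
                    next_frontier.append(edge.to_tool)
        frontier = next_frontier
    return chain


edges = [
    Edge("search", "scrape", "used_after", 0.8),
    Edge("search", "parse", "failed_after", 0.4),  # recorded, never followed
    Edge("scrape", "summarize", "used_after", 0.7),
]
print(expand("search", edges))  # ['scrape', 'summarize']
```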
Also removes a dead, harmful MagicMock of core.database.base from
test_prd139_edge_builder.py: it built Mock-based ToolRoutingEdge/Affinity
classes that corrupted sibling tests sharing the process (the real cause of
the only cross-file failures). The mock never helped — that file already
requires DB creds to collect, and real Base imports cleanly.
Tests: 86 passed across the routing/graph unit corpus
(test_prd139_edge_builder + test_graph_router{,_negative} + us014/us015 +
tool_routing_models). py_compile OK.
…D-141 US-019)
Fold tool success/failure outcomes into the tool-routing graph in real time
via an in-process asyncio.Queue drained by a single long-lived background
task. record() is a non-blocking put_nowait on the hot path; DB access
happens only inside the drain loop, with ONE session per flushed batch. This
replaces the original draft's per-call asyncio.ensure_future, which opened a
new DB session on every tool call and exhausted the connection pool under load.
- config: 4 opt-in settings (recorder off by default, batch size, flush
interval, queue maxsize).
- signal_recorder: aggregates each batch into used_after/failed_after edges
plus agent_prefers/fails_for_intent affinities. Uses a NULL-safe upsert
(UPDATE ... IS NOT DISTINCT FROM, then INSERT if rowcount is 0) because the
unique constraints lack NULLS NOT DISTINCT and intent_cluster_id is always
NULL. The recorder increments sample_count/weight on edges but never
overwrites confidence; the nightly edge_builder remains the authoritative
Wilson recompute.
- tool_router: non-blocking enqueue on both the success path and the except
block; failures in the recorder can never break a tool call.
Tests: 7 new (12 in file, 72 across the routing corpus), all green.
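The queue-and-drain design described above can be sketched with the standard library alone. `SignalRecorderSketch`, `ToolSignal` and the `flush_batch` callback are hypothetical. In the described design, the callback is where one DB session per batch would be opened. Two behaviours are assumptions not stated in the commit: dropping signals when the queue is full, and logging then swallowing flush errors.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Awaitable, Callable, List

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class ToolSignal:
    # Hypothetical signal payload: which tool ran and whether it succeeded.
    tool: str
    ok: bool


FlushFn = Callable[[List[ToolSignal]], Awaitable[None]]


class SignalRecorderSketch:
    def __init__(self, flush_batch: FlushFn, batch_size: int = 50,
                 flush_interval: float = 1.0, maxsize: int = 1000) -> None:
        self._queue = asyncio.Queue(maxsize=maxsize)
        self._flush_batch = flush_batch
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._task = None
        self.dropped = 0

    def record(self, signal: ToolSignal) -> None:
        # Hot path: never awaits and never touches the DB.
        try:
            self._queue.put_nowait(signal)
        except asyncio.QueueFull:
            self.dropped += 1  # assumption: drop rather than block the tool call

    def start(self) -> None:
        # A single long-lived drain task for the whole process.
        self._task = asyncio.create_task(self._drain())

    async def _drain(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]
            deadline = loop.time() + self._flush_interval
            # Collect until the batch is full or the flush interval elapses.
            while len(batch) < self._batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                await self._flush_batch(batch)  # one DB session per batch goes here
            except Exception:
                logger.exception("signal flush failed; dropping %d signals", len(batch))
            finally:
                for _ in batch:
                    self._queue.task_done()

    async def stop(self) -> None:
        await self._queue.join()  # wait until every queued signal was flushed
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
```

A tool router would call `record(...)` both after a successful call and inside its `except` block. Because `record` only does `put_nowait`, a full queue or a broken database never surfaces as a tool-call error.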
Caution: Review failed. The pull request is closed. Files selected for processing: 7.
Walkthrough: This PR introduces an opt-in batched tool execution signal recorder that tracks routing behavior and updates edge/affinity telemetry incrementally. It adds a new …
Changes: Tool Execution Signal Recorder & Routing Feedback
Estimated code review effort: 4 (Complex), ~60 minutes