Skip to content

multi-repo: no operator-facing observability when trigger_pr_creation partially fails #258

Description

@eipasteur

Problem

Since PR #256, trigger_pr_creation returns { partialFailure: true, failedRepos } instead of silently dropping repos, and the orchestrator prompt handles remediation/escalation. But post-hoc, an operator has nothing to diagnose with:

  • the failure detail exists only in the tool response (consumed by the LLM) and console.error in the ephemeral ECS task's CloudWatch stream
  • nothing is persisted to Neptune about WHICH repo failed and WHY
  • no metric/alarm fires — if the agent mishandles the response, the failure evaporates

Proposal

  1. EMF metric MultiRepoPRPartialFailure (dimensions: projectId, repository, reason: conflict|mergeError|other) emitted from the MCP server
  2. Persist failed_repos (JSON) on the PRGroup vertex next to the expected_repos property PR feat(multi-repo): complete project repository management UI #256 added, so the UI can render "PR missing for repo X: merge conflict on branch Y"
  3. Surface incomplete PRGroups in the frontend (compare expected_repos vs GROUPS edges — the reconciliation data model already supports this)

Refs: PR #256 review finding M6 (#256 (comment))

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions