Skip to content

Add structured freeform context payload + ExecutingTask/ready recovery#51

Open
m-nash wants to merge 20 commits into
mainfrom
feature/freeform-context-payload
Open

Add structured freeform context payload + ExecutingTask/ready recovery#51
m-nash wants to merge 20 commits into
mainfrom
feature/freeform-context-payload

Conversation

@m-nash

@m-nash m-nash commented Apr 30, 2026

Copy link
Copy Markdown
Owner

Two fixes for context-loss bugs in the pr-monitor flow that surfaced after switching freeform user input handling to MCP sampling.

1. Structured interpret_freeform context payload

Bug: MCP elicitation flows out of band from chat. When the server returned interpret_freeform to the agent with prose-only instructions, the agent had no chat record of the elicitation cycle and dismissed the user's reply as "stale state".

Fix: BuildFreeformInterpretAction now populates a typed FreeformInterpretContext payload containing:

  • The comment / CI failure being discussed
  • The server's analysis (recommendation, investigation findings, suggested fix)
  • The elicitation question + choices the user was shown
  • The user's reply text + sampling classification result

The agent can now ground its response in the same context the user was shown rather than treating it as out-of-band noise.

2. (ExecutingTask, "ready") recovery

Bug: The post-push pr_monitor_start hook was calling event=ready while the state machine was still in ExecutingTask mid-comment-flow. ProcessEvent had no transition for that pair, so it fell into RecoverFromUnexpectedState which destructively wiped CommentFlow / CiFailureFlow / ActiveWaitingComment and offered only Resume/Stop, restarting polling from scratch.

Fix A — auto-recover via HEAD-SHA snapshot:

  • New MonitorState.HeadShaAtTaskStart + EnterExecutingTask() helper that snapshots HEAD on every ExecutingTask entry
  • All 10 CurrentState = ExecutingTask sites converted to use the helper
  • New transition (ExecutingTask, "ready") -> auto_execute "recover_from_ready_in_executing_task"
  • The auto_execute handler refreshes HEAD via PrStatusFetcher; if HEAD advanced, re-dispatches as the documented completion event (comment_addressed / push_completed)

Fix B — non-destructive recovery from ExecutingTask:

  • When no push detected, RecoverFromUnexpectedState preserves flow state and offers flow-aware choices: "Treat as comment addressed" (routes through ProcessCommentAddressed which auto-composes a reply if needed) and "Treat as push completed"
  • Non-ExecutingTask prior states keep the existing destructive cleanup

Tests

22 new tests across FreeformInterpretContextTests and ExecutingTaskReadyRecoveryTests. 358 tests pass (was 336).

Validation

Deployed locally via Install-Debug.ps1 and exercised end-to-end on a real Azure SDK PR — both bugs no longer reproduce.

Two fixes for context-loss bugs in the pr-monitor flow that surfaced after
the switch to MCP sampling for freeform user input:

1. Structured interpret_freeform context payload
   MCP elicitation flows out of band from chat, so when the server returned
   interpret_freeform to the agent with prose-only instructions, the agent
   had no record of the elicitation cycle and dismissed the user's reply as
   "stale state". BuildFreeformInterpretAction now populates a typed
   FreeformInterpretContext (comment, server analysis, elicitation Q+choices,
   user reply + sampling classification) so the agent can ground its response
   in the same context the user was shown.

2. (ExecutingTask, "ready") recovery
   The post-push pr_monitor_start hook was calling event=ready while the
   state machine was still in ExecutingTask mid-comment-flow. ProcessEvent
   had no transition for that pair, so it fell into RecoverFromUnexpectedState
   which destructively wiped CommentFlow / CiFailureFlow / ActiveWaitingComment
   and offered only Resume/Stop, restarting polling from scratch.

   Fix A: snapshot HEAD on every ExecutingTask entry via new EnterExecutingTask()
   helper. New (ExecutingTask, ready) transition returns an auto_execute that
   refreshes HEAD via PrStatusFetcher; if HEAD advanced, re-dispatches as the
   documented completion event (comment_addressed / push_completed).

   Fix B: when no push detected, RecoverFromUnexpectedState preserves flow
   state and offers flow-aware choices (Treat as comment addressed / Treat
   as push completed) so the user can resume in-flow without losing their
   place. Non-ExecutingTask prior states keep the existing destructive cleanup.

22 new tests across FreeformInterpretContextTests and ExecutingTaskReadyRecoveryTests.
358 tests pass total.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses two context-loss/recovery bugs in the PR monitor flow: (1) freeform MCP elicitation replies arriving out-of-band from chat, and (2) event=ready arriving while the state machine is still in ExecutingTask, causing destructive recovery and loss of in-flight flow state.

Changes:

  • Add a structured FreeformInterpretContext payload to interpret_freeform actions so the agent can ground out-of-band elicitation replies with the relevant PR-monitor context.
  • Add (ExecutingTask, "ready") recovery via an auto_execute HEAD-SHA check plus non-destructive, flow-aware recovery prompts when no push is detected.
  • Add tests covering the new structured context payload and ExecutingTask/ready recovery behavior.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
PrCopilot/tests/PrCopilot.Tests/FreeformInterpretContextTests.cs New tests validating interpret_freeform carries structured context and boundary instructions.
PrCopilot/tests/PrCopilot.Tests/ExecutingTaskReadyRecoveryTests.cs New tests validating ExecutingTask snapshotting and ready-recovery behavior/choices.
PrCopilot/src/PrCopilot/Tools/UserReplyContext.cs New JSON-serializable model for authoritative elicitation replies.
PrCopilot/src/PrCopilot/Tools/ServerAnalysisContext.cs New model for recommendation/findings/suggested-fix context attached to freeform interpretation.
PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs Builds/attaches structured freeform context; uses EnterExecutingTask; implements auto-exec recovery for ExecutingTask/ready.
PrCopilot/src/PrCopilot/Tools/FreeformInterpretContext.cs New top-level structured payload for interpret_freeform actions.
PrCopilot/src/PrCopilot/Tools/FailedCheckContext.cs New model for failed-check entries inside CI failure context.
PrCopilot/src/PrCopilot/Tools/ElicitationContext.cs New model capturing elicitation question + choices.
PrCopilot/src/PrCopilot/Tools/ElicitationChoiceContext.cs New model capturing display/value mapping for elicitation choices.
PrCopilot/src/PrCopilot/Tools/CommentContext.cs New model for attaching active comment thread details to freeform interpretation context.
PrCopilot/src/PrCopilot/Tools/CiFailureContext.cs New model for attaching failed CI checks to freeform interpretation context.
PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs Adds (ExecutingTask, "ready") transition, flow-aware recovery prompting, and new choice mappings.
PrCopilot/src/PrCopilot/StateMachine/MonitorState.cs Adds HeadShaAtTaskStart and EnterExecutingTask() helper to snapshot HEAD on task entry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs
Comment thread PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs
Comment thread PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs Outdated
m-nash and others added 3 commits May 5, 2026 13:43
…ailable

Addresses copilot-pull-request-reviewer[bot] feedback on PR #51:
FreeformInterpretContext was always setting Reason and SamplingClassification
to the "custom instruction" labels, even when sampling failed/returned null
(host capability missing, invalid JSON, exception). The payload didn't
actually carry the sampling outcome — making the field useless to the agent.

- TryClassifyFreeformViaSamplingAsync now returns FreeformClassification?
  (the full record) instead of string?. Null = sampling unavailable.
- BuildFreeformInterpretAction / BuildFreeformInterpretContext take an
  optional FreeformClassification? — sets Reason/SamplingClassification to
  sampling_unavailable/unavailable when null, sampling_classified_as_custom_instruction/
  custom_instruction when non-null.
- UserReplyContext gains an optional SamplingReasoning field that propagates
  the sampling reasoning into the payload so the agent has insight into WHY
  sampling decided the text was a custom instruction.
- All 3 call sites updated to extract .MapsToChoice for routing and pass
  the full classification on fallback.

3 tests updated/added; 360 tests pass total.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses copilot-pull-request-reviewer[bot] feedback on PR #51:
BuildFreeformInterpretContext only attached comment context when
state.CommentFlow != None. ProcessWaitingCommentChoice operates with
state.ActiveWaitingComment set but state.CommentFlow == None (the comment
flow ended when the reply was posted; the comment is now waiting for the
reviewer's response). A freeform reply during that elicitation lost the
comment context, reintroducing the original context-loss bug for that flow.

Adds an else-if branch: when CommentFlow is None and ActiveWaitingComment
is set, populate ctx.Comment + FlowType from ActiveWaitingComment. The
active comment-flow path still takes precedence (it's what the user is
currently being prompted about).

Tests: WaitingForReply_ActiveCommentSetButNoCommentFlow_StillAttachesComment
(was failing, now passes); CommentFlow_PrefersUnresolvedCommentsOverActiveWaiting
guards against ActiveWaitingComment hijacking the payload during a real flow.
362 tests pass total.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The auto_execute no-push fallback in recover_from_ready_in_executing_task
was dispatching ProcessEvent(state, 'ready_unresolved', ...) which fell
through to RecoverFromUnexpectedState and interpolated the literal string
'ready_unresolved' into the user-facing question:

  Unexpected event 'ready_unresolved' while addressing a comment. ...

Refactor BuildExecutingTaskRecoveryPrompt to take a user-friendly reasonText
parameter (was eventType), expose it as internal, and have the auto_execute
fallback call it directly with 'Looks like the task finished without a new
commit'. The genuine-unknown-event path in RecoverFromUnexpectedState still
surfaces the event name (useful debug info for real bugs).

Adds 4 tests covering the three flows and the preserved debug behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs:923

  • When sampling maps a waiting-comment freeform reply to a choice here, that normalized value is overwritten by the unconditional choice = triggerResult.Value a few lines later. ProcessWaitingCommentChoice then sees the raw freeform text instead of resolve/go_back, so inputs like "sure, resolve it" fall back to the default resume path instead of resolving the thread.
                            var classification = await TryClassifyFreeformViaSamplingAsync(server, triggerResult, cancellationToken);
                            var mappedChoice = classification?.MapsToChoice;
                            if (mappedChoice != null)
                            {
                                DebugLogger.Log("NextStep", $"Trigger sampling classified freeform as choice: {mappedChoice}");
                                @event = "user_chose";
                                choice = mappedChoice;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs
Comment thread PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs Outdated
Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs Outdated
Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs
m-nash and others added 4 commits May 5, 2026 15:11
The ExecutingTask no-push recovery prompt only offered 'Treat as comment
addressed' for comment-flow completion, which routes through
ProcessCommentAddressed and resolves the thread. Reviewer feedback on PR #51
pointed out that comment tasks can also legitimately finish as
comment_replied (pushback or clarification), and forcing those through the
addressed path incorrectly resolves a thread that should remain open for
the reviewer.

Adds 'Treat as comment replied' as a fifth choice. It maps to
treat_as_replied_externally and routes to SkipAndAdvanceComment, which
advances past the current comment without posting anything new and without
resolving the thread. Intended for the case where the user already handled
the reply outside the loop (e.g., replied directly on GitHub).

Adds 4 tests covering the new choice, the ChoiceValueMap entry, the
advance-without-posting behavior, and the transition-to-polling on the
last comment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The (ExecutingTask, 'ready') auto-recovery dispatched comment_addressed
whenever HEAD advanced and a comment flow was active. But the
apply_recommendation task documents two completion paths:

  - Implement fix     -> event=comment_addressed (resolve thread)
  - Pushback w/ test  -> event=comment_replied  (keep thread open)

For the proving-test pushback path, dispatching comment_addressed
incorrectly resolved a human review thread that should have stayed
open for the reviewer to respond to.

Fix:
  - Add MonitorState.ExecutingTaskExpectedCompletion (nullable). Reset
    to null in EnterExecutingTask.
  - EmitAddressCommentAction sets it to 'comment_addressed' (single
    documented completion path).
  - BeginApplyRecommendation leaves it null (ambiguous - both paths
    possible).
  - Extract recovery decision logic into testable helper
    MonitorTransitions.BuildRecoverFromReadyResolution(state, headAdvanced).
    ExecuteAutoAction does the gh HEAD fetch then delegates.
  - Decision: when HEAD advanced + comment flow + ExpectedCompletion ==
    'comment_addressed' -> auto-dispatch (preserves convenience for the
    unambiguous case). Otherwise -> fall back to flow-aware ask_user
    prompt so the user disambiguates between addressed and replied.

Adds 5 tests covering the ambiguous-task ask_user fallback, the
unambiguous-task auto-dispatch, the no-push regression, and the
ExpectedCompletion assignment in both emitters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
BuildWaitingCommentAction puts the monitor in a state where
ActiveWaitingComment is set but both CommentFlow and CiFailureFlow are
None (the comment flow ended when the reply was posted; the thread is
now waiting for the reviewer's response, dispatched via
ProcessWaitingCommentChoice on ActiveWaitingComment != null).

If an interpret_freeform / execute task ran in that state and a stray
ready or unknown event arrived, BuildExecutingTaskRecoveryPrompt fell
into the no-flow generic branch which cleared ActiveWaitingComment and
offered only Resume/Stop. The user lost the waiting-thread context and
could no longer recover with the original Resolve / Go-back choices.

Fix: add a waiting-comment branch (between CI flow and the truly-no-
context generic branch) that preserves ActiveWaitingComment and offers
[Resolve this thread, Go back to monitoring, Stop monitoring]. The
existing ProcessUserChoice -> ProcessWaitingCommentChoice dispatch then
routes the user's selection correctly.

BuildRecoverFromReadyResolution gets a parallel branch: when HEAD
advanced + no flow + waiting comment, surface the same prompt instead
of auto-resuming (which would also clear the waiting context).

Adds 4 tests covering the no-push and push-detected paths, the end-to-
end Resolve dispatch, and the truly-no-context regression.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The ExecutingTask recovery prompt offers 'Skip this comment' for every
comment-flow sub-state, but ProcessCommentChoice only handled 'skip'
for AddressAllIterating and ExplainAllIterating. The remaining sub-
states (SingleCommentPrompt, PickComment, PickRemaining) fell through
to the default '_ => TransitionToPolling' branch, which abandoned the
entire comment flow instead of skipping just the active thread.

Fix: add explicit (X, 'skip') => SkipAndAdvanceComment cases for the
three previously-broken sub-states. AddressAllIterating keeps its
existing SkipAndAdvanceComment route. ExplainAllIterating intentionally
keeps AdvanceExplainAll (auto-emits the next explain_comment task) —
different semantics for the explain loop, preserved.

Adds 5 tests: 3 for the broken sub-states (skip now advances index and
stays in flow), 2 regression tests for AddressAll and ExplainAll
behaviors.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs
Comment thread PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs
Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs
Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs Outdated
Comment thread PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs
m-nash and others added 5 commits May 5, 2026 17:02
…eview)

SampleStructuredAsync throws InvalidOperationException when the MCP client doesn't support sampling. ClassifyFreeformAsync's catch filter excluded that exception, so it bubbled out instead of returning null. Callers ended up on the generic internal-error path instead of the documented sampling_unavailable freeform fallback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When ActiveWaitingComment was set but CommentFlow was None, freeform replies that didn't map to 'Resolve' or 'Go back' fell into the generic task_complete branch. ProcessTaskComplete then cleared ActiveWaitingComment and transitioned to polling, abandoning the thread.

Add a waiting-comment branch to BuildFreeformPathBInstructions that mirrors the comment-flow branch (uses comment_addressed for code changes, comment_replied for reply-only). Update ProcessCommentAddressed and ProcessCommentReplied to dispatch via ActiveWaitingComment when no comment flow is active.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The CI apply_fix task runs in ApplyingFix and tells the agent to push before reporting push_completed. A post-push hook firing event=ready from ApplyingFix fell into RecoverFromUnexpectedState, which wiped CiFailureFlow and lost the CI flow context.

- Add EnterApplyingFix that snapshots HEAD (shared SnapshotForRecovery helper with EnterExecutingTask). - Have BeginApplyFix go through EnterApplyingFix. - Add (ApplyingFix, 'ready') => BuildRecoverFromReadyAction so the existing headAdvanced + CiFailureFlow != None branch dispatches push_completed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
EmitComposeReplyAction was calling state.EnterExecutingTask(), which overwrites
HeadShaAtTaskStart (with the still-stale state.HeadSha, since the server hasn't
polled since the agent pushed) and resets ExecutingTaskExpectedCompletion to null.
compose_reply never touches git, so the prior task's recovery snapshot must be
preserved -- otherwise an event=ready arriving during compose_reply misattributes
the original push to compose_reply and surfaces a bogus recovery prompt or
auto-dispatches the wrong completion event.

Fix: set CurrentState = ExecutingTask directly. Both callers
(ProcessCommentAddressed, ProcessCommentReplied) already run with
CurrentState == ExecutingTask, so no transition is needed.

Added 3 regression tests in ComposeReplySnapshotTests.cs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…#51 review)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs
Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs Outdated
Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs
Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs
Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs
m-nash and others added 3 commits May 5, 2026 22:41
BuildExecutingTaskRecoveryPrompt offers "Stop monitoring" (-> value "stop")
in every flow-preserving branch. But ProcessUserChoice routed to
ProcessCommentChoice / ProcessCiFailureChoice / ProcessWaitingCommentChoice
first when those flows were set, and none of them handled "stop" -- they all
fell through to TransitionToPolling / ClearWaitingAndResume. Result: picking
"Stop monitoring" from a recovery prompt silently resumed polling instead of
stopping the monitor.

Fix: hoist "stop" handling above the per-flow routing in ProcessUserChoice
so it always returns StopMonitoring regardless of active flow.

Added 4 regression tests in StopChoiceRoutingTests.cs covering all three
flow contexts plus the no-flow regression case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
treat_as_replied_externally was unconditionally routed to SkipAndAdvanceComment,
which only knows the AddressAllIterating-style prompt. For ExplainAllIterating,
the user dropped out of the explain-all loop and got an Address prompt instead
of the next comment's explain task. For SingleCommentPrompt, the expected
PickRemaining flow was bypassed.

Fix: route through AdvanceAfterComment which is fully flow-aware (same path the
normal comment_replied event uses).

Added 4 regression tests in TreatAsRepliedExternallyRoutingTests.cs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… review)

The waiting-thread branches in ProcessCommentAddressed/ProcessCommentReplied set
PendingResolveAfterAddress / PendingAdvanceAfterReply without entering a comment
flow (CommentFlow stays None). When ProcessTaskComplete then runs, it indexes
UnresolvedComments[CurrentCommentIndex] -- pointing to a stale entry from a
prior flow -- and triggers a bogus rerequest_review on an unrelated reviewer or
surfaces PickRemaining for unrelated comments.

Fix: add a guard at the top of both PendingResolveAfterAddress and
PendingAdvanceAfterReply branches -- when CommentFlow == None, this was a
waiting-thread task; just TransitionToPolling (which clears all pending state).

Added 3 regression tests in WaitingThreadResolveTests.cs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 1 comment.

Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs
…#51 review)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Comment thread PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs
Comment thread PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs Outdated
m-nash and others added 2 commits May 5, 2026 23:49
#51 review)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…iew)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants