Add structured freeform context payload + ExecutingTask/ready recovery#51
Add structured freeform context payload + ExecutingTask/ready recovery#51m-nash wants to merge 20 commits into
Conversation
Two fixes for context-loss bugs in the pr-monitor flow that surfaced after the switch to MCP sampling for freeform user input: 1. Structured interpret_freeform context payload MCP elicitation flows out of band from chat, so when the server returned interpret_freeform to the agent with prose-only instructions, the agent had no record of the elicitation cycle and dismissed the user's reply as "stale state". BuildFreeformInterpretAction now populates a typed FreeformInterpretContext (comment, server analysis, elicitation Q+choices, user reply + sampling classification) so the agent can ground its response in the same context the user was shown. 2. (ExecutingTask, "ready") recovery The post-push pr_monitor_start hook was calling event=ready while the state machine was still in ExecutingTask mid-comment-flow. ProcessEvent had no transition for that pair, so it fell into RecoverFromUnexpectedState which destructively wiped CommentFlow / CiFailureFlow / ActiveWaitingComment and offered only Resume/Stop, restarting polling from scratch. Fix A: snapshot HEAD on every ExecutingTask entry via new EnterExecutingTask() helper. New (ExecutingTask, ready) transition returns an auto_execute that refreshes HEAD via PrStatusFetcher; if HEAD advanced, re-dispatches as the documented completion event (comment_addressed / push_completed). Fix B: when no push detected, RecoverFromUnexpectedState preserves flow state and offers flow-aware choices (Treat as comment addressed / Treat as push completed) so the user can resume in-flow without losing their place. Non-ExecutingTask prior states keep the existing destructive cleanup. 22 new tests across FreeformInterpretContextTests and ExecutingTaskReadyRecoveryTests. 358 tests pass total. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR addresses two context-loss/recovery bugs in the PR monitor flow: (1) freeform MCP elicitation replies arriving out-of-band from chat, and (2) event=ready arriving while the state machine is still in ExecutingTask, causing destructive recovery and loss of in-flight flow state.
Changes:
- Add a structured
FreeformInterpretContextpayload tointerpret_freeformactions so the agent can ground out-of-band elicitation replies with the relevant PR-monitor context. - Add
(ExecutingTask, "ready")recovery via anauto_executeHEAD-SHA check plus non-destructive, flow-aware recovery prompts when no push is detected. - Add tests covering the new structured context payload and ExecutingTask/ready recovery behavior.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| PrCopilot/tests/PrCopilot.Tests/FreeformInterpretContextTests.cs | New tests validating interpret_freeform carries structured context and boundary instructions. |
| PrCopilot/tests/PrCopilot.Tests/ExecutingTaskReadyRecoveryTests.cs | New tests validating ExecutingTask snapshotting and ready-recovery behavior/choices. |
| PrCopilot/src/PrCopilot/Tools/UserReplyContext.cs | New JSON-serializable model for authoritative elicitation replies. |
| PrCopilot/src/PrCopilot/Tools/ServerAnalysisContext.cs | New model for recommendation/findings/suggested-fix context attached to freeform interpretation. |
| PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs | Builds/attaches structured freeform context; uses EnterExecutingTask; implements auto-exec recovery for ExecutingTask/ready. |
| PrCopilot/src/PrCopilot/Tools/FreeformInterpretContext.cs | New top-level structured payload for interpret_freeform actions. |
| PrCopilot/src/PrCopilot/Tools/FailedCheckContext.cs | New model for failed-check entries inside CI failure context. |
| PrCopilot/src/PrCopilot/Tools/ElicitationContext.cs | New model capturing elicitation question + choices. |
| PrCopilot/src/PrCopilot/Tools/ElicitationChoiceContext.cs | New model capturing display/value mapping for elicitation choices. |
| PrCopilot/src/PrCopilot/Tools/CommentContext.cs | New model for attaching active comment thread details to freeform interpretation context. |
| PrCopilot/src/PrCopilot/Tools/CiFailureContext.cs | New model for attaching failed CI checks to freeform interpretation context. |
| PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs | Adds (ExecutingTask, "ready") transition, flow-aware recovery prompting, and new choice mappings. |
| PrCopilot/src/PrCopilot/StateMachine/MonitorState.cs | Adds HeadShaAtTaskStart and EnterExecutingTask() helper to snapshot HEAD on task entry. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ailable Addresses copilot-pull-request-reviewer[bot] feedback on PR #51: FreeformInterpretContext was always setting Reason and SamplingClassification to the "custom instruction" labels, even when sampling failed/returned null (host capability missing, invalid JSON, exception). The payload didn't actually carry the sampling outcome — making the field useless to the agent. - TryClassifyFreeformViaSamplingAsync now returns FreeformClassification? (the full record) instead of string?. Null = sampling unavailable. - BuildFreeformInterpretAction / BuildFreeformInterpretContext take an optional FreeformClassification? — sets Reason/SamplingClassification to sampling_unavailable/unavailable when null, sampling_classified_as_custom_instruction/ custom_instruction when non-null. - UserReplyContext gains an optional SamplingReasoning field that propagates the sampling reasoning into the payload so the agent has insight into WHY sampling decided the text was a custom instruction. - All 3 call sites updated to extract .MapsToChoice for routing and pass the full classification on fallback. 3 tests updated/added; 360 tests pass total. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses copilot-pull-request-reviewer[bot] feedback on PR #51: BuildFreeformInterpretContext only attached comment context when state.CommentFlow != None. ProcessWaitingCommentChoice operates with state.ActiveWaitingComment set but state.CommentFlow == None (the comment flow ended when the reply was posted; the comment is now waiting for the reviewer's response). A freeform reply during that elicitation lost the comment context, reintroducing the original context-loss bug for that flow. Adds an else-if branch: when CommentFlow is None and ActiveWaitingComment is set, populate ctx.Comment + FlowType from ActiveWaitingComment. The active comment-flow path still takes precedence (it's what the user is currently being prompted about). Tests: WaitingForReply_ActiveCommentSetButNoCommentFlow_StillAttachesComment (was failing, now passes); CommentFlow_PrefersUnresolvedCommentsOverActiveWaiting guards against ActiveWaitingComment hijacking the payload during a real flow. 362 tests pass total. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The auto_execute no-push fallback in recover_from_ready_in_executing_task was dispatching ProcessEvent(state, 'ready_unresolved', ...) which fell through to RecoverFromUnexpectedState and interpolated the literal string 'ready_unresolved' into the user-facing question: Unexpected event 'ready_unresolved' while addressing a comment. ... Refactor BuildExecutingTaskRecoveryPrompt to take a user-friendly reasonText parameter (was eventType), expose it as internal, and have the auto_execute fallback call it directly with 'Looks like the task finished without a new commit'. The genuine-unknown-event path in RecoverFromUnexpectedState still surfaces the event name (useful debug info for real bugs). Adds 4 tests covering the three flows and the preserved debug behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs:923
- When sampling maps a waiting-comment freeform reply to a choice here, that normalized value is overwritten by the unconditional
choice = triggerResult.Valuea few lines later.ProcessWaitingCommentChoicethen sees the raw freeform text instead ofresolve/go_back, so inputs like "sure, resolve it" fall back to the default resume path instead of resolving the thread.
var classification = await TryClassifyFreeformViaSamplingAsync(server, triggerResult, cancellationToken);
var mappedChoice = classification?.MapsToChoice;
if (mappedChoice != null)
{
DebugLogger.Log("NextStep", $"Trigger sampling classified freeform as choice: {mappedChoice}");
@event = "user_chose";
choice = mappedChoice;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The ExecutingTask no-push recovery prompt only offered 'Treat as comment addressed' for comment-flow completion, which routes through ProcessCommentAddressed and resolves the thread. Reviewer feedback on PR #51 pointed out that comment tasks can also legitimately finish as comment_replied (pushback or clarification), and forcing those through the addressed path incorrectly resolves a thread that should remain open for the reviewer. Adds 'Treat as comment replied' as a fifth choice. It maps to treat_as_replied_externally and routes to SkipAndAdvanceComment, which advances past the current comment without posting anything new and without resolving the thread. Intended for the case where the user already handled the reply outside the loop (e.g., replied directly on GitHub). Adds 4 tests covering the new choice, the ChoiceValueMap entry, the advance-without-posting behavior, and the transition-to-polling on the last comment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The (ExecutingTask, 'ready') auto-recovery dispatched comment_addressed
whenever HEAD advanced and a comment flow was active. But the
apply_recommendation task documents two completion paths:
- Implement fix -> event=comment_addressed (resolve thread)
- Pushback w/ test -> event=comment_replied (keep thread open)
For the proving-test pushback path, dispatching comment_addressed
incorrectly resolved a human review thread that should have stayed
open for the reviewer to respond to.
Fix:
- Add MonitorState.ExecutingTaskExpectedCompletion (nullable). Reset
to null in EnterExecutingTask.
- EmitAddressCommentAction sets it to 'comment_addressed' (single
documented completion path).
- BeginApplyRecommendation leaves it null (ambiguous - both paths
possible).
- Extract recovery decision logic into testable helper
MonitorTransitions.BuildRecoverFromReadyResolution(state, headAdvanced).
ExecuteAutoAction does the gh HEAD fetch then delegates.
- Decision: when HEAD advanced + comment flow + ExpectedCompletion ==
'comment_addressed' -> auto-dispatch (preserves convenience for the
unambiguous case). Otherwise -> fall back to flow-aware ask_user
prompt so the user disambiguates between addressed and replied.
Adds 5 tests covering the ambiguous-task ask_user fallback, the
unambiguous-task auto-dispatch, the no-push regression, and the
ExpectedCompletion assignment in both emitters.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
BuildWaitingCommentAction puts the monitor in a state where ActiveWaitingComment is set but both CommentFlow and CiFailureFlow are None (the comment flow ended when the reply was posted; the thread is now waiting for the reviewer's response, dispatched via ProcessWaitingCommentChoice on ActiveWaitingComment != null). If an interpret_freeform / execute task ran in that state and a stray ready or unknown event arrived, BuildExecutingTaskRecoveryPrompt fell into the no-flow generic branch which cleared ActiveWaitingComment and offered only Resume/Stop. The user lost the waiting-thread context and could no longer recover with the original Resolve / Go-back choices. Fix: add a waiting-comment branch (between CI flow and the truly-no- context generic branch) that preserves ActiveWaitingComment and offers [Resolve this thread, Go back to monitoring, Stop monitoring]. The existing ProcessUserChoice -> ProcessWaitingCommentChoice dispatch then routes the user's selection correctly. BuildRecoverFromReadyResolution gets a parallel branch: when HEAD advanced + no flow + waiting comment, surface the same prompt instead of auto-resuming (which would also clear the waiting context). Adds 4 tests covering the no-push and push-detected paths, the end-to- end Resolve dispatch, and the truly-no-context regression. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The ExecutingTask recovery prompt offers 'Skip this comment' for every comment-flow sub-state, but ProcessCommentChoice only handled 'skip' for AddressAllIterating and ExplainAllIterating. The remaining sub- states (SingleCommentPrompt, PickComment, PickRemaining) fell through to the default '_ => TransitionToPolling' branch, which abandoned the entire comment flow instead of skipping just the active thread. Fix: add explicit (X, 'skip') => SkipAndAdvanceComment cases for the three previously-broken sub-states. AddressAllIterating keeps its existing SkipAndAdvanceComment route. ExplainAllIterating intentionally keeps AdvanceExplainAll (auto-emits the next explain_comment task) — different semantics for the explain loop, preserved. Adds 5 tests: 3 for the broken sub-states (skip now advances index and stays in flow), 2 regression tests for AddressAll and ExplainAll behaviors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…eview) SampleStructuredAsync throws InvalidOperationException when the MCP client doesn't support sampling. ClassifyFreeformAsync's catch filter excluded that exception, so it bubbled out instead of returning null. Callers ended up on the generic internal-error path instead of the documented sampling_unavailable freeform fallback. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When ActiveWaitingComment was set but CommentFlow was None, freeform replies that didn't map to 'Resolve' or 'Go back' fell into the generic task_complete branch. ProcessTaskComplete then cleared ActiveWaitingComment and transitioned to polling, abandoning the thread. Add a waiting-comment branch to BuildFreeformPathBInstructions that mirrors the comment-flow branch (uses comment_addressed for code changes, comment_replied for reply-only). Update ProcessCommentAddressed and ProcessCommentReplied to dispatch via ActiveWaitingComment when no comment flow is active. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The CI apply_fix task runs in ApplyingFix and tells the agent to push before reporting push_completed. A post-push hook firing event=ready from ApplyingFix fell into RecoverFromUnexpectedState, which wiped CiFailureFlow and lost the CI flow context. - Add EnterApplyingFix that snapshots HEAD (shared SnapshotForRecovery helper with EnterExecutingTask). - Have BeginApplyFix go through EnterApplyingFix. - Add (ApplyingFix, 'ready') => BuildRecoverFromReadyAction so the existing headAdvanced + CiFailureFlow != None branch dispatches push_completed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
EmitComposeReplyAction was calling state.EnterExecutingTask(), which overwrites HeadShaAtTaskStart (with the still-stale state.HeadSha, since the server hasn't polled since the agent pushed) and resets ExecutingTaskExpectedCompletion to null. compose_reply never touches git, so the prior task's recovery snapshot must be preserved -- otherwise an event=ready arriving during compose_reply misattributes the original push to compose_reply and surfaces a bogus recovery prompt or auto-dispatches the wrong completion event. Fix: set CurrentState = ExecutingTask directly. Both callers (ProcessCommentAddressed, ProcessCommentReplied) already run with CurrentState == ExecutingTask, so no transition is needed. Added 3 regression tests in ComposeReplySnapshotTests.cs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…#51 review) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
BuildExecutingTaskRecoveryPrompt offers "Stop monitoring" (-> value "stop") in every flow-preserving branch. But ProcessUserChoice routed to ProcessCommentChoice / ProcessCiFailureChoice / ProcessWaitingCommentChoice first when those flows were set, and none of them handled "stop" -- they all fell through to TransitionToPolling / ClearWaitingAndResume. Result: picking "Stop monitoring" from a recovery prompt silently resumed polling instead of stopping the monitor. Fix: hoist "stop" handling above the per-flow routing in ProcessUserChoice so it always returns StopMonitoring regardless of active flow. Added 4 regression tests in StopChoiceRoutingTests.cs covering all three flow contexts plus the no-flow regression case. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
treat_as_replied_externally was unconditionally routed to SkipAndAdvanceComment, which only knows the AddressAllIterating-style prompt. For ExplainAllIterating, the user dropped out of the explain-all loop and got an Address prompt instead of the next comment's explain task. For SingleCommentPrompt, the expected PickRemaining flow was bypassed. Fix: route through AdvanceAfterComment which is fully flow-aware (same path the normal comment_replied event uses). Added 4 regression tests in TreatAsRepliedExternallyRoutingTests.cs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… review) The waiting-thread branches in ProcessCommentAddressed/ProcessCommentReplied set PendingResolveAfterAddress / PendingAdvanceAfterReply without entering a comment flow (CommentFlow stays None). When ProcessTaskComplete then runs, it indexes UnresolvedComments[CurrentCommentIndex] -- pointing to a stale entry from a prior flow -- and triggers a bogus rerequest_review on an unrelated reviewer or surfaces PickRemaining for unrelated comments. Fix: add a guard at the top of both PendingResolveAfterAddress and PendingAdvanceAfterReply branches -- when CommentFlow == None, this was a waiting-thread task; just TransitionToPolling (which clears all pending state). Added 3 regression tests in WaitingThreadResolveTests.cs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…#51 review) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
#51 review) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…iew) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two fixes for context-loss bugs in the pr-monitor flow that surfaced after switching freeform user input handling to MCP sampling.
1. Structured interpret_freeform context payload
Bug: MCP elicitation flows out of band from chat. When the server returned interpret_freeform to the agent with prose-only instructions, the agent had no chat record of the elicitation cycle and dismissed the user's reply as
"stale state".Fix:
BuildFreeformInterpretActionnow populates a typedFreeformInterpretContextpayload containing:The agent can now ground its response in the same context the user was shown rather than treating it as out-of-band noise.
2.
(ExecutingTask, "ready")recoveryBug: The post-push
pr_monitor_starthook was callingevent=readywhile the state machine was still inExecutingTaskmid-comment-flow.ProcessEventhad no transition for that pair, so it fell intoRecoverFromUnexpectedStatewhich destructively wipedCommentFlow/CiFailureFlow/ActiveWaitingCommentand offered only Resume/Stop, restarting polling from scratch.Fix A — auto-recover via HEAD-SHA snapshot:
MonitorState.HeadShaAtTaskStart+EnterExecutingTask()helper that snapshots HEAD on everyExecutingTaskentryCurrentState = ExecutingTasksites converted to use the helper(ExecutingTask, "ready") -> auto_execute "recover_from_ready_in_executing_task"PrStatusFetcher; if HEAD advanced, re-dispatches as the documented completion event (comment_addressed/push_completed)Fix B — non-destructive recovery from ExecutingTask:
RecoverFromUnexpectedStatepreserves flow state and offers flow-aware choices:"Treat as comment addressed"(routes throughProcessCommentAddressedwhich auto-composes a reply if needed) and"Treat as push completed"ExecutingTaskprior states keep the existing destructive cleanupTests
22 new tests across
FreeformInterpretContextTestsandExecutingTaskReadyRecoveryTests. 358 tests pass (was 336).Validation
Deployed locally via
Install-Debug.ps1and exercised end-to-end on a real Azure SDK PR — both bugs no longer reproduce.