Follow-up to #554: remaining items for #511

## Context

#554 lands a focused director-layer fix for one of #511's three symptoms (premature `END`). Scope was narrowed during review and the regression eval was deferred. This tracks what's outstanding.

## 1. Agent-side attribution clarification (covers #511 symptoms ① and ②)

#554 only modifies the director path. The other two #511 symptoms originate in the **agent** path and are unchanged:

- ① **Off-topic replies** — agent ignores the actual content of the prior turn.
- ② **Role confusion** — teacher misattributes a human turn to a peer agent (the "照镜子" case).

In `director-graph.ts:290`, `convertMessagesToOpenAI(state.messages, agentId)` re-encodes peer agents as `role:'user'` with a `[AgentName]:` prefix. The human's turn is also `role:'user'` with `[You]:` (or its localised equivalent) prefix. The agent system prompt doesn't currently document this encoding, so the model has no structural signal to distinguish them.

**Fix:** add one or two sentences in `lib/orchestration/prompt-builder.ts` clarifying that `role:'user'` messages prefixed `[Name]:` are AI peers, not the human student. This was item 3 of @ashutoshrana's original plan in #511; it was dropped from #554 to keep scope tight.

## 2. End-to-end regression for premature-END

Deferred from #554. The unit tests there verify the director's **inputs** (summary labels), not its **behaviour**. The premature-END symptom is model-dependent — a deterministic e2e fixture that fails on `main` and passes with the fix is the real regression guard.

Frame as **"verify #511 is actually fixed"**, not "add fixtures": reproduce the bug deterministically first (which models / prompts / seeds?), then encode the regression. If it doesn't reproduce reliably on any shipped model config, that's data we need before trusting the fix.

Lands under a new `/eval/orchestration/` domain dir (not `tests/eval/`), following the `outline-language` pattern: `runner.ts` + `judge.ts` + `scenarios/*.json` + an `eval:orchestration` npm script.

## 3. Director Rule #11 wording

Rule #11 in `director/system.md` implicitly treats every `[Student (Human)]` turn as a question, so it over-blocks `END` when the human just acknowledges ("ok continue", "got it"). Suggested rewording: gate on **"the most recent `[Student (Human)]` line asks a question or raises an objection AND appears after the last substantive `[Agent]` answer"**.

## 4. Test coverage for `[…]:`-leading content

`SENDER_PREFIX_RE = /^\[[^\]]+\]:\s*/` in `conversation-summary.ts` greedily strips the first `[…]:` pattern. Content like `"[FYI]: my answer is…"` loses `[FYI]: ` in the summary. Low-probability in practice; a single test pinning the chosen behaviour (strip vs preserve) would document the contract.

## 5. JSDoc contract on `summarizeConversation`

Add a line clarifying that `system` role messages don't appear in director-path input (`convertMessagesToOpenAI` filters non-user/assistant), so the `'System'` label branch is defensive-only.

---

Items 1 and 2 are substantive. 3–5 are polish on #554's own work. Happy to coordinate via comments if anyone wants to bundle multiple.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Follow-up to #554: remaining items for #511 #598

Context

1. Agent-side attribution clarification (covers #511 symptoms ① and ②)

2. End-to-end regression for premature-END

3. Director Rule #11 wording

4. Test coverage for `[…]:`-leading content

5. JSDoc contract on `summarizeConversation`

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Follow-up to #554: remaining items for #511 #598

Description

Context

1. Agent-side attribution clarification (covers #511 symptoms ① and ②)

2. End-to-end regression for premature-END

3. Director Rule #11 wording

4. Test coverage for […]:-leading content

5. JSDoc contract on summarizeConversation

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

4. Test coverage for `[…]:`-leading content

5. JSDoc contract on `summarizeConversation`