Context
#554 lands a focused director-layer fix for one of #511's three symptoms (premature END). Scope was narrowed during review and the regression eval was deferred. This tracks what's outstanding.
1. Agent-side attribution clarification (covers #511 symptoms ① and ②)
#554 only modifies the director path. The other two #511 symptoms originate in the agent path and are unchanged:
- ① Off-topic replies — agent ignores the actual content of the prior turn.
- ② Role confusion — teacher misattributes a human turn to a peer agent (the "照镜子" case).
In director-graph.ts:290, convertMessagesToOpenAI(state.messages, agentId) re-encodes peer agents as role:'user' with a [AgentName]: prefix. The human's turn is also role:'user' with [You]: (or its localised equivalent) prefix. The agent system prompt doesn't currently document this encoding, so the model has no structural signal to distinguish them.
Fix: add one or two sentences in lib/orchestration/prompt-builder.ts clarifying that role:'user' messages prefixed [Name]: are AI peers, not the human student. This was item 3 of @ashutoshrana's original plan in #511; it was dropped from #554 to keep scope tight.
2. End-to-end regression for premature-END
Deferred from #554. The unit tests there verify the director's inputs (summary labels), not its behaviour. The premature-END symptom is model-dependent — a deterministic e2e fixture that fails on main and passes with the fix is the real regression guard.
Frame as "verify #511 is actually fixed", not "add fixtures": reproduce the bug deterministically first (which models / prompts / seeds?), then encode the regression. If it doesn't reproduce reliably on any shipped model config, that's data we need before trusting the fix.
Lands under a new /eval/orchestration/ domain dir (not tests/eval/), following the outline-language pattern: runner.ts + judge.ts + scenarios/*.json + an eval:orchestration npm script.
3. Director Rule #11 wording
Rule #11 in director/system.md implicitly treats every [Student (Human)] turn as a question, so it over-blocks END when the human just acknowledges ("ok continue", "got it"). Suggested rewording: gate on "the most recent [Student (Human)] line asks a question or raises an objection AND appears after the last substantive [Agent] answer".
4. Test coverage for […]:-leading content
SENDER_PREFIX_RE = /^\[[^\]]+\]:\s*/ in conversation-summary.ts greedily strips the first […]: pattern. Content like "[FYI]: my answer is…" loses [FYI]: in the summary. Low-probability in practice; a single test pinning the chosen behaviour (strip vs preserve) would document the contract.
5. JSDoc contract on summarizeConversation
Add a line clarifying that system role messages don't appear in director-path input (convertMessagesToOpenAI filters non-user/assistant), so the 'System' label branch is defensive-only.
Items 1 and 2 are substantive. 3–5 are polish on #554's own work. Happy to coordinate via comments if anyone wants to bundle multiple.
Context
#554 lands a focused director-layer fix for one of #511's three symptoms (premature
END). Scope was narrowed during review and the regression eval was deferred. This tracks what's outstanding.1. Agent-side attribution clarification (covers #511 symptoms ① and ②)
#554 only modifies the director path. The other two #511 symptoms originate in the agent path and are unchanged:
In
director-graph.ts:290,convertMessagesToOpenAI(state.messages, agentId)re-encodes peer agents asrole:'user'with a[AgentName]:prefix. The human's turn is alsorole:'user'with[You]:(or its localised equivalent) prefix. The agent system prompt doesn't currently document this encoding, so the model has no structural signal to distinguish them.Fix: add one or two sentences in
lib/orchestration/prompt-builder.tsclarifying thatrole:'user'messages prefixed[Name]:are AI peers, not the human student. This was item 3 of @ashutoshrana's original plan in #511; it was dropped from #554 to keep scope tight.2. End-to-end regression for premature-END
Deferred from #554. The unit tests there verify the director's inputs (summary labels), not its behaviour. The premature-END symptom is model-dependent — a deterministic e2e fixture that fails on
mainand passes with the fix is the real regression guard.Frame as "verify #511 is actually fixed", not "add fixtures": reproduce the bug deterministically first (which models / prompts / seeds?), then encode the regression. If it doesn't reproduce reliably on any shipped model config, that's data we need before trusting the fix.
Lands under a new
/eval/orchestration/domain dir (nottests/eval/), following theoutline-languagepattern:runner.ts+judge.ts+scenarios/*.json+ aneval:orchestrationnpm script.3. Director Rule #11 wording
Rule #11 in
director/system.mdimplicitly treats every[Student (Human)]turn as a question, so it over-blocksENDwhen the human just acknowledges ("ok continue", "got it"). Suggested rewording: gate on "the most recent[Student (Human)]line asks a question or raises an objection AND appears after the last substantive[Agent]answer".4. Test coverage for
[…]:-leading contentSENDER_PREFIX_RE = /^\[[^\]]+\]:\s*/inconversation-summary.tsgreedily strips the first[…]:pattern. Content like"[FYI]: my answer is…"loses[FYI]:in the summary. Low-probability in practice; a single test pinning the chosen behaviour (strip vs preserve) would document the contract.5. JSDoc contract on
summarizeConversationAdd a line clarifying that
systemrole messages don't appear in director-path input (convertMessagesToOpenAIfilters non-user/assistant), so the'System'label branch is defensive-only.Items 1 and 2 are substantive. 3–5 are polish on #554's own work. Happy to coordinate via comments if anyone wants to bundle multiple.