Skip to content

Follow-up to #554: remaining items for #511 #598

@wyuc

Description

@wyuc

Context

#554 lands a focused director-layer fix for one of #511's three symptoms (premature END). Scope was narrowed during review and the regression eval was deferred. This tracks what's outstanding.

1. Agent-side attribution clarification (covers #511 symptoms ① and ②)

#554 only modifies the director path. The other two #511 symptoms originate in the agent path and are unchanged:

  • Off-topic replies — agent ignores the actual content of the prior turn.
  • Role confusion — teacher misattributes a human turn to a peer agent (the "照镜子" case).

In director-graph.ts:290, convertMessagesToOpenAI(state.messages, agentId) re-encodes peer agents as role:'user' with a [AgentName]: prefix. The human's turn is also role:'user' with [You]: (or its localised equivalent) prefix. The agent system prompt doesn't currently document this encoding, so the model has no structural signal to distinguish them.

Fix: add one or two sentences in lib/orchestration/prompt-builder.ts clarifying that role:'user' messages prefixed [Name]: are AI peers, not the human student. This was item 3 of @ashutoshrana's original plan in #511; it was dropped from #554 to keep scope tight.

2. End-to-end regression for premature-END

Deferred from #554. The unit tests there verify the director's inputs (summary labels), not its behaviour. The premature-END symptom is model-dependent — a deterministic e2e fixture that fails on main and passes with the fix is the real regression guard.

Frame as "verify #511 is actually fixed", not "add fixtures": reproduce the bug deterministically first (which models / prompts / seeds?), then encode the regression. If it doesn't reproduce reliably on any shipped model config, that's data we need before trusting the fix.

Lands under a new /eval/orchestration/ domain dir (not tests/eval/), following the outline-language pattern: runner.ts + judge.ts + scenarios/*.json + an eval:orchestration npm script.

3. Director Rule #11 wording

Rule #11 in director/system.md implicitly treats every [Student (Human)] turn as a question, so it over-blocks END when the human just acknowledges ("ok continue", "got it"). Suggested rewording: gate on "the most recent [Student (Human)] line asks a question or raises an objection AND appears after the last substantive [Agent] answer".

4. Test coverage for […]:-leading content

SENDER_PREFIX_RE = /^\[[^\]]+\]:\s*/ in conversation-summary.ts greedily strips the first […]: pattern. Content like "[FYI]: my answer is…" loses [FYI]: in the summary. Low-probability in practice; a single test pinning the chosen behaviour (strip vs preserve) would document the contract.

5. JSDoc contract on summarizeConversation

Add a line clarifying that system role messages don't appear in director-path input (convertMessagesToOpenAI filters non-user/assistant), so the 'System' label branch is defensive-only.


Items 1 and 2 are substantive. 3–5 are polish on #554's own work. Happy to coordinate via comments if anyone wants to bundle multiple.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions