Context
During the upstream sync merge (ghostwright/phantom -> electronicBlacksmith/phantom), code reviews from Claude Sonnet 4.6 and GitHub Copilot GPT-5.4 identified several issues that don't block the merge but should be addressed in follow-up work.
Issues Identified
1. maxTokens silently ignored in judge subprocess (Medium)
File: src/agent/judge-query.ts
JudgeQueryOptions.maxTokens is accepted as a field but never passed to the SDK query() call. The prior implementation honored options.maxTokens ?? JUDGE_MAX_TOKENS via max_tokens on the Anthropic SDK call. With the subprocess design, effort: "low" controls token budget indirectly, but the intent to cap at 4096 is now silently dropped.
Action: Either wire maxTokens through to the SDK or remove the field from JudgeQueryOptions and document that effort: "low" replaces explicit token limits.
2. judge_model config is effectively dead (Medium)
File: src/agent/judge-query.ts
The resolution chain is options.model ?? config.judge_model ?? config.model, but every evolution/loop judge caller passes options.model (typically JUDGE_MODEL_SONNET), so judge_model config never wins.
Action: Either update judge callers to omit model when they want the config default, or remove judge_model from the schema to avoid misleading operators.
3. Provider model mapping only works for aliases (Low)
File: src/config/providers.ts
buildProviderEnv() sets ANTHROPIC_DEFAULT_{OPUS,SONNET,HAIKU}_MODEL, but Phantom passes concrete Claude IDs like claude-sonnet-4-6 in judge calls, not aliases like sonnet. The CLI only remaps aliases, so multi-backend routing is only partially correct for judges.
Action: This is an upstream design limitation. Either switch to using aliases in judge calls, or document that provider model mapping only affects alias-based model references.
4. Dead exports in types.ts (Low)
File: src/evolution/judges/types.ts
JUDGE_MAX_TOKENS and JUDGE_TEMPERATURE are exported but no longer imported anywhere after the subprocess migration.
Action: Remove dead exports.
5. process.env spread needs justification comment (Low)
Files: src/agent/judge-query.ts, src/agent/runtime.ts
Both files use env: { ...process.env, ...providerEnv } which passes the entire parent environment to subprocesses. CLAUDE.md warns "Don't leak process.env to subprocesses" but doesn't carve out the SDK subprocess case.
Action: Add a comment explaining this is intentional for the SDK subprocess (needs PATH, HOME, credential files).
6. Lost partial cost tracking (Info)
File: src/memory/consolidation.ts
JudgeParseError cost tracking is gone - failed judge calls where tokens were consumed go untracked. This is a known tradeoff documented in the code, but affects precise daily_cost_usd evolution gate enforcement.
Action: Document this limitation. No code change needed unless precise cost tracking becomes critical.
Test Coverage Gap
No test exercises the real SDK subprocess model-resolution path or proves judge_model affects any live judge call. Consider adding integration tests if these features become important.
Related
- Upstream sync merge: chore/sync-upstream branch
- Commits merged: Phase 1 (judge subprocess), Phase 2 (provider config), Phase 2.5 (scheduler hardening)
Context
During the upstream sync merge (ghostwright/phantom -> electronicBlacksmith/phantom), code reviews from Claude Sonnet 4.6 and GitHub Copilot GPT-5.4 identified several issues that don't block the merge but should be addressed in follow-up work.
Issues Identified
1.
maxTokenssilently ignored in judge subprocess (Medium)File:
src/agent/judge-query.tsJudgeQueryOptions.maxTokensis accepted as a field but never passed to the SDKquery()call. The prior implementation honoredoptions.maxTokens ?? JUDGE_MAX_TOKENSviamax_tokenson the Anthropic SDK call. With the subprocess design,effort: "low"controls token budget indirectly, but the intent to cap at 4096 is now silently dropped.Action: Either wire
maxTokensthrough to the SDK or remove the field fromJudgeQueryOptionsand document thateffort: "low"replaces explicit token limits.2.
judge_modelconfig is effectively dead (Medium)File:
src/agent/judge-query.tsThe resolution chain is
options.model ?? config.judge_model ?? config.model, but every evolution/loop judge caller passesoptions.model(typicallyJUDGE_MODEL_SONNET), sojudge_modelconfig never wins.Action: Either update judge callers to omit
modelwhen they want the config default, or removejudge_modelfrom the schema to avoid misleading operators.3. Provider model mapping only works for aliases (Low)
File:
src/config/providers.tsbuildProviderEnv()setsANTHROPIC_DEFAULT_{OPUS,SONNET,HAIKU}_MODEL, but Phantom passes concrete Claude IDs likeclaude-sonnet-4-6in judge calls, not aliases likesonnet. The CLI only remaps aliases, so multi-backend routing is only partially correct for judges.Action: This is an upstream design limitation. Either switch to using aliases in judge calls, or document that provider model mapping only affects alias-based model references.
4. Dead exports in types.ts (Low)
File:
src/evolution/judges/types.tsJUDGE_MAX_TOKENSandJUDGE_TEMPERATUREare exported but no longer imported anywhere after the subprocess migration.Action: Remove dead exports.
5.
process.envspread needs justification comment (Low)Files:
src/agent/judge-query.ts,src/agent/runtime.tsBoth files use
env: { ...process.env, ...providerEnv }which passes the entire parent environment to subprocesses. CLAUDE.md warns "Don't leak process.env to subprocesses" but doesn't carve out the SDK subprocess case.Action: Add a comment explaining this is intentional for the SDK subprocess (needs PATH, HOME, credential files).
6. Lost partial cost tracking (Info)
File:
src/memory/consolidation.tsJudgeParseErrorcost tracking is gone - failed judge calls where tokens were consumed go untracked. This is a known tradeoff documented in the code, but affects precisedaily_cost_usdevolution gate enforcement.Action: Document this limitation. No code change needed unless precise cost tracking becomes critical.
Test Coverage Gap
No test exercises the real SDK subprocess model-resolution path or proves
judge_modelaffects any live judge call. Consider adding integration tests if these features become important.Related