Datascout three-tier flow: regression in non-Claude runtimes (Codex/Gemini/OpenCode/Copilot/Paperclip) #447

## Background

PR #446 (v4.16.0–v4.17.1) split `arckit-datascout` into a three-tier subagent architecture (orchestrator slash command → reader subagent + writer subagent) with JSON-Schema-validated handoff between tiers. The pattern is faithful to `anthropics/financial-services` and works correctly in the Claude Code plugin (manually tested in test repo v48 — passed).

## The regression

Pre-PR, ArcKit's converter inlined a single-agent `arckit-datascout.md` into all six distribution formats. Each non-Claude target had a working datascout that did discovery + scoring + writing inline.

Post-PR, the source agent file was deleted (commit `84f11fac`) and orchestration logic moved to the slash command body (`arckit-claude/commands/datascout.md`). The slash command body now contains heavy "dispatch the reader subagent via the `Agent` tool" instructions that the converter inlines into all non-Claude targets:

| Runtime | Files generated | What the LLM sees | Works? |
|---|---|---|---|
| Claude Code (plugin) | `commands/datascout.md` (orchestrator) + `agents/arckit-datascout-{reader,writer}.md` | Slash-command in main thread dispatches `Agent` to spawn reader/writer subagents | ✅ |
| Codex CLI | `arckit-codex/agents/arckit-datascout.md` (303 lines, slash-command body inlined) + `skills/arckit-datascout/SKILL.md` + `prompts/arckit.datascout.md` | Same orchestrator prompt with subagent-dispatch instructions | ❌ |
| Gemini CLI | `arckit-gemini/agents/arckit-datascout.md` + `commands/arckit/datascout.toml` | Same | ❌ |
| OpenCode CLI | `arckit-opencode/agents/arckit-datascout.md` + `commands/arckit.datascout.md` | Same | ❌ |
| GitHub Copilot | `arckit-copilot/agents/arckit-datascout.agent.md` + `prompts/arckit-datascout.prompt.md` | Same | ❌ |
| Paperclip (TS plugin) | `arckit-paperclip/...` JSON entry inlining slash-command body | Same | ❌ |

The reader and writer source files (`agents/arckit-datascout-{reader,writer}.md`) are correctly *filtered out* of non-Claude targets by the `subagent: true` flag, so they don't propagate. The leak is the slash-command body: the converter inlines it as the datascout entry for every non-Claude runtime, and those runtimes cannot honour its subagent-dispatch instructions.
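
The filter-versus-leak asymmetry can be sketched as follows. This is a minimal illustration, not the actual `scripts/converter.py` code: `SourceAgent`, `agents_for_target`, and `command_body_for_target` are hypothetical names, and only the `subagent: true` flag comes from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceAgent:
    name: str
    subagent: bool  # mirrors the `subagent: true` frontmatter flag
    body: str


def agents_for_target(agents: List[SourceAgent], target: str) -> List[SourceAgent]:
    """Agents emitted for a target; flagged subagents only reach Claude."""
    if target == "claude":
        return list(agents)
    return [a for a in agents if not a.subagent]


def command_body_for_target(command_body: str, target: str) -> str:
    """The leak: the command body passes through unchanged for every target."""
    return command_body


reader = SourceAgent("arckit-datascout-reader", True, "reader prompt")
writer = SourceAgent("arckit-datascout-writer", True, "writer prompt")
orchestrator_body = "Dispatch the reader subagent via the `Agent` tool."
```

With this shape, `agents_for_target([reader, writer], "codex")` is empty while `command_body_for_target(orchestrator_body, "codex")` still carries the dispatch instruction, which is the mismatch described above.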

## Runtime analysis

Researched what each non-Claude runtime can actually support:

### Codex CLI v0.128.0 — three-tier achievable
Per https://developers.openai.com/codex/subagents:
- Subagents are first-class. *"Codex handles orchestration across agents, including spawning new subagents, routing follow-up instructions, waiting for results, and closing agent threads."*
- Nested dispatch is configurable. `agents.max_depth` defaults to 1 — *exactly* the depth our pattern needs (orchestrator → reader/writer).
- Skills (`SKILL.md`) carry the orchestrator body in markdown; subagents live as TOML in `~/.codex/agents/`.
- ArcKit's converter already emits Codex agent TOML — extending it to emit reader/writer TOML is incremental.

### Gemini CLI v0.41.1 — three-tier achievable but more work
Per https://github.com/google-gemini/gemini-cli/blob/main/docs/core/subagents.md:
- Subagents are first-class. *"Subagents are 'specialists' that the main Gemini agent can hire for a specific job"* with *"independent context windows"*.
- Nested dispatch is forbidden: *"subagents cannot call other subagents."*
- However, the slash command in `commands/arckit/datascout.toml` runs in **main agent context**, which can dispatch subagents (the same shape as the Claude Code plugin's main thread). The constraint only blocks subagents that act as orchestrators.
- Frontmatter shape differs from Claude (`kind`, `mcpServers`, `tools` syntax, `timeout_mins`). Converter would need a translator.
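
As a mental model for the depth constraints in the Codex and Gemini sections above, the sketch below treats the main thread as depth 0 and a directly spawned subagent as depth 1. That numbering, and the `can_spawn` helper, are assumptions made for illustration; neither runtime's documentation is quoted here as defining depth this way.

```python
MAIN_THREAD = 0  # assumed depth of the slash command / main agent context
SUBAGENT = 1     # assumed depth of a reader or writer subagent


def can_spawn(caller_depth: int, max_depth: int) -> bool:
    """True if an agent at caller_depth may spawn a child one level deeper."""
    return caller_depth + 1 <= max_depth


# With a limit of 1, the orchestrator can dispatch reader/writer,
# but a subagent cannot dispatch further subagents.
orchestrator_ok = can_spawn(MAIN_THREAD, max_depth=1)
nested_ok = can_spawn(SUBAGENT, max_depth=1)
```

Under this model the three-tier pattern only needs the orchestrator to live in the main thread, which is why a "no nested subagents" rule does not rule it out.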

### OpenCode CLI / GitHub Copilot / Paperclip — single-agent only
No published subagent-dispatch primitive in their public agent/prompt schema. Single-agent fallback is the only viable shape.
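
The findings of this runtime analysis can be captured as data that a converter could consult. The sketch below is illustrative only: the `Shape` enum, the `CAN_DISPATCH_FROM_MAIN` table, and `feasible_shape` are hypothetical names, and the table records what is feasible per runtime, not what this issue recommends shipping.

```python
from enum import Enum


class Shape(Enum):
    THREE_TIER = "three-tier"
    SINGLE_AGENT = "single-agent"


# Whether the runtime's main context can dispatch subagents,
# per the analysis above.
CAN_DISPATCH_FROM_MAIN = {
    "codex": True,
    "gemini": True,
    "opencode": False,
    "copilot": False,
    "paperclip": False,
}


def feasible_shape(runtime: str) -> Shape:
    """Most capable datascout shape the runtime could support."""
    if CAN_DISPATCH_FROM_MAIN.get(runtime, False):
        return Shape.THREE_TIER
    # Unknown runtimes default to the safe single-agent shape.
    return Shape.SINGLE_AGENT
```

Defaulting unknown runtimes to `SINGLE_AGENT` keeps a newly added target from silently inheriting dispatch instructions it cannot honour.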

## Options

| Option | Codex | Gemini | OpenCode | Copilot | Paperclip | Effort |
|---|---|---|---|---|---|---|
| **A. Document and accept** | ❌ broken | ❌ broken | ❌ broken | ❌ broken | ❌ broken | 0 |
| **B. Single-agent fallback in all non-Claude** | ✅ works (legacy) | ✅ works (legacy) | ✅ works | ✅ works | ✅ works | ~1 hr |
| **C. Three-tier in Codex + Gemini, fallback elsewhere** | ✅ full parity | ✅ full parity | ✅ works | ✅ works | ✅ works | ~5 hr |
| **D. Three-tier in Codex only, fallback elsewhere** | ✅ full parity | ✅ works (single-agent) | ✅ works | ✅ works | ✅ works | ~2.5 hr |

Effort breakdown for the three-tier slices:

| | Codex | Gemini |
|---|---|---|
| Research dispatch syntax | 15 min | 60 min |
| Converter changes | 30 min | 90 min |
| Two new agent files in target format | 20 min | 45 min |
| Test in target runtime | 15 min | 30 min |
| **Total** | **~1.5 hr** | **~3.5 hr** |
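
For reference, the itemised minutes above can be summed directly. The figures are taken from the table and from Option B's ~1 hr fallback estimate; the variable names are illustrative.

```python
MINUTES = {
    "codex": {"research": 15, "converter": 30, "agent_files": 20, "test": 15},
    "gemini": {"research": 60, "converter": 90, "agent_files": 45, "test": 30},
}
FALLBACK_MINUTES = 60  # Option B estimate (~1 hr)


def total(runtime: str) -> int:
    """Sum of the itemised estimates for one runtime, in minutes."""
    return sum(MINUTES[runtime].values())


codex_total = total("codex")    # 80 minutes
gemini_total = total("gemini")  # 225 minutes
option_d = FALLBACK_MINUTES + codex_total  # fallback + Codex three-tier
option_c = option_d + gemini_total         # plus Gemini three-tier

for label, minutes in [("Codex", codex_total), ("Gemini", gemini_total),
                       ("Option D", option_d), ("Option C", option_c)]:
    print(f"{label}: {minutes} min ({minutes / 60:.1f} hr)")
```

The raw sums are 80 and 225 minutes, so the totals in the table are rounded. A plain sum for Option C comes out nearer 6 hr than the ~5 hr quoted in the options table; that gap may reflect converter work shared between the slices, though that reading is an assumption.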

## Recommendation

**Option D — three-tier in Codex, single-agent fallback elsewhere.**

Rationale:
1. Codex is the only non-Claude runtime where the three-tier pattern fits the platform's native model. Implementing it preserves the security isolation that's the whole point of #442 item 1 in a second runtime.
2. Gemini is structurally feasible, but the Gemini slice (~3.5 hr, mostly converter-translator work) costs more than the Codex three-tier slice (~1.5 hr) and the fallback for everything else (~1 hr) combined. Defer until there's a known Gemini ArcKit user.
3. OpenCode/Copilot/Paperclip get a working datascout via the fallback prompt — no security pattern, but no regression vs pre-PR either.
4. The fallback prompt and Codex three-tier work can land in two separate slices: fallback first (smaller, fixes the immediate regression in all five affected runtimes and is the final shape for four of them); Codex three-tier later when there's bandwidth.

## Implementation plan (Option D)

### Phase 1 — single-agent fallback (~1 hr)
- Add `arckit-claude/agents/arckit-datascout-fallback.md` — single-agent prompt that does discovery + scoring + writing in one role, with the `## Guardrails` section preserving prompt-level isolation. Frontmatter `subagent: true` so it doesn't surface in Claude.
- Update `scripts/converter.py`: when generating non-Claude commands, prefer `arckit-datascout-fallback.md` over the slash-command body when an agent file with that suffix exists.
- Document the limitation in CHANGELOG and READER-PATTERN.md.
- Bump the version to 4.18.0.
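
The converter change in Phase 1 could look roughly like the sketch below. It is not the existing `scripts/converter.py` code: `find_fallback` and `body_for_non_claude` are hypothetical helpers, the `arckit-<command>-fallback.md` naming is inferred from the file proposed above, and frontmatter stripping is left out for brevity.

```python
from pathlib import Path
from typing import Optional

FALLBACK_SUFFIX = "-fallback.md"


def find_fallback(agents_dir: Path, command_name: str) -> Optional[Path]:
    """Return the fallback agent file for a command, if one exists."""
    candidate = agents_dir / f"arckit-{command_name}{FALLBACK_SUFFIX}"
    return candidate if candidate.is_file() else None


def body_for_non_claude(agents_dir: Path, command_name: str,
                        command_body: str) -> str:
    """Prefer the single-agent fallback prompt over the orchestrator body."""
    fallback = find_fallback(agents_dir, command_name)
    if fallback is not None:
        return fallback.read_text(encoding="utf-8")
    # Commands without a fallback keep today's behaviour.
    return command_body
```

Keying the lookup on the file suffix means other commands that later adopt the reader/writer split would only need to add their own `-fallback.md` file, with no further converter changes.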

### Phase 2 — Codex three-tier (~1.5 hr)
- Add converter logic to emit Codex TOML for `subagent: true` source agents (currently they're filtered for all non-Claude).
- Source `agents/arckit-datascout-{reader,writer}.md` → emit `arckit-codex/agents/arckit-datascout-{reader,writer}.toml` with the right Codex frontmatter.
- Codex skill `SKILL.md` for `arckit-datascout` reverts to the orchestrator body (referencing reader/writer by Codex agent name).
- Document Codex parity in CHANGELOG.
- Bump the version to 4.18.1 or 4.19.0.
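
The Phase 2 filter change amounts to letting `subagent: true` sources through for an allow-list of targets. The sketch below shows only the path mapping; `THREE_TIER_TARGETS` and `subagent_output_path` are hypothetical names, and the contents of the emitted Codex TOML are deliberately not shown because they depend on the Codex agent schema.

```python
from pathlib import PurePosixPath
from typing import Optional

# Non-Claude targets that receive reader/writer subagents (assumed allow-list).
THREE_TIER_TARGETS = {"codex"}


def subagent_output_path(source_name: str, target: str) -> Optional[PurePosixPath]:
    """Output path for a `subagent: true` source, or None if still filtered."""
    if target not in THREE_TIER_TARGETS:
        return None
    stem = PurePosixPath(source_name).stem  # drops the ".md" extension
    return PurePosixPath(f"arckit-{target}") / "agents" / f"{stem}.toml"


reader_path = subagent_output_path("arckit-datascout-reader.md", "codex")
gemini_path = subagent_output_path("arckit-datascout-reader.md", "gemini")
```

Here `reader_path` resolves to `arckit-codex/agents/arckit-datascout-reader.toml`, while `gemini_path` is `None`, so Gemini and the other targets stay on the fallback until they are added to the allow-list.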

## Out of scope for this issue

- Token-usage telemetry: tracked at https://github.com/anthropics/claude-code/issues/47045
- Skill-listing budget pressure (the "too many skills" warning surfaced during PR #446 testing): separate work to trim community-overlay command descriptions.

## References

- PR #446 — `feat/442-datascout-reader-split` (merged: TBD as of this issue)
- `arckit-claude/agents/READER-PATTERN.md` — the security pattern this regression breaks for non-Claude
- Issue #442 — original financial-services adoption tracker
- https://developers.openai.com/codex/subagents — Codex subagent docs
- https://github.com/google-gemini/gemini-cli/blob/main/docs/core/subagents.md — Gemini subagent docs

---
🤖 Generated with [Claude Code](https://claude.com/claude-code)
