feat(convo_miner): auto-route AI tool sessions to wing_api#1236
milla-jovovich wants to merge 1 commit into develop
When mempalace mine --mode convos is invoked against a directory inside
a known AI-tool storage path (Claude Code, Codex CLI, Gemini CLI), the
destination wing now auto-defaults to wing_api rather than the directory
basename. Conversations from external API-keyed tools land grouped under
a single dedicated wing for visibility.
Detected paths (exact-segment match — substrings like .gemini-backup or
.codex-archive do NOT match):
- any segment .codex (Codex CLI sessions / archives)
- any segment .gemini (Gemini CLI sessions under ~/.gemini/tmp/...)
- the consecutive segment pair .claude/projects (Claude Code).
.claude alone is NOT matched - that is the settings/config dir,
not a conversation source.
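The exact-segment rule above can be sketched as follows. This is a minimal illustration, not the PR's actual `_is_ai_tool_path`: the function name, the two constants, and the use of `PurePosixPath` (chosen so the example behaves the same on any OS) are assumptions made for this example.

```python
from pathlib import PurePosixPath

# Hypothetical constants for illustration; the PR may store these differently.
SINGLE_SEGMENTS = {".codex", ".gemini"}
SEGMENT_PAIRS = {(".claude", "projects")}


def looks_like_ai_tool_path(path: PurePosixPath) -> bool:
    """Return True if any whole path segment (or consecutive pair) matches."""
    parts = path.parts
    # Whole-segment comparison: ".gemini-backup" is a different segment
    # from ".gemini", so substring lookalikes never match.
    if any(part in SINGLE_SEGMENTS for part in parts):
        return True
    # ".claude" only counts when immediately followed by "projects".
    return any(pair in SEGMENT_PAIRS for pair in zip(parts, parts[1:]))
```

Comparing whole segments rather than searching the path string is what keeps `.gemini-backup` and `.codex-archive` out, and pairing adjacent segments with `zip` is one simple way to require `.claude/projects` without matching bare `.claude`.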
Wing-resolution precedence (first match wins):
1. Explicit --wing argument from the user - always wins
2. AI-tool path detection -> wing_api
3. Basename fallback (existing behavior, unchanged)
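A rough sketch of that precedence, under stated assumptions: the detector is passed in as a callable so the example stays self-contained, `simple_normalize` is a hypothetical stand-in for the project's shared normalizer, and returning an explicit wing unchanged is a guess about behavior the PR text does not spell out.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional

AI_TOOL_WING = "wing_api"


def simple_normalize(name: str) -> str:
    # Hypothetical stand-in; the real normalizer's rules are not shown in the PR.
    return name.lower().replace("-", "_").replace(" ", "_")


def resolve_wing_sketch(
    convo_path: PurePosixPath,
    wing: Optional[str],
    is_ai_tool_path: Callable[[PurePosixPath], bool],
) -> str:
    # 1. A non-empty explicit wing always wins (returned as-is: an assumption).
    #    Both None and "" are falsy, so an empty string means "no wing given".
    if wing:
        return wing
    # 2. AI-tool storage paths route to the dedicated wing.
    if is_ai_tool_path(convo_path):
        return AI_TOOL_WING
    # 3. Otherwise fall back to the directory basename.
    return simple_normalize(convo_path.name)
```

For example, `resolve_wing_sketch(PurePosixPath("/home/me/.codex"), "", lambda p: True)` yields `"wing_api"`, because the empty string falls through step 1.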
Two new helpers split out of mine_convos for unit-test coverage:
- _is_ai_tool_path(path: Path) -> bool
- _resolve_wing(convo_path: Path, wing: Optional[str]) -> str
mine_convos now calls _resolve_wing in place of its inline basename
logic. No other call sites or downstream consumers change.
Test coverage:
- 15 unit tests covering positive matches (Claude Code subdir + root,
Codex root + sessions, Gemini root + chats), negative cases
(.claude alone is settings dir, unrelated paths, substring no-match
on .gemini-backup / .codex-archive), explicit --wing override,
auto-route trio, basename fallback, empty-string-as-no-wing.
- End-to-end smoke test (manual): real-shape Claude Code JSONL fixture
mined via the actual CLI; sqlite read-back of /tmp palace confirms
drawers landed with wing='wing_api' and verbatim content preserved;
mempalace search --wing wing_api returns expected content ranked.
- Full pytest sweep: 1388 baseline + 15 new = 1403 passed, zero
regressions.
Design context:
This change reflects Aya's product call that conversations from
API-keyed AI tools should land in a structural wing_api rather than be
scattered across topical wings derived from directory basenames. Igor's
ADR-0017 in mempalace-ts proposes the alternative of source-prefix
metadata (source LIKE 'api/%') with topical wing assignment instead;
that approach has architectural merit (wings stay topical) but does not
deliver the single-wing visibility users get here. Open for review
discussion - explicit --wing flag and basename fallback both unchanged,
so this is additive and reversible.
Closes part of #59 for the auto-routing UX.
c2f5d71 to 4098c54
bensig left a comment
Approve. Clean separation of concerns, correct path-matching, full test coverage, and the wing-resolution precedence is exactly right.
Full pytest on this branch: 1456 passed, 1 skipped, with 19 tests in `test_convo_miner.py` (consistent with the 15 new tests the PR body promised).
What I checked
Path matching is right and defensive
- `path.resolve().parts` handles symlinks and relative paths correctly. A user who symlinks a Claude transcript dir somewhere else still gets routed to `wing_api` because resolve surfaces the original `.claude/projects/` path.
- `try/except (OSError, RuntimeError)` around `resolve()` catches the rare case of a broken symlink or path-too-long without crashing the mine.
- Exact-segment match is the key correctness detail. `.gemini-backup` and `.codex-archive` correctly do NOT match — that would have been an easy mistake. The negative tests cover both.
- `.claude/projects` requires the consecutive-segment pair, not bare `.claude` (which is the settings dir, not conversations). Also tested.
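The resolve-then-match behavior described in these points might look roughly like the sketch below. The helper name is hypothetical, and falling back to the unresolved parts on error is an assumption; the review only states that the exception is caught without crashing the mine.

```python
from pathlib import Path
from typing import Tuple


def resolved_parts(path: Path) -> Tuple[str, ...]:
    """Return the segments of the resolved path, tolerating resolve() failures."""
    try:
        # resolve() makes the path absolute and follows symlinks, so a link
        # pointing into a tool's storage directory exposes the real segments.
        return path.resolve().parts
    except (OSError, RuntimeError):
        # Assumed fallback: match against the path exactly as it was given.
        return path.parts
```

Segment matching then runs on the returned tuple, so relative inputs and symlinked directories are compared against the same absolute form.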
Wing-resolution precedence is correct
- Explicit `--wing` always wins. User intent sacrosanct.
- AI-tool path → `wing_api` when no explicit wing.
- `normalize_wing_name(basename)` fallback uses the shared helper from `config.py` — same source of truth as `cmd_init`, `room_detector_local`, and `miner.load_config`. Slots cleanly into the #1194 consolidation work that landed yesterday.
The empty-string handling (`if wing:` is falsy on `""`) matches the #1097 "empty-string as no filter" pattern that's now consistent across the codebase.
Tests cover the right surface
The 15 new test cases hit:
- Positive matches: Claude Code subdir + root, Codex root + sessions, Gemini root + chats
- Negative cases: `.claude` alone (settings, NOT conversations), unrelated paths, substring no-match on `.gemini-backup` / `.codex-archive`
- Override paths: explicit `--wing` beats auto-route, basename fallback on non-AI paths, empty-string treated as no-wing
The negative cases are the ones that prove the matching is exact rather than fuzzy. Good discipline.
Architecturally sound
Routing API-driven conversations to a dedicated `wing_api` (separate from project wings) is the right default. They're a different kind of content — general LLM exchanges that can span any topic — so segregating them into one wing makes search and graph traversal cleaner. Users who want them in a specific project wing pass `--wing`; users who do nothing get something semantically reasonable.
Minor observations (not blockers)
- `wing_api` is hardcoded in `_resolve_wing`. Could be promoted to a module-level constant (`_AI_TOOL_DEFAULT_WING = "wing_api"`) for visibility, but not material.
- The detection list is closed. As more AI-tool ecosystems land (Cursor, Continue, Aider, Zed AI, etc.), this set needs extension. Could become config-driven later (env var or `config.json` key like `ai_tool_path_segments`). Out of scope for this PR; worth a follow-up issue if the list grows.
- Edge case worth knowing: if a user mines `~/.claude/projects/-Users-me-Projects-MyProject/` and wants those Claude conversations specifically in `wing_myproject`, they need `--wing myproject`. The default (`wing_api`) is more useful for the majority case where Claude conversations are general-purpose, but worth a doc note that "explicit per-project routing of AI-tool conversations is one flag away."
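If the detection list does become config-driven, one possible shape is sketched below. Everything here is hypothetical and nothing like it exists in this PR: the function name, the default set, and the placeholder `.exampletool` segment are illustrative, and the `ai_tool_path_segments` key is only the name floated above.

```python
from typing import Any, FrozenSet, Mapping

# Built-in single-segment matches; the ".claude/projects" pair would stay separate.
DEFAULT_AI_TOOL_SEGMENTS: FrozenSet[str] = frozenset({".codex", ".gemini"})


def load_ai_tool_segments(config: Mapping[str, Any]) -> FrozenSet[str]:
    """Merge user-supplied segments from a parsed config into the defaults."""
    extra = config.get("ai_tool_path_segments", [])
    # Ignore malformed values instead of failing the whole mine.
    if not isinstance(extra, (list, tuple)):
        return DEFAULT_AI_TOOL_SEGMENTS
    valid = frozenset(s for s in extra if isinstance(s, str) and s)
    return DEFAULT_AI_TOOL_SEGMENTS | valid
```

With this shape, `load_ai_tool_segments({"ai_tool_path_segments": [".exampletool"]})` extends the defaults without replacing them, so existing routing would stay unchanged for users who set nothing.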
Closes part of #59. Ship it.