feat(convo_miner): auto-route AI tool sessions to wing_api #1236

Open
milla-jovovich wants to merge 1 commit into develop from feat/convo-miner-wing-api-auto-route

@milla-jovovich (Collaborator) commented Apr 27, 2026

When mempalace mine --mode convos is invoked against a directory inside a known AI-tool storage path (Claude Code, Codex CLI, Gemini CLI), the destination wing now auto-defaults to wing_api rather than the directory basename. Conversations from external API-keyed tools land grouped under a single dedicated wing for visibility.

Detected paths (exact-segment match — substrings like .gemini-backup or .codex-archive do NOT match):

  • any segment .codex (Codex CLI sessions / archives)
  • any segment .gemini (Gemini CLI sessions under ~/.gemini/tmp/...)
  • the consecutive segment pair .claude/projects (Claude Code). .claude alone is NOT matched - that is the settings/config dir, not a conversation source.
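
The matching rules above can be sketched roughly as follows. This is a minimal illustration, not the code in the diff: the function is named `is_ai_tool_path` here to keep it distinct from the real `_is_ai_tool_path`, the constant names are made up, and falling back to the unresolved segments when `resolve()` fails is an assumption.

```python
from pathlib import Path

# Illustrative constants; these names are assumptions, not taken from the diff.
_SINGLE_SEGMENTS = frozenset({".codex", ".gemini"})
_SEGMENT_PAIRS = ((".claude", "projects"),)


def is_ai_tool_path(path: Path) -> bool:
    """Return True if a whole path segment marks a known AI-tool store."""
    try:
        parts = path.resolve().parts
    except (OSError, RuntimeError):
        # Assumption: use the unresolved segments if resolution fails.
        parts = path.parts

    # Exact-segment comparison: ".gemini-backup" is not equal to ".gemini".
    if any(part in _SINGLE_SEGMENTS for part in parts):
        return True
    # zip(parts, parts[1:]) yields every pair of consecutive segments,
    # so bare ".claude" without a following "projects" never matches.
    return any(pair in _SEGMENT_PAIRS for pair in zip(parts, parts[1:]))
```

Comparing whole segments rather than substrings is what keeps `.gemini-backup` and `.codex-archive` out of the match set.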

Wing-resolution precedence (first match wins):

  1. Explicit --wing argument from the user - always wins
  2. AI-tool path detection -> wing_api
  3. Basename fallback (existing behavior, unchanged)
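
For reviewers who want the precedence in code form, here is a hedged sketch. It is not the diff itself: the detector is passed in as a parameter only to keep the example self-contained (the real `_resolve_wing` takes just the path and the wing), `normalize_wing_name` below is a hypothetical stand-in for the shared helper, and `toy_detector` is a deliberately simplified placeholder.

```python
import re
from pathlib import Path
from typing import Callable, Optional

AI_TOOL_WING = "wing_api"


def normalize_wing_name(name: str) -> str:
    # Hypothetical stand-in for the shared helper; the real rules may differ.
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def toy_detector(path: Path) -> bool:
    # Simplified placeholder: only recognises a ".codex" segment.
    return ".codex" in path.parts


def resolve_wing(
    convo_path: Path,
    wing: Optional[str],
    is_ai_tool_path: Callable[[Path], bool] = toy_detector,
) -> str:
    # 1. Explicit --wing wins; None and "" are both falsy and fall through.
    if wing:
        return wing
    # 2. Known AI-tool storage path.
    if is_ai_tool_path(convo_path):
        return AI_TOOL_WING
    # 3. Basename fallback.
    return normalize_wing_name(convo_path.name)
```

Because step 1 is a plain truthiness check, an empty `--wing ""` behaves the same as omitting the flag.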

Two new helpers split out of mine_convos for unit-test coverage:

  • _is_ai_tool_path(path: Path) -> bool
  • _resolve_wing(convo_path: Path, wing: Optional[str]) -> str

mine_convos now calls _resolve_wing in place of its inline basename logic. No other call sites or downstream consumers change.

Test coverage:

  • 15 unit tests covering positive matches (Claude Code subdir + root, Codex root + sessions, Gemini root + chats), negative cases (.claude alone is settings dir, unrelated paths, substring no-match on .gemini-backup / .codex-archive), explicit --wing override, auto-route trio, basename fallback, empty-string-as-no-wing.
  • End-to-end smoke test (manual): real-shape Claude Code JSONL fixture mined via the actual CLI; sqlite read-back of the /tmp palace confirms drawers landed with wing='wing_api' and verbatim content preserved; mempalace search --wing wing_api returns the expected content, ranked.
  • Full pytest sweep: 1388 baseline + 15 new = 1403 passed, zero regressions.

Closes part of #59 for the auto-routing UX.

Design context:

This change reflects Aya's product call that conversations from
API-keyed AI tools should land in a structural wing_api rather than be
scattered across topical wings derived from directory basenames. Igor's
ADR-0017 in mempalace-ts proposes the alternative of source-prefix
metadata (source LIKE 'api/%') with topical wing assignment instead;
that approach has architectural merit (wings stay topical) but does not
deliver the single-wing visibility users get here. Open for review
discussion: the explicit --wing flag and the basename fallback are both
unchanged, so this change is additive and reversible.

@milla-jovovich force-pushed the feat/convo-miner-wing-api-auto-route branch from c2f5d71 to 4098c54 on April 27, 2026 08:59
@bensig (Collaborator) left a comment

Approve. Clean separation of concerns, correct path-matching, full test coverage, and the wing-resolution precedence is exactly right.

Full pytest on this branch: 1456 passed, 1 skipped. Of those, 19 are in `test_convo_miner.py` (matches the +15 new tests the body promised).

What I checked

Path matching is right and defensive

  • `path.resolve().parts` handles symlinks and relative paths correctly. A user who symlinks a Claude transcript dir somewhere else still gets routed to `wing_api` because resolve surfaces the original `.claude/projects/` path.
  • `try/except (OSError, RuntimeError)` around `resolve()` guards against rare resolution failures, such as a symlink loop (which raises `RuntimeError` on Python versions before 3.13) or an over-long path, without crashing the mine.
  • Exact-segment match is the key correctness detail. `.gemini-backup` and `.codex-archive` correctly do NOT match — that would have been an easy mistake. The negative tests cover both.
  • `.claude/projects` requires the consecutive-segment pair, not bare `.claude` (which is the settings dir, not conversations). Also tested.
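
A small way to see the symlink point in isolation is sketched below. It is not taken from the PR's test suite: `has_claude_projects` and `demo_symlink_routing` are hypothetical names, and the demo assumes a platform where creating symlinks is permitted (on Windows this may need extra privileges).

```python
import tempfile
from pathlib import Path


def has_claude_projects(parts: tuple) -> bool:
    """True if ".claude" is immediately followed by "projects"."""
    return any(pair == (".claude", "projects") for pair in zip(parts, parts[1:]))


def demo_symlink_routing() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        real = root / ".claude" / "projects" / "demo"
        real.mkdir(parents=True)

        # A link that lives outside .claude/projects but points into it.
        link = root / "transcripts"
        link.symlink_to(real, target_is_directory=True)

        return (
            has_claude_projects(link.parts),            # unresolved segments
            has_claude_projects(link.resolve().parts),  # resolved segments
        )


print(demo_symlink_routing())  # (False, True) where symlinks are supported
```

The unresolved path shows no `.claude/projects` pair, while the resolved one does, which is why matching on `resolve().parts` matters.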

Wing-resolution precedence is correct

  1. Explicit `--wing` always wins. User intent sacrosanct.
  2. AI-tool path → `wing_api` when no explicit wing.
  3. `normalize_wing_name(basename)` fallback uses the shared helper from `config.py` — same source of truth as `cmd_init`, `room_detector_local`, and `miner.load_config`. Slots cleanly into the #1194 consolidation work that landed yesterday.

The empty-string handling (`if wing:` is falsy on `""`) matches the #1097 "empty-string as no filter" pattern that's now consistent across the codebase.

Tests cover the right surface

The 15 new test cases hit:

  • Positive matches: Claude Code subdir + root, Codex root + sessions, Gemini root + chats
  • Negative cases: `.claude` alone (settings, NOT conversations), unrelated paths, substring no-match on `.gemini-backup` / `.codex-archive`
  • Override paths: explicit `--wing` beats auto-route, basename fallback on non-AI paths, empty-string treated as no-wing

The negative cases are the ones that prove the matching is exact rather than fuzzy. Good discipline.

Architecturally sound

Routing API-driven conversations to a dedicated `wing_api` (separate from project wings) is the right default. They're a different kind of content — general LLM exchanges that can span any topic — so segregating them into one wing makes search and graph traversal cleaner. Users who want them in a specific project wing pass `--wing`; users who do nothing get something semantically reasonable.

Minor observations (not blockers)

  1. `wing_api` is hardcoded in `_resolve_wing`. Could be promoted to a module-level constant (`_AI_TOOL_DEFAULT_WING = "wing_api"`) for visibility, but not material.

  2. The detection list is closed. As more AI-tool ecosystems land (Cursor, Continue, Aider, Zed AI, etc.), this set needs extension. Could become config-driven later (env var or `config.json` key like `ai_tool_path_segments`). Out of scope for this PR; worth a follow-up issue if the list grows.

  3. Edge case worth knowing: if a user mines `~/.claude/projects/-Users-me-Projects-MyProject/` and wants those Claude conversations specifically in `wing_myproject`, they need `--wing myproject`. The default (`wing_api`) is more useful for the majority case where Claude conversations are general-purpose, but worth a doc note that "explicit per-project routing of AI-tool conversations is one flag away."
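
If observation 2 ever turns into a follow-up, one possible shape is sketched below. Everything here is hypothetical: the `ai_tool_path_segments` key is only the name floated above, the "a/b" notation for consecutive segments is an assumption, and `.exampletool` is a made-up marker rather than any real tool's directory.

```python
from pathlib import PurePosixPath
from typing import Iterable, Sequence, Tuple

Marker = Tuple[str, ...]

# Defaults mirror the three markers described in this PR.
DEFAULT_MARKERS = (".codex", ".gemini", ".claude/projects")


def parse_markers(raw: Iterable[str]) -> Tuple[Marker, ...]:
    """Split "a/b" entries into tuples of consecutive segments."""
    return tuple(
        tuple(seg for seg in entry.split("/") if seg)
        for entry in raw
        if entry.strip("/")  # skip empty entries
    )


def markers_from_config(config: dict) -> Tuple[Marker, ...]:
    # Hypothetical config key; falls back to the built-in defaults.
    return parse_markers(config.get("ai_tool_path_segments", DEFAULT_MARKERS))


def matches_any(parts: Sequence[str], markers: Iterable[Marker]) -> bool:
    """True if any marker appears as a run of whole, consecutive segments."""
    for marker in markers:
        n = len(marker)
        for i in range(len(parts) - n + 1):
            if tuple(parts[i:i + n]) == marker:
                return True
    return False


custom = markers_from_config({"ai_tool_path_segments": [".codex", ".exampletool"]})
print(matches_any(PurePosixPath("/home/me/.exampletool/chats").parts, custom))
```

Treating every marker as a tuple of segments lets single-segment and multi-segment rules share one matching loop while keeping the exact-segment guarantee.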

Closes part of #59. Ship it.
