
Consider a Claude Code skill as alternative to MCP server #230

@shntnu

For Claude Code users, a skill wrapping the CLI works better than the MCP server. I started with the MCP server but found that Claude struggled with it: it issued wrong queries, `get_message` responses filled the context (~11k tokens per call), and `stage_deletion` could be called without user confirmation. I used the `superpowers:brainstorming` skill to design the approach, then the `skill-creator` skill to iteratively draft, test, remove PII from examples, and optimize the trigger description over three sessions. The result covers the full CLI surface and works reliably.

The MCP server is still the right choice for clients without shell access (Claude Desktop). But for Claude Code, a skill shipped in the repo (e.g. `skills/msgvault/SKILL.md`) would give users a better experience. The one I've been using is copied below.

Everything below was bot-generated and then edited by me.

To test the skill, I used the skill-creator's optimization loop: 19 trigger eval queries (9 should-trigger, plus 10 should-not-trigger near-misses such as "set up Gmail OAuth for my web app" or "write an email validation regex"), split 60/40 into train/test, with each query evaluated over 3 runs per iteration. The first discovery was that the MCP server and the skill compete: with both installed, recall was 0% because Claude called the MCP tools directly and never loaded the skill. After removing the MCP server, I ran three functional evals comparing with-skill vs. without-skill runs. The clearest result came when I asked for a specific recommendation thread: the without-skill agent found the wrong email (wrong year, wrong person), while the with-skill agent found the correct one through iterative search refinement, which is exactly the workflow the skill teaches.

A caveat: some of the problems I hit were really caused by the MCP server lacking `server.WithInstructions(...)`, i.e. guidance on search strategy and multi-tool workflows, the kind that the bioRxiv and ChEMBL servers already provide. Better MCP instructions would fix those. Other issues are structural, though: `get_message` returns ~11k tokens per call with no way to request specific fields, and there is no MCP equivalent of composing `--json | jq '.body_text'` to control output size. A skill works around that by teaching Claude to pipe CLI output through Unix tools.

I suspect users of the GitHub MCP server have the same experience (vs. using the `gh` CLI).

SKILL.md
---
name: msgvault
description: "Search, read, and analyze the user's offline email archive via the msgvault CLI. Use when the user asks about their email, messages, inbox, Gmail, senders, attachments, or wants to find/search/delete/export emails. Also trigger when the user mentions msgvault, email archive, or asks questions like 'who emailed me about X', 'find the email from Y', 'how many emails did I get from Z', or 'export that attachment'. Covers searching, reading, analytics, attachment export, deletion management, and import from mbox/Apple Mail."
---

# msgvault — Email Archive CLI

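msgvault is a local, offline Gmail archive. Queries (search, read, analytics) run against a local database, with no Gmail API calls and no network access, unless a remote server is configured (see below). Commands that sync mail, add accounts, or execute deletions do contact Gmail.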

Binary: `~/.local/bin/msgvault` (may not be on PATH — use full path if needed).
Data: `~/.msgvault/` (config, database, tokens, attachments, analytics cache).

When `[remote].url` is configured in `~/.msgvault/config.toml`, query commands (`search`, `show-message`, analytics) connect to a remote msgvault server instead of the local database. Use `--local` to force local queries.
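
Because the binary may not be on `PATH` and a `[remote]` section can redirect queries, it can help to check both before running anything. The sketch below is not part of msgvault: `msgvault_bin` and `remote_configured` are hypothetical helper names, and the config check assumes the setting is written as a double-quoted `url = "..."` line inside a `[remote]` table, which is how `[remote].url` would normally appear in TOML.

```bash
# Hypothetical helpers (not shipped with msgvault).

# Print the msgvault binary to use: the one on PATH if present,
# otherwise the default install location.
msgvault_bin() {
  command -v msgvault 2>/dev/null || printf '%s\n' "$HOME/.local/bin/msgvault"
}

# Succeed (exit 0) if CONFIG has a double-quoted, non-empty `url` key
# inside a [remote] table. CONFIG defaults to ~/.msgvault/config.toml.
remote_configured() {
  local config=${1:-$HOME/.msgvault/config.toml}
  [ -f "$config" ] || return 1
  awk '
    # Any table header switches sections; remember whether it is [remote].
    /^[ \t]*\[/ { in_remote = ($0 ~ /^[ \t]*\[remote\][ \t]*$/); next }
    in_remote && /^[ \t]*url[ \t]*=[ \t]*"[^"]+"/ { found = 1 }
    END { exit !found }
  ' "$config"
}
```

For example, run `"$(msgvault_bin)" stats`, and add `--local` to query commands only when `remote_configured` succeeds and you want the local database.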

## Commands

### Discovery & Analytics

```bash
msgvault stats                                        # archive overview (accounts, message count, size)
msgvault list-accounts                                # synced email accounts
msgvault list-senders --limit 20                      # top senders by message count
msgvault list-senders --limit 20 --after 2025-01-01   # top senders this year
msgvault list-senders --after 2025-01-01 --json       # JSON output for programmatic use
msgvault list-domains --limit 20                      # top sender domains
msgvault list-domains --after 2025-01-01 --before 2025-06-01  # date-scoped
msgvault list-labels --limit 20                       # all labels with counts
msgvault cache-stats                                  # analytics cache row counts and file sizes
```

The `--after` and `--before` flags on `list-senders` and `list-domains` are useful for time-scoped analytics ("who emailed me the most this quarter"). All analytics commands support `--limit` (`-n`) and `--json`.
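
For "this quarter" questions, the `--after` date has to be computed first. A minimal sketch with a hypothetical `quarter_start` helper; whether `--after` includes that day is not stated above, so treat the boundary as approximate.

```bash
# Hypothetical helper: print the first day of the calendar quarter that
# contains YYYY-MM (defaults to the current month), e.g. 2025-05 -> 2025-04-01.
quarter_start() {
  local ym=${1:-$(date +%Y-%m)}
  local year=${ym%-*}
  local month=$((10#${ym#*-}))           # 10# avoids octal parsing of "08"/"09"
  local first=$(( (month - 1) / 3 * 3 + 1 ))
  printf '%s-%02d-01\n' "$year" "$first"
}

# "Who emailed me the most this quarter?"
# msgvault list-senders --limit 20 --after "$(quarter_start)"
```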

### Search

```bash
msgvault search 'some topic' --limit 10               # free text (subject + body via FTS5)
msgvault search 'from:user@example.com' -n 10         # exact sender email
msgvault search 'subject:proposal after:2025-01-01 has:attachment' -n 10
msgvault search '"exact phrase"' -n 10                 # quoted phrase match
msgvault search 'newer_than:30d subject:invoice' -n 20  # relative date
msgvault search 'larger:5M has:attachment' -n 10       # large messages
msgvault search 'from:user@example.com' -n 50 --json  # JSON for piping to jq
msgvault search 'label:INBOX' -n 50 --offset 50       # pagination (skip first 50)
```

Default limit is 50. Use `--limit` (`-n`) and `--offset` for pagination.
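
To walk through more results than one page holds, step `--offset` in multiples of `--limit`. The `search_pages` function below is a hypothetical helper that only combines the documented `-n` and `--offset` flags; it fetches a fixed number of pages rather than detecting the last one, since the plain-text output format isn't specified here.

```bash
# Hypothetical helper: run one `msgvault search` per page.
# Usage: search_pages QUERY PAGES [PAGE_SIZE] [extra flags, e.g. --json]
search_pages() {
  local query=$1 pages=$2 size=${3:-50}
  shift $(( $# < 3 ? $# : 3 ))           # whatever remains is passed through
  local page
  for (( page = 0; page < pages; page++ )); do
    msgvault search "$query" -n "$size" --offset $(( page * size )) "$@"
  done
}

# First 150 INBOX messages as JSON, 50 at a time:
# search_pages 'label:INBOX' 3 50 --json
```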

### Read a Message

```bash
msgvault show-message <id>                            # full message (id from search --json)
msgvault show-message <id> --json                     # structured output
msgvault show-message <id> --json | jq -r '.body_text'  # just the body (-r prints raw text, not a quoted JSON string)
```

The `<id>` can be an internal numeric ID or a Gmail message ID (hex string like `18f0abc123def`).

### Export

```bash
# Export all attachments from a message (simplest approach)
msgvault export-attachments <message-id>              # all attachments to cwd with original filenames
msgvault export-attachments <message-id> -o ~/Downloads  # to specific directory

# Export a single attachment by content hash (from show-message --json .attachments[].content_hash)
msgvault export-attachment <hash> -o invoice.pdf

# Export message as .eml file (for forwarding or opening in mail clients)
msgvault export-eml <id>
msgvault export-eml <id> -o message.eml
```
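
`export-attachments` writes files under their original names, so exporting several messages into one directory can collide. A sketch, assuming message IDs come from `search --json | jq -r '.[].id'` as elsewhere in this file; `export_attachments_for` is a hypothetical name, and it gives each message its own subdirectory.

```bash
# Hypothetical helper: read message IDs (one per line) on stdin and
# export each message's attachments into DIR/<id>/.
export_attachments_for() {
  local dir=${1:?usage: export_attachments_for DIR < ids}
  local id
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    # </dev/null keeps msgvault from consuming the remaining IDs on stdin.
    mkdir -p "$dir/$id" && msgvault export-attachments "$id" -o "$dir/$id" </dev/null
  done
}

# msgvault search 'subject:invoice has:attachment after:2025-01-01' -n 20 --json \
#   | jq -r '.[].id' | export_attachments_for ~/Downloads/invoices
```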

### Deletion (two-step: staging via TUI, then execution via CLI)

Deletion is permanent by default (`--trash` is the recoverable alternative). msgvault uses a two-step process: first stage messages in the TUI, then execute via CLI. The split guards against accidental data loss.

**Step 1 — Stage in the TUI:** Launch `msgvault tui`, navigate to the messages you want to delete, select them with `Space`, and press `d` to stage them. Use `D` to stage all messages matching the current filter. There is no CLI command for staging — it's intentionally done through the interactive TUI so you can visually confirm what you're selecting.

**Step 2 — Review and execute via CLI:**
```bash
# Review what's staged
msgvault list-deletions                               # show pending batches
msgvault show-deletion <batch-id>                     # inspect a specific batch
msgvault delete-staged --list                         # list staged batches without executing

# Safety checks
msgvault delete-staged --dry-run                      # show what would be deleted

# Execute or cancel
msgvault delete-staged                                # permanently delete staged messages (fast)
msgvault delete-staged <batch-id>                     # execute a specific batch
msgvault delete-staged --trash                        # move to Gmail trash instead (recoverable for 30 days, slower)
msgvault delete-staged --account user@gmail.com       # specify which account
msgvault delete-staged --yes                          # skip confirmation prompt
msgvault cancel-deletion <batch-id>                   # cancel a pending batch
```

Always confirm with the user before executing deletions, and suggest `--dry-run` first.
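
For a human at a terminal, the dry-run-then-confirm policy can be enforced with a small wrapper. This is a sketch only: `confirm_delete_staged` is hypothetical, and passing flags such as `--trash` to both the dry run and the real run assumes they combine with `--dry-run`, which the list above doesn't state. When Claude is driving, the confirmation belongs in the conversation instead.

```bash
# Hypothetical wrapper: always show a dry run, then require a typed
# "yes" before executing. Extra arguments go to both invocations.
confirm_delete_staged() {
  # </dev/null so the dry run cannot swallow the user's answer.
  msgvault delete-staged --dry-run "$@" </dev/null || return
  local answer
  printf 'Type "yes" to execute the staged deletion shown above: ' >&2
  read -r answer
  if [ "$answer" = "yes" ]; then
    msgvault delete-staged --yes "$@"
  else
    echo "Aborted; nothing was deleted." >&2
    return 1
  fi
}
```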

### Sync & Maintenance

```bash
msgvault sync <email>                                 # incremental Gmail sync
msgvault sync-full <email>                            # full sync (resumable)
msgvault sync-full <email> --after 2024-01-01         # sync a date range
msgvault sync-full <email> --limit 100                # limit message count
msgvault verify <email>                               # verify archive integrity
msgvault build-cache                                  # build/update Parquet analytics cache
msgvault build-cache --full-rebuild                   # full rebuild from scratch
msgvault repair-encoding                              # fix invalid UTF-8 in message text
msgvault update                                       # self-update binary
msgvault tui                                          # interactive terminal browser
```
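
A typical refresh syncs each account and then rebuilds the analytics cache, in case `sync` does not update the Parquet files itself (the notes here don't say). A minimal sketch; `sync_and_refresh` is a hypothetical name, and account addresses are passed explicitly because the `list-accounts` output format isn't documented here.

```bash
# Hypothetical routine: incrementally sync each account, then rebuild the
# analytics cache used by list-senders/list-domains. Stops at the first
# failed sync without touching the cache.
sync_and_refresh() {
  local email
  for email in "$@"; do
    msgvault sync "$email" || { echo "sync failed for $email" >&2; return 1; }
  done
  msgvault build-cache
}

# sync_and_refresh user@gmail.com work@example.com
```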

### Account Management

```bash
msgvault add-account user@gmail.com                   # browser-based OAuth
msgvault add-account user@gmail.com --headless        # device code flow (SSH, no browser)
msgvault update-account user@gmail.com --display-name "Work"  # set display name
msgvault remove-account user@gmail.com                # remove account and all its data (irreversible)
msgvault remove-account user@gmail.com --yes          # skip confirmation
```

### Import (non-Gmail sources)

```bash
msgvault import-mbox user@example.com /path/to/export.mbox    # standard mbox
msgvault import-mbox user@hey.com hey-export.zip --source-type hey --label hey  # HEY.com
msgvault import-emlx user@gmail.com ~/Mail/                    # Apple Mail .emlx files
```
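
When an export is split across several mbox files, each can be imported in turn. A sketch with a hypothetical `import_mbox_dir` helper, assuming plain `.mbox` files that need no `--source-type`:

```bash
# Hypothetical helper: import every *.mbox file in DIR for one account,
# in alphabetical order, stopping at the first failure.
# Usage: import_mbox_dir EMAIL DIR
import_mbox_dir() {
  local email=$1 dir=$2 f
  for f in "$dir"/*.mbox; do
    [ -e "$f" ] || continue              # glob matched nothing
    msgvault import-mbox "$email" "$f" || return
  done
}
```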

### Utilities

```bash
msgvault create-subset -o /tmp/demo --rows 500        # create smaller DB for testing/demos
msgvault export-token user@gmail.com --to https://nas:8080 --api-key KEY  # export token to remote server
msgvault setup                                        # interactive first-run setup wizard
```

## Search Query Syntax

Operators can be combined freely. The `from:` operator requires the **exact** email address — no fuzzy matching.

| Operator | Example | Notes |
|----------|---------|-------|
| `from:` | `from:alice@example.com` | Exact sender email |
| `to:` | `to:bob@example.com` | Exact recipient |
| `cc:` | `cc:team@example.com` | CC recipient |
| `bcc:` | `bcc:admin@example.com` | BCC recipient |
| `subject:` | `subject:meeting` | Subject contains word |
| `label:` | `label:IMPORTANT` | Gmail label |
| `has:attachment` | `has:attachment` | Messages with files |
| `after:` | `after:2025-01-01` | Absolute date (YYYY-MM-DD) |
| `before:` | `before:2025-06-30` | Absolute date |
| `newer_than:` | `newer_than:7d` | Relative: d=days, w=weeks, m=months, y=years |
| `older_than:` | `older_than:1y` | Relative date |
| `larger:` | `larger:5M` | Size filter (K or M) |
| `smaller:` | `smaller:100K` | Size filter |
| `"quoted"` | `"exact phrase"` | Phrase match |
| bare words | `project report` | Full-text across subject and body |
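
Values with spaces need double quotes inside the single-quoted query, as in `subject:"Q4 budget review"`. Below is a hypothetical `mv_query` helper that assembles a query from `key=value` pairs and bare words, quoting values that contain spaces. It only emits the operators you pass it and does not check them against the table.

```bash
# Hypothetical helper: turn KEY=VALUE pairs into KEY:VALUE operators,
# quoting values with spaces; bare words pass through as free text.
mv_query() {
  local arg key value parts=()
  for arg in "$@"; do
    if [[ $arg == *=* ]]; then
      key=${arg%%=*} value=${arg#*=}
      [[ $value == *" "* ]] && value="\"$value\""
      parts+=("$key:$value")
    else
      parts+=("$arg")
    fi
  done
  printf '%s\n' "${parts[*]}"            # joined with single spaces
}

# msgvault search "$(mv_query from=alice@example.com 'subject=Q4 budget review' has=attachment)" -n 10
```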

## Search Strategy

Finding specific emails often takes 2-3 attempts. The key constraint: `from:` needs an exact email address, which you usually don't have. Here's how to handle that:

**When you know the person but not their email:**
```bash
# Step 1: Find their exact address
msgvault list-senders --limit 100 | grep -i lastname
# Step 2: Search with the exact address
msgvault search 'from:jdoe@example.com subject:proposal' -n 10
```
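
`list-senders` is ranked by message count, so someone you rarely hear from may fall outside the top 100. A sketch of a hypothetical `find_sender` helper that searches a longer list with a fixed-string, case-insensitive match (the 1000 default is an arbitrary choice):

```bash
# Hypothetical helper: grep the top-N sender list for a name fragment.
# -F treats the name literally (dots, plus signs); -i ignores case.
find_sender() {
  local name=${1:?usage: find_sender NAME [LIMIT]} limit=${2:-1000}
  msgvault list-senders --limit "$limit" | grep -iF -- "$name"
}

# find_sender doe   # then search with the exact address it prints
```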

**When you know the topic but not the sender:**
```bash
# Free text search with time constraints
msgvault search 'grant proposal newer_than:60d' -n 20
# Or by subject with distinctive terms
msgvault search 'subject:"Q4 budget review"' -n 10
```

**When results are too broad:**
- Add `newer_than:` or date ranges to narrow the window
- Combine `from:` with subject keywords
- Use `--json | jq` to filter programmatically

**When results are too narrow or empty:**
- Drop operators and try different keyword combinations
- Full names work better than first names alone
- Try broader date ranges or remove date filters entirely

## When to Stop Searching

Not every email exists in the archive. Avoid burning time on exhaustive searches when the email likely isn't there. Follow this escalation and then stop:

1. Run 2-3 targeted searches combining `from:` (if known) with keywords and a date range
2. If those fail, try broader free-text searches with different keyword combinations
3. If still nothing after 5-6 total searches, tell the user what you tried and suggest:
   - The archive may need syncing (`msgvault sync <email>`)
   - The email might be in a different account (check `msgvault list-accounts`)
   - The user may be misremembering the timeframe or sender

Don't exhaustively paginate through hundreds of messages or try dozens of keyword permutations. If targeted searches with the right sender and plausible keywords don't find it, more searching rarely helps.
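
The escalation above can be scripted so the six-search budget is enforced mechanically. A minimal sketch: `search_escalate` is hypothetical, and it assumes `search --json` prints a JSON array, so `[]` or empty output counts as no hits. Queries are tried in the order given, so pass the most specific first.

```bash
# Hypothetical helper: try up to six queries in order; print the JSON of
# the first one that returns anything, or report failure on stderr.
search_escalate() {
  local query hits max=6 tried=0
  for query in "$@"; do
    (( tried >= max )) && break
    tried=$(( tried + 1 ))
    hits=$(msgvault search "$query" -n 10 --json) || continue
    # Ignore whitespace when deciding whether the result is empty.
    case $(printf '%s' "$hits" | tr -d '[:space:]') in
      ''|'[]'|null) continue ;;
    esac
    printf 'matched after %d searches: %s\n' "$tried" "$query" >&2
    printf '%s\n' "$hits"
    return 0
  done
  printf 'no results after %d searches\n' "$tried" >&2
  return 1
}

# search_escalate 'from:jdoe@example.com subject:recommendation after:2023-01-01' \
#                 'from:jdoe@example.com recommendation' 'recommendation letter'
```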

## Working with JSON Output

`--json` on `search` returns structured data. Compose with `jq`/`sort`/`uniq` for analytics:

```bash
# Count GitHub notifications by repo
msgvault search 'from:notifications@github.com newer_than:30d' -n 500 --json \
  | jq -r '.[].subject' \
  | sed -n 's/^[^[]*\[\([^]]*\)\].*/\1/p' \
  | sort | uniq -c | sort -rn | head -10

# List unique senders matching a domain
msgvault search 'from:@example.com newer_than:90d' -n 200 --json \
  | jq -r '.[].from' | sort -u

# Extract message IDs for batch operations
msgvault search 'from:noreply@spam.com older_than:6m' -n 500 --json \
  | jq -r '.[].id'
```

## Performance Notes

- First query per engine per process has a ~2s cold start; subsequent queries run in 20-600ms
- Two engines: SQLite FTS5 (full-text search), DuckDB over Parquet (analytics via list-domains, list-senders)
- Control output size with `--limit` (`-n`) and `--offset` to avoid context bloat
- `show-message` output can be very large for long threads — use `--json | jq` to extract specific fields
- Use `--local` flag to force local database if a remote server is configured
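
To keep long threads from flooding context, output can be capped at a fixed size. A sketch with two hypothetical helpers: `cap_output` truncates whatever it reads on stdin, and `show_body` combines it with the `show-message --json | jq` pattern. The 4000-character default is an arbitrary choice.

```bash
# Hypothetical helpers for keeping output small.

# Truncate stdin to at most MAX characters (default 4000) and say so when
# something was cut. Note: $(cat) drops trailing newlines.
cap_output() {
  local max=${1:-4000} data
  data=$(cat)
  if (( ${#data} > max )); then
    printf '%s\n[... truncated at %d of %d chars]\n' "${data:0:max}" "$max" "${#data}"
  else
    printf '%s\n' "$data"
  fi
}

# Print just the plain-text body of one message, capped.
show_body() {
  msgvault show-message "${1:?usage: show_body ID [MAX]}" --json \
    | jq -r '.body_text' | cap_output "${2:-4000}"
}
```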
