feat(normalize): Gemini CLI session JSONL adapter#1234
Adds a fifth format adapter to `mempalace.normalize` alongside the existing Claude Code, Codex, Claude.ai, ChatGPT, and Slack parsers. After this lands, `mempalace mine --mode convos` ingests Gemini CLI session history without manual export.

Why now: Claude Code and Codex CLI are already supported by `convo_miner`; adding Gemini closes the major-CLI-tool coverage gap. After this lands, the README's "verbatim conversation history" promise is honestly delivered for all three top-tier API-keyed coding CLIs (Claude Code, Codex CLI, Gemini CLI), not just two of them. This is the third leg of the trio Aya pushed for so the public claim matches the actual ingest pipeline.

Gemini CLI stores sessions at `~/.gemini/tmp/<project_hash>/chats/` as JSONL. The on-disk schema (per google-gemini/gemini-cli#15292):

    {"type":"session_metadata","sessionId":"...","projectHash":"...",...}
    {"type":"user","id":"msg1","content":[{"text":"Hello"}]}
    {"type":"gemini","id":"msg2","content":[{"text":"Hi"}]}
    {"type":"message_update","id":"msg2","tokens":{"input":10,"output":5}}

The new `_try_gemini_jsonl` parser:

- requires a `session_metadata` record so it does not false-positive against Claude Code or Codex JSONL passing through the dispatch chain in `_try_normalize_json`
- extracts user/gemini message text from each entry's `content` array of `{"text": "..."}` blocks, joining multiple blocks per message in order
- skips `message_update` entries (token-count deltas with no message text) and any other unknown record types
- returns `None` when fewer than two conversational messages are present, mirroring the codex parser's >=2-message guard

Test coverage: 9 new unit tests in `tests/test_normalize.py` mirroring the codex test pattern: happy path, multi-turn, missing session metadata, `message_update` skip, single-message rejection, multi-block content concatenation, empty content skip, malformed-line resilience, and explicit no-match against codex JSONL fixtures. Schema-level only; real Gemini CLI session fixtures are a follow-up once a real user file is available.

Closes part of #59 (the Gemini CLI portion of the umbrella request).
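For readers who want to see the described behaviour in one place, here is a minimal sketch of a parser that follows the four rules listed in the description. It is not the code merged in this PR: the function name `sketch_gemini_jsonl`, the `(role, text)` return shape, and the mapping of `gemini` to an `assistant` role are assumptions made for illustration, and the real `_try_gemini_jsonl` emits mempalace's transcript format instead.

```python
import json
from typing import List, Optional, Tuple


def sketch_gemini_jsonl(content: str) -> Optional[List[Tuple[str, str]]]:
    """Hypothetical sketch of the parsing rules; not the merged implementation."""
    messages: List[Tuple[str, str]] = []
    has_metadata = False

    for raw_line in content.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate malformed lines instead of failing the file
        if not isinstance(entry, dict):
            continue

        kind = entry.get("type")
        if kind == "session_metadata":
            has_metadata = True
            continue
        if kind not in ("user", "gemini"):
            continue  # message_update and any unknown record type

        blocks = entry.get("content")
        if not isinstance(blocks, list):
            continue
        # Keep non-empty text blocks, in their original order.
        texts = [
            block["text"].strip()
            for block in blocks
            if isinstance(block, dict)
            and isinstance(block.get("text"), str)
            and block["text"].strip()
        ]
        if not texts:
            continue  # empty content carries no message

        role = "user" if kind == "user" else "assistant"  # assumed role naming
        messages.append((role, "\n".join(texts)))

    # Both guards must hold: metadata present and at least two messages.
    if not has_metadata or len(messages) < 2:
        return None
    return messages
```

Returning `None` rather than an empty list matters here, because it is the signal that lets the next parser in the dispatch chain try the same input.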
@copilot review this
…ts list
Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/a32f48bb-2a78-494a-9698-e69304732d3f
Co-authored-by: igorls
Reviewed. Here's a summary of what I found:
Implementation (normalize.py)
Tests (test_normalize.py)
Fix applied (a3e3691): The module-level docstring in
Pull request overview
Adds a Gemini CLI JSONL format adapter to mempalace.normalize so mempalace mine --mode convos can ingest Gemini CLI session logs stored as JSONL without manual export, aligning normalization support with other major coding CLIs already supported.
Changes:
- Add `_try_gemini_jsonl()` parser to detect/parse Gemini CLI JSONL sessions into transcript format.
- Wire the new parser into the `_try_normalize_json()` dispatch chain.
- Add unit tests covering Gemini JSONL detection, parsing behaviors, and false-positive guards.
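The dispatch-chain wiring mentioned above follows a common pattern: each parser returns `None` when the input is not its format, and the first non-`None` result wins. The sketch below illustrates that pattern only. The helper `first_match` and the two stand-in parsers are hypothetical and do not reflect the actual body of `_try_normalize_json()` or the order in which it tries its parsers.

```python
from typing import Callable, List, Optional

# A parser takes raw file content and returns a transcript, or None on no match.
Parser = Callable[[str], Optional[str]]


def first_match(content: str, parsers: List[Parser]) -> Optional[str]:
    """Try each parser in order and return the first non-None result."""
    for parser in parsers:
        result = parser(content)
        if result is not None:
            return result
    return None


# Stand-in parsers, purely for demonstration.
def parse_numbers(content: str) -> Optional[str]:
    return "numbers" if content.isdigit() else None


def parse_words(content: str) -> Optional[str]:
    return "words" if content.isalpha() else None


chain = [parse_numbers, parse_words]
```

Because earlier parsers shadow later ones, each parser needs a strict detection guard. That is why the Gemini parser insists on a `session_metadata` record before it claims a file.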
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| mempalace/normalize.py | Adds Gemini CLI JSONL normalization and dispatch-chain integration. |
| tests/test_normalize.py | Adds unit tests validating Gemini JSONL parsing and non-matching behavior. |
    Detection requires a ``session_metadata`` record so this parser does
    not false-positive against Claude Code or Codex JSONL passed through
    the dispatch chain. ``message_update`` entries are skipped — they
    have no message text. Multiple text blocks within a single message's
    content array are concatenated in order, separated by newlines.
The docstring states session_metadata is on the first line, but the implementation accepts session_metadata anywhere and will also collect user/gemini messages that appear before it. To match the stated schema and strengthen the false-positive guard, require the first non-empty parsed entry to be type=="session_metadata" (or ignore messages until session_metadata has been seen).
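One way to express the stricter guard suggested in this comment is a small pre-check that looks only at the first non-empty line that parses as a JSON object. The sketch below is an illustration under that reading of the comment; the helper name `starts_with_session_metadata` is hypothetical, and the follow-up commit in this thread may implement the check differently.

```python
import json


def starts_with_session_metadata(content: str) -> bool:
    """Return True only if the first parseable JSON object is session_metadata.

    Hypothetical helper: blank lines, malformed lines, and non-object JSON
    values are skipped before the check is made.
    """
    for raw_line in content.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        # The first real entry decides the outcome either way.
        return entry.get("type") == "session_metadata"
    return False
```

With this check in front, a file whose `user` or `gemini` records precede the metadata line is rejected outright, so no messages are collected from it.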
@copilot apply changes based on the comments in this thread
…tinel
Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/4511e9aa-38e7-440e-a6f8-eda91e576f0f
Co-authored-by: igorls
Done in e7fe6ca. Added