feat(normalize): Gemini CLI session JSONL adapter by milla-jovovich · Pull Request #1234 · MemPalace/mempalace

milla-jovovich · 2026-04-27T08:45:04Z

Adds a sixth format adapter to mempalace.normalize alongside the five existing parsers (Claude Code, Codex, Claude.ai, ChatGPT, and Slack). After this lands, mempalace mine --mode convos ingests Gemini CLI session history without manual export.

Why now: convo_miner already supports Claude Code and Codex CLI; adding Gemini closes the remaining gap in major-CLI-tool coverage. With this change, the README's "verbatim conversation history" promise holds for all three top-tier API-keyed coding CLIs (Claude Code, Codex CLI, Gemini CLI), not just two of them. This is the third leg of the trio Aya pushed for so that the public claim matches the actual ingest pipeline.

Gemini CLI stores sessions at ~/.gemini/tmp/<project_hash>/chats/ as JSONL. The on-disk schema (per google-gemini/gemini-cli#15292):

{"type":"session_metadata","sessionId":"...","projectHash":"...",...}
{"type":"user","id":"msg1","content":[{"text":"Hello"}]}
{"type":"gemini","id":"msg2","content":[{"text":"Hi"}]}
{"type":"message_update","id":"msg2","tokens":{"input":10,"output":5}}
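For anyone wanting to poke at their own history before running the miner, here is a minimal sketch of enumerating candidate session files under the directory layout described above. The helper name `find_gemini_sessions` and the `.jsonl` suffix are assumptions for illustration; the PR only states that sessions are stored "as JSONL" in the chats/ folder.

```python
from __future__ import annotations

from pathlib import Path


def find_gemini_sessions(root: Path | None = None) -> list[Path]:
    """Return candidate Gemini CLI session files, sorted by path.

    Assumption: session files carry a .jsonl suffix. The layout
    <root>/<project_hash>/chats/ comes from the PR description.
    """
    if root is None:
        root = Path.home() / ".gemini" / "tmp"
    # One wildcard level for <project_hash>, then the chats/ folder.
    return sorted(root.glob("*/chats/*.jsonl"))
```

Passing `root` explicitly keeps the helper testable against a temporary directory instead of the real home directory.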

The new _try_gemini_jsonl parser:

- requires a session_metadata record so it does not false-positive against Claude Code or Codex JSONL passing through the dispatch chain in _try_normalize_json
- extracts user/gemini message text from each entry's content array of {"text": "..."} blocks, joining multiple blocks per message in order
- skips message_update entries (token-count deltas with no message text) and any other unknown record types
- returns None when fewer than two conversational messages are present, mirroring the codex parser's >=2-message guard
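The behaviors listed above can be sketched roughly as follows. This is not the code in normalize.py: the function name `parse_gemini_jsonl` and the `(role, text)` tuple output are assumptions chosen for illustration, since the real transcript format is not shown in this PR description. Blocks are joined with newlines, as the parser's docstring describes.

```python
import json


def parse_gemini_jsonl(text):
    """Return [(role, text), ...], or None if `text` is not a Gemini session.

    Illustrative only: the output shape is an assumption, not mempalace's
    actual transcript format.
    """
    has_session_metadata = False
    messages = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate malformed lines
        if not isinstance(entry, dict):
            continue
        kind = entry.get("type")
        if kind == "session_metadata":
            has_session_metadata = True
            continue
        if kind not in ("user", "gemini"):
            continue  # message_update and any unknown record type
        content = entry.get("content")
        if not isinstance(content, list):
            continue
        # Keep non-empty text blocks in their original order.
        parts = [
            block["text"]
            for block in content
            if isinstance(block, dict)
            and isinstance(block.get("text"), str)
            and block["text"]
        ]
        if not parts:
            continue  # empty content carries no message
        messages.append((kind, "\n".join(parts)))
    # Both guards: the sentinel must be present, and at least two turns.
    if not has_session_metadata or len(messages) < 2:
        return None
    return messages
```

Returning None rather than an empty list lets a caller fall through to the next parser in a dispatch chain.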

Test coverage: 9 new unit tests in tests/test_normalize.py mirroring the codex test pattern - happy path, multi-turn, missing session metadata, message_update skip, single-message rejection, multi-block content concatenation, empty content skip, malformed-line resilience, and explicit no-match against codex JSONL fixtures. Schema-level only; real Gemini CLI session fixtures are a follow-up once a real user file is available.

Closes part of #59 (the Gemini CLI portion of the umbrella request).

Checklist

Tests pass (python -m pytest tests/ -v)
No hardcoded paths
Linter passes (ruff check .)


igorls · 2026-04-27T18:55:37Z

@copilot review this


Copilot · 2026-04-27T19:00:38Z

Reviewed. Here's a summary of what I found:

Implementation (normalize.py)

_try_gemini_jsonl correctly requires a session_metadata sentinel before accepting input, which is the right guard against false positives on Claude Code and Codex JSONL files that also use "type": "user" entries.
Dispatch order in _try_normalize_json is correct: Claude Code → Codex → Gemini → full JSON parsers.
Cross-format isolation verified programmatically: no parser steals another's input.
message_update entries (token-count only, no text) are correctly skipped.
Multi-block content arrays are concatenated in order.
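A rough sketch of what an ordered dispatch chain and a cross-format isolation check can look like. Everything here is hypothetical: the toy parsers only look for a single record type (`toy_claude` and `toy_codex` are made-up markers, not the real Claude Code or Codex schemas), and `dispatch` and `accepting` are illustrative names, not mempalace functions.

```python
import json


def make_toy_parser(sentinel_type):
    """Build a stand-in parser that accepts JSONL containing `sentinel_type`.

    Hypothetical: real adapters inspect far more than one record type.
    """
    def parser(text):
        for line in text.splitlines():
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            if isinstance(entry, dict) and entry.get("type") == sentinel_type:
                return f"parsed-by-{sentinel_type}"
        return None
    return parser


# Order mirrors the review note: Claude Code, then Codex, then Gemini.
CHAIN = [
    ("claude_code", make_toy_parser("toy_claude")),
    ("codex", make_toy_parser("toy_codex")),
    ("gemini", make_toy_parser("session_metadata")),
]


def dispatch(text, chain=CHAIN):
    """Return (name, result) from the first parser that accepts `text`."""
    for name, parser in chain:
        result = parser(text)
        if result is not None:
            return name, result
    return None


def accepting(text, chain=CHAIN):
    """Names of every parser that accepts `text`; isolation means at most one."""
    return [name for name, parser in chain if parser(text) is not None]
```

An isolation check then amounts to asserting that `accepting(fixture)` has exactly one entry for each format's fixture, independent of dispatch order.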

Tests (test_normalize.py)

9 new test cases covering: happy path, multi-turn, missing sentinel (returns None), skipped message_update, too-few-messages, multi-block content, empty content skipped, invalid JSON lines skipped, and Codex non-match guard.
All 116 tests pass.

Fix applied (a3e3691): The module-level docstring in normalize.py was missing "Gemini CLI JSONL" from the supported formats list — added it in line with the other entries.

Copilot

Pull request overview

Adds a Gemini CLI JSONL format adapter to mempalace.normalize so mempalace mine --mode convos can ingest Gemini CLI session logs stored as JSONL without manual export, aligning normalization support with other major coding CLIs already supported.

Changes:

Add _try_gemini_jsonl() parser to detect/parse Gemini CLI JSONL sessions into transcript format.
Wire the new parser into _try_normalize_json() dispatch chain.
Add unit tests covering Gemini JSONL detection, parsing behaviors, and false-positive guards.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`mempalace/normalize.py`	Adds Gemini CLI JSONL normalization and dispatch-chain integration.
`tests/test_normalize.py`	Adds unit tests validating Gemini JSONL parsing and non-matching behavior.


Copilot · 2026-04-27T19:04:37Z

+    Detection requires a ``session_metadata`` record so this parser does
+    not false-positive against Claude Code or Codex JSONL passed through
+    the dispatch chain. ``message_update`` entries are skipped — they
+    have no message text. Multiple text blocks within a single message's
+    content array are concatenated in order, separated by newlines.


The docstring states session_metadata is on the first line, but the implementation accepts session_metadata anywhere and will also collect user/gemini messages that appear before it. To match the stated schema and strengthen the false-positive guard, require the first non-empty parsed entry to be type=="session_metadata" (or ignore messages until session_metadata has been seen).

igorls · 2026-04-27T21:39:40Z

@copilot apply changes based on the comments in this thread


Copilot · 2026-04-27T21:42:56Z

Done in e7fe6ca. Added if not has_session_metadata: continue before the user/gemini collection block so any turns appearing before the sentinel are discarded as preamble. Updated the docstring to document this behavior, and added test_gemini_jsonl_messages_before_session_metadata_discarded to cover it explicitly. All 117 tests pass.
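The guard described in this reply can be sketched in isolation as below. The `has_session_metadata` flag and the `continue` come from the comment above; the function name `turns_after_sentinel` and the choice to operate on already-parsed dicts are assumptions made to keep the example short.

```python
def turns_after_sentinel(entries):
    """Keep user/gemini entries that appear after session_metadata.

    Illustrative sketch: `entries` is an iterable of already-parsed dicts.
    """
    has_session_metadata = False
    turns = []
    for entry in entries:
        kind = entry.get("type")
        if kind == "session_metadata":
            has_session_metadata = True
            continue
        if not has_session_metadata:
            continue  # preamble before the sentinel is discarded
        if kind in ("user", "gemini"):
            turns.append(entry)
    return turns
```

With this guard, a file that never contains the sentinel yields no turns at all, so the existing two-message minimum rejects it without a separate check.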

milla-jovovich requested review from bensig and igorls as code owners April 27, 2026 08:45

igorls requested a review from Copilot April 27, 2026 18:54

Copilot started work on behalf of igorls April 27, 2026 18:55 View session

docs(normalize): add Gemini CLI JSONL to module-level supported formats list

a3e3691

Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/a32f48bb-2a78-494a-9698-e69304732d3f
Co-authored-by: igorls <[email protected]>

Copilot finished work on behalf of igorls April 27, 2026 19:01

Copilot AI reviewed Apr 27, 2026

Copilot started work on behalf of igorls April 27, 2026 21:39 View session

fix(normalize): discard user/gemini turns before session_metadata sentinel

e7fe6ca

Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/4511e9aa-38e7-440e-a6f8-eda91e576f0f
Co-authored-by: igorls <[email protected]>

Copilot finished work on behalf of igorls April 27, 2026 21:43

igorls merged commit fdfaf01 into develop Apr 27, 2026
6 checks passed

igorls merged 3 commits into develop from feat/normalize-gemini-cli
