
feat: add import-memories command for markdown chat exports #149

Open
danielaustralia1 wants to merge 1 commit into Martian-Engineering:main from danielaustralia1:feat/import-memories

Conversation

@danielaustralia1

Summary

Adds a new lcm-tui import-memories <path> CLI command that bulk-imports markdown conversation files from external AI chat platforms (ChatGPT, Claude, OpenClaw, etc.) into the LCM database. This enables users to bring their historical AI conversations into LCM's memory system, making them searchable via lcm_grep, inspectable via lcm_describe, and expandable via lcm_expand_query.

Motivation

Users accumulate valuable conversation history across multiple AI platforms. Currently, LCM only ingests conversations that happen within OpenClaw sessions. This feature bridges that gap by allowing users to import their existing chat exports — organized as markdown files in folders — directly into LCM's database where they become part of the agent's long-term memory.

What it does

New CLI command

# Dry-run: see what would be imported (no database needed)
lcm-tui import-memories ~/memories

# Import for real
lcm-tui import-memories ~/memories --apply

# Force re-import + optional compaction
lcm-tui import-memories ~/memories --apply --force --compact --model claude-opus-4

Directory structure mapping

memories/
├── chatgpt/           → agent="chatgpt"
│   ├── debugging.md   → title="debugging", 1 conversation
│   └── refactoring.md → title="refactoring", 1 conversation
├── claude/            → agent="claude"
│   ├── 2026-02-18 - API access.md → title="2026-02-18 - API access"
│   └── 2026-02-17 - Setup.md      → title="2026-02-17 - Setup"
└── openclaw/          → agent="openclaw"
    └── ...
  • Each .md file → one conversation in the database
  • Parent folder name → agent identifier (overridable with --agent)
  • Filename (without extension) → conversation title
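The path-to-metadata mapping above can be sketched in Go. Helper names here are illustrative, not the PR's actual identifiers:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// deriveAgent returns the name of the folder directly under root,
// or "" for files sitting at the root itself (no agent folder).
func deriveAgent(path, root string) string {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return ""
	}
	parts := strings.Split(filepath.ToSlash(rel), "/")
	if len(parts) < 2 {
		return "" // root-level file
	}
	return parts[0]
}

// deriveTitle strips the extension from the filename.
func deriveTitle(path string) string {
	base := filepath.Base(path)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

func main() {
	fmt.Println(deriveAgent("memories/chatgpt/debugging.md", "memories")) // chatgpt
	fmt.Println(deriveTitle("memories/claude/2026-02-18 - Setup.md"))     // 2026-02-18 - Setup
}
```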

Markdown turn parser

Supports 5 marker styles detected automatically per file:

| Style | Example |
| --- | --- |
| h4 heading | `#### You:` / `#### ChatGPT` |
| h2 heading | `## User` / `## Assistant` |
| h3 heading | `### Human` / `### Assistant` |
| Bold | `**User:**` / `**Assistant:**` |
| Inline | `Human:` / `Assistant:` |

Role mapping: User/Human/You → user, Assistant/AI/ChatGPT/Claude → assistant, System → system

Fallback: Files without recognized markers are imported as a single assistant message (useful for knowledge documents).
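A minimal sketch of marker matching and role normalization, assuming regex-based per-line detection (two of the five styles shown; names and the unknown-label fallback are assumptions, not the PR's actual code):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Two of the five marker styles as per-line regexes. A real parser
// would also skip lines inside fenced code blocks.
var (
	h4Marker   = regexp.MustCompile(`^####\s+(.+?):?\s*$`)
	boldMarker = regexp.MustCompile(`^\*\*(.+?):\*\*\s*$`)
)

// mapRole normalizes speaker labels to the three stored roles.
func mapRole(label string) string {
	switch strings.ToLower(strings.TrimSpace(label)) {
	case "user", "human", "you":
		return "user"
	case "assistant", "ai", "chatgpt", "claude":
		return "assistant"
	case "system":
		return "system"
	}
	return "assistant" // assumed fallback for unknown labels
}

func main() {
	if m := h4Marker.FindStringSubmatch("#### You:"); m != nil {
		fmt.Println(mapRole(m[1])) // user
	}
	if m := boldMarker.FindStringSubmatch("**Assistant:**"); m != nil {
		fmt.Println(mapRole(m[1])) // assistant
	}
}
```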

Timestamp extraction (3-tier fallback)

  1. <time> HTML tags — ChatGPT exports include <time datetime="2025-11-19T05:05:22.247Z"> per message. These are parsed for per-message precision and stripped from content.
  2. Filename date — Claude exports use 2026-02-18 - Title.md naming. The YYYY-MM-DD prefix is extracted as the conversation date, with messages spaced 1 minute apart.
  3. Import time — Falls back to current UTC time if no date source is available.

Additional features

  • YAML frontmatter stripping — ChatGPT exports include --- delimited frontmatter (title, source URL). This is stripped before parsing so it doesn't pollute message content.
  • UTF-8 BOM handling — Windows-exported files with BOM prefix are handled correctly.
  • Code block awareness — Turn markers inside fenced code blocks (```) are ignored, preventing false splits.
  • Deduplication — Deterministic session IDs from file paths. Re-running the command skips already-imported files.
  • Force re-import — the --force flag deletes existing conversation data and re-imports from the source file.
  • Optional compaction — the --compact flag runs the existing backfill compaction engine on each imported conversation to build the summary DAG immediately.
  • Dry-run by default — Shows what would be imported without touching the database. Works even without a database file.
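The deterministic session IDs behind the deduplication feature can be sketched as a SHA-256 over the file's path (what exactly gets hashed is an assumption; the PR's helper is memorySessionID):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// memorySessionID hashes the file path into a stable "import:<sha256>"
// session_id, so re-running the import maps each file to the same
// conversation and already-imported files can be skipped.
func memorySessionID(relPath string) string {
	sum := sha256.Sum256([]byte(relPath))
	return "import:" + hex.EncodeToString(sum[:])
}

func main() {
	a := memorySessionID("chatgpt/debugging.md")
	b := memorySessionID("chatgpt/debugging.md")
	fmt.Println(a == b) // true: stable across runs
	fmt.Println(len(a)) // 71: "import:" plus 64 hex chars
}
```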

CLI flags

Flags:
  --apply              actually import (default: dry-run)
  --force              re-import files even if already imported
  --agent <name>       override agent name (default: parent folder name)
  --db <path>          database path (default: ~/.openclaw/lcm.db)
  --compact            run compaction after importing each conversation
  --provider <id>      API provider for compaction
  --model <id>         API model for compaction
  --leaf-chunk-tokens  max input tokens per leaf chunk (default 20000)
  --leaf-target-tokens target output tokens for leaf summaries (default 1200)
  --condensed-target-tokens target for condensed summaries (default 2000)
  --leaf-fanout        minimum leaf summaries before condensation (default 8)
  --condensed-fanout   minimum summaries before d2+ condensation (default 4)
  --hard-fanout        minimum summaries in forced root fold (default 2)
  --fresh-tail         freshest raw messages to preserve (default 32)
  --prompt-dir         custom prompt template directory

Implementation details

Files changed

| File | Change | Lines |
| --- | --- | --- |
| tui/import-memories.go | New | 638 |
| tui/import_memories_test.go | New | 819 |
| tui/main.go | Modified | +7 |

Architecture

The implementation reuses the existing backfill infrastructure:

  • applyBackfillImport() — conversation creation, message insertion, FTS indexing, message part creation
  • inspectBackfillImportPlan() — deduplication checks
  • runBackfillCompaction() — optional post-import summary DAG construction
  • backfillMessage struct — message representation
  • backfillSessionInput struct — import input packaging

New code is limited to:

  • Directory walking and .md file discovery
  • Markdown turn marker detection and parsing
  • Timestamp extraction from <time> tags and filenames
  • YAML frontmatter stripping
  • Conversation data deletion for --force re-import
  • Agent name derivation from folder structure

Database integration

Imported conversations use the same schema as native backfill:

  • conversations table with session_id = "import:<sha256-hash>" for deterministic dedup
  • messages + context_items + message_parts tables for full message storage
  • messages_fts for full-text search indexing

Test plan

44 tests covering:

Parser tests (20)

  • All 5 marker styles: h4, h2, h3, bold, inline
  • User's exact formats: #### You: / #### ChatGPT and **Human:** / **Assistant:**
  • Role mappings: User/Human/You/Assistant/AI/ChatGPT/Claude/System + unknown fallback
  • Markers inside fenced code blocks are ignored
  • Multi-line content spanning multiple paragraphs
  • Code blocks within messages preserved
  • Headings with optional trailing colon
  • Pre-marker content (preamble/title) discarded
  • Empty file → 0 messages
  • Whitespace-only file → 0 messages
  • No markers → single assistant fallback message
  • Empty turns between markers skipped
  • Single turn with no response
  • Sequential seq numbering

Timestamp tests (7)

  • <time datetime="..."> tag extraction with nanosecond precision
  • <time> tag stripped from message content
  • Filename date extraction (YYYY-MM-DD - Title.md)
  • Fallback: no <time> tags → uses filename date
  • Fallback: no date anywhere → uses current time
  • extractTimeTag() with valid, missing, and malformed tags
  • extractFilenameDate() with dated and undated filenames

Frontmatter & encoding tests (5)

  • YAML frontmatter stripped from content
  • Frontmatter with BOM prefix
  • Unclosed frontmatter left intact
  • UTF-8 BOM stripped — first marker matches correctly
  • ChatGPT full format: frontmatter + h1 title + #### You: markers + <time> tags

Infrastructure tests (12)

  • memorySessionID determinism and prefix
  • deriveAgentName: nested folders, root-level files, deep nesting
  • discoverMarkdownFiles: finds .md, ignores .txt, case-insensitive extension
  • parseImportMemoriesArgs: valid path, --apply mode, missing path, non-directory, non-existent
  • deleteConversationData: clears all related tables (messages, context_items, message_parts, FTS)
  • Force re-import: delete + re-import end-to-end
  • End-to-end: discover → parse → import → verify DB → idempotency check
  • FTS indexing: rows created, searchable, cleaned up on delete
  • Dry-run makes no database writes

Tested against real data

Successfully dry-run tested against 824 real markdown files (7,120 messages) across multiple folders:

  • ChatGPT exports with <time> tags and YAML frontmatter
  • Claude exports with YYYY-MM-DD filename dates
  • Multiple agent folders with varied conversation lengths (1–80 messages per file)
  • Zero parsing warnings, zero skipped files

🤖 Generated with Claude Code

Add a new `lcm-tui import-memories` CLI command that imports markdown
conversation files from external AI chat exports (ChatGPT, Claude, etc.)
into the LCM database as searchable conversations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
