fix: correct context_type classification for memories during reindex #1061

Open
muxiaomu001 wants to merge 1 commit into volcengine:main from muxiaomu001:fix/reindex-context-type-classification

@muxiaomu001

Summary

Fixes #1060

When reindexing via /api/v1/content/reindex, all memories under viking://user/*/memories/* and viking://agent/*/memories/* are incorrectly classified as context_type="resource" instead of "memory". This causes downstream consumers (e.g. OpenClaw's auto-recall plugin) that filter by context_type == "memory" to miss all reindexed memories.

Root Cause

Two bugs in the reindex code path:

Bug 1: summarizer.py — wrong URI prefix check

# Before (never matches — no URI uses this path)
if uri.startswith("viking://memory/"):

# After (matches actual memory URIs like viking://user/default/memories/...)
if "/memories" in uri:

The fix uses the same substring matching logic as core/directories.get_context_type_for_uri().
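To make the difference between the two checks concrete, here is a minimal sketch that compares them on a few URIs. The function names `classify_before` and `classify_after` and the example paths are hypothetical and written only for illustration; the real `summarizer.py` logic may handle more cases than this.

```python
def classify_before(uri: str) -> str:
    """Old check: a prefix match that real memory URIs never satisfy."""
    return "memory" if uri.startswith("viking://memory/") else "resource"


def classify_after(uri: str) -> str:
    """New check: substring match on the /memories path segment."""
    return "memory" if "/memories" in uri else "resource"


# Example URIs (hypothetical file names under the prefixes named in the PR).
uris = [
    "viking://user/default/memories/preferences.md",
    "viking://agent/default/memories/notes.md",
    "viking://resources/docs/readme.md",
]

for uri in uris:
    print(f"{uri}: before={classify_before(uri)} after={classify_after(uri)}")
```

With the old check, both memory URIs fall through to "resource"; with the substring check they are labeled "memory", while the non-memory URI is unchanged.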

Bug 2: semantic_dag.py — root context_type propagated to all children

SemanticDagExecutor stored the root URI's context_type as self._context_type and used it for every child node during recursive traversal. When reindexing from viking:// (root), the root has no /memories/ segment, so context_type defaults to "resource" and propagates to all descendants.

Fixed by calling get_context_type_for_uri() per node instead of inheriting from root.
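The propagation problem can be sketched with a toy tree walk. Everything here is an assumption made for illustration: the `Node` class, the two walk functions, and the stand-in `context_type_for` helper are not the actual `SemanticDagExecutor` or `get_context_type_for_uri()` code, only a simplified model of the behavior described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    uri: str
    children: list[Node] = field(default_factory=list)


def context_type_for(uri: str) -> str:
    # Stand-in for get_context_type_for_uri(); assumes a plain substring check.
    return "memory" if "/memories" in uri else "resource"


def walk_inheriting_root(root: Node) -> dict[str, str]:
    """Buggy behavior: classify the root once and reuse it for every node."""
    root_type = context_type_for(root.uri)
    result, stack = {}, [root]
    while stack:
        node = stack.pop()
        result[node.uri] = root_type
        stack.extend(node.children)
    return result


def walk_per_node(root: Node) -> dict[str, str]:
    """Fixed behavior: classify each node from its own URI."""
    result, stack = {}, [root]
    while stack:
        node = stack.pop()
        result[node.uri] = context_type_for(node.uri)
        stack.extend(node.children)
    return result


# Hypothetical tree rooted at viking://, as in a full reindex.
tree = Node("viking://", [
    Node("viking://user/default/memories/", [
        Node("viking://user/default/memories/a.md"),
    ]),
    Node("viking://resources/doc.md"),
])

inherited = walk_inheriting_root(tree)
per_node = walk_per_node(tree)
```

In this model, `inherited` labels every node "resource" because the root URI has no `/memories` segment, while `per_node` labels the nodes under `/memories/` as "memory".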

Why this was hidden

Normal memory creation via auto-capture uses a completely different code path (memory_updater.py → embedding queue, which hardcodes "memory"). It never touches Summarizer or SemanticDagExecutor. This bug only manifests during reindex operations — typically after switching embedding models.

Changes

  • openviking/utils/summarizer.py: Fix URI matching to use "/memories" in uri and "/skills" in uri
  • openviking/storage/queuefs/semantic_dag.py: Use get_context_type_for_uri() per node instead of self._context_type

Test Plan

  1. Populate memories via normal auto-capture flow
  2. Run POST /api/v1/content/reindex with {"uri": "viking://", "regenerate": true}
  3. Verify via GET /api/v1/debug/vector/scroll that records under /memories/ have context_type="memory"
  4. Verify OpenClaw auto-recall (which filters by context_type=="memory") can find reindexed memories
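Step 3 could be automated with a small check like the one below. This is only a sketch: the response format of `/api/v1/debug/vector/scroll` is not shown in this PR, so the record shape (dicts with `uri` and `context_type` keys) and the helper name `find_misclassified` are assumptions.

```python
def find_misclassified(records: list[dict]) -> list[str]:
    """Return URIs under /memories/ whose context_type is not "memory".

    Assumes each record is a dict with "uri" and "context_type" keys.
    """
    return [
        record["uri"]
        for record in records
        if "/memories/" in record["uri"]
        and record.get("context_type") != "memory"
    ]


# Hypothetical records, shaped as they might look before the fix.
sample = [
    {"uri": "viking://user/default/memories/a.md", "context_type": "resource"},
    {"uri": "viking://agent/default/memories/b.md", "context_type": "memory"},
    {"uri": "viking://resources/doc.md", "context_type": "resource"},
]

bad = find_misclassified(sample)
```

After a correct reindex the returned list should be empty; any URI it contains points to a record that was still classified incorrectly.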

When reindexing via `/api/v1/content/reindex`, all memories under
`viking://user/*/memories/*` and `viking://agent/*/memories/*` were
incorrectly classified as `context_type="resource"` instead of
`"memory"`. This caused downstream consumers (e.g. OpenClaw auto-recall)
that filter by `context_type=="memory"` to miss all reindexed memories.

Two root causes:

1. `summarizer.py` checked `uri.startswith("viking://memory/")` which
   never matches real memory URIs (actual paths use `/memories/` as a
   path segment, not a top-level scheme). Fixed by using substring match
   `"/memories" in uri`, consistent with `core/directories.get_context_type_for_uri()`.

2. `semantic_dag.py` propagated the root URI's `context_type` to all
   child nodes during recursive traversal. When reindexing from
   `viking://`, the root has no `/memories/` segment, so all children
   inherited `"resource"`. Fixed by calling `get_context_type_for_uri()`
   per node.

Normal memory creation via auto-capture is unaffected (it uses
`memory_updater.py` which hardcodes `"memory"`), so this bug only
manifests during reindex operations.

Closes volcengine#1060

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions

Failed to generate code suggestions for PR

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

Bug: summarizer.py context_type misclassifies memories as resources during reindex