fix: correct context_type classification for memories during reindex by muxiaomu001 · Pull Request #1061 · volcengine/OpenViking

muxiaomu001 · 2026-03-28T16:17:08Z

Summary

When reindexing via /api/v1/content/reindex, all memories under viking://user/*/memories/* and viking://agent/*/memories/* are incorrectly classified as context_type="resource" instead of "memory". This causes downstream consumers (e.g. OpenClaw's auto-recall plugin) that filter by context_type == "memory" to miss all reindexed memories.

Root Cause

Two bugs in the reindex code path:

Bug 1: `summarizer.py` — wrong URI prefix check

```python
# Before: never matches, because no URI uses this path
if uri.startswith("viking://memory/"):
    ...

# After: matches actual memory URIs like viking://user/default/memories/...
if "/memories" in uri:
    ...
```

The fix uses the same substring matching logic as core/directories.get_context_type_for_uri().
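To see why the old check never fired, the sketch below runs both predicates over a few hypothetical URIs that follow the memory-URI pattern described above. It is a standalone reimplementation of the substring idea, not the code in `summarizer.py` or `core/directories`: the function names `classify_before` and `classify_after`, the `"skill"` label, and the order of the checks are assumptions.

```python
def classify_before(uri: str) -> str:
    """Old check (hypothetical reconstruction): a top-level prefix no real URI uses."""
    if uri.startswith("viking://memory/"):
        return "memory"
    return "resource"


def classify_after(uri: str) -> str:
    """New check (sketch): substring match on the path segment."""
    if "/memories" in uri:
        return "memory"
    if "/skills" in uri:
        return "skill"  # assumption: label used for skills
    return "resource"


# Hypothetical example URIs
uris = [
    "viking://user/default/memories/profile.md",
    "viking://agent/demo/memories/notes.md",
    "viking://resources/docs/readme.md",
]
for uri in uris:
    # The old check reports "resource" for all three URIs;
    # the new one reports "memory" for the first two.
    print(uri, classify_before(uri), classify_after(uri))
```

A substring test is looser than a prefix test (it matches `/memories` at any depth), which is what lets a single rule cover both the `viking://user/*` and `viking://agent/*` trees.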

Bug 2: `semantic_dag.py` — root context_type propagated to all children

SemanticDagExecutor stored the root URI's context_type as self._context_type and used it for every child node during recursive traversal. When reindexing from viking:// (root), the root has no /memories/ segment, so context_type defaults to "resource" and propagates to all descendants.

Fixed by calling get_context_type_for_uri() per node instead of inheriting from root.
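The propagation bug is easiest to see on a toy tree. The sketch below is not `SemanticDagExecutor`. The `Node` dataclass, `walk_inherited`, `walk_per_node`, and the `context_type_for_uri` stand-in are hypothetical simplifications that assume a plain recursive walk. They exist only to contrast caching the root's type with classifying each node from its own URI.

```python
from dataclasses import dataclass, field


def context_type_for_uri(uri: str) -> str:
    """Hypothetical stand-in for get_context_type_for_uri(): substring match."""
    if "/memories" in uri:
        return "memory"
    if "/skills" in uri:
        return "skill"
    return "resource"


@dataclass
class Node:
    uri: str
    children: list["Node"] = field(default_factory=list)


def walk_inherited(root: Node) -> dict[str, str]:
    """Buggy pattern: compute the root's type once and reuse it for every node."""
    root_type = context_type_for_uri(root.uri)
    result: dict[str, str] = {}

    def visit(node: Node) -> None:
        result[node.uri] = root_type
        for child in node.children:
            visit(child)

    visit(root)
    return result


def walk_per_node(root: Node) -> dict[str, str]:
    """Fixed pattern: classify each node from its own URI."""
    result: dict[str, str] = {}

    def visit(node: Node) -> None:
        result[node.uri] = context_type_for_uri(node.uri)
        for child in node.children:
            visit(child)

    visit(root)
    return result


tree = Node("viking://", [
    Node("viking://user/default", [
        Node("viking://user/default/memories", [
            Node("viking://user/default/memories/profile.md"),
        ]),
    ]),
])

leaf = "viking://user/default/memories/profile.md"
print(walk_inherited(tree)[leaf])  # -> resource (inherited from viking://)
print(walk_per_node(tree)[leaf])   # -> memory
```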

Why this was hidden

Normal memory creation via auto-capture uses a completely different code path (memory_updater.py → embedding queue, which hardcodes "memory"). It never touches Summarizer or SemanticDagExecutor. This bug only manifests during reindex operations — typically after switching embedding models.

Changes

- `openviking/utils/summarizer.py`: fix URI matching to use `"/memories" in uri` and `"/skills" in uri`
- `openviking/storage/queuefs/semantic_dag.py`: use `get_context_type_for_uri()` per node instead of `self._context_type`

Test Plan

1. Populate memories via the normal auto-capture flow.
2. Run `POST /api/v1/content/reindex` with `{"uri": "viking://", "regenerate": true}`.
3. Verify via `GET /api/v1/debug/vector/scroll` that records under `/memories/` have `context_type="memory"`.
4. Verify that OpenClaw auto-recall (which filters by `context_type=="memory"`) can find the reindexed memories.
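The scroll-based check above can be partly automated. The sketch assumes that the records returned by `GET /api/v1/debug/vector/scroll` have already been fetched and decoded into dicts with `uri` and `context_type` keys. That record shape and the helper name `find_misclassified` are assumptions, not the endpoint's documented response format.

```python
def find_misclassified(records: list[dict]) -> list[str]:
    """Return URIs under /memories/ whose context_type is not "memory".

    Assumes each record is a dict with "uri" and "context_type" keys
    (hypothetical shape; adapt to the actual scroll response).
    """
    bad = []
    for record in records:
        uri = record.get("uri", "")
        if "/memories/" in uri and record.get("context_type") != "memory":
            bad.append(uri)
    return bad


# Hypothetical sample records, one of them misclassified
sample = [
    {"uri": "viking://user/default/memories/profile.md", "context_type": "memory"},
    {"uri": "viking://agent/demo/memories/notes.md", "context_type": "resource"},
    {"uri": "viking://resources/docs/readme.md", "context_type": "resource"},
]
print(find_misclassified(sample))  # -> ['viking://agent/demo/memories/notes.md']
```

An empty list after reindexing `viking://` would suggest the fix holds for the inspected records. The auto-recall check remains the end-to-end confirmation.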

When reindexing via `/api/v1/content/reindex`, all memories under `viking://user/*/memories/*` and `viking://agent/*/memories/*` were incorrectly classified as `context_type="resource"` instead of `"memory"`. This caused downstream consumers (e.g. OpenClaw auto-recall) that filter by `context_type=="memory"` to miss all reindexed memories.

Two root causes:

1. `summarizer.py` checked `uri.startswith("viking://memory/")`, which never matches real memory URIs (actual paths use `/memories/` as a path segment, not a top-level scheme). Fixed by using the substring match `"/memories" in uri`, consistent with `core/directories.get_context_type_for_uri()`.
2. `semantic_dag.py` propagated the root URI's `context_type` to all child nodes during recursive traversal. When reindexing from `viking://`, the root has no `/memories/` segment, so all children inherited `"resource"`. Fixed by calling `get_context_type_for_uri()` per node.

Normal memory creation via auto-capture is unaffected (it uses `memory_updater.py`, which hardcodes `"memory"`), so this bug only manifests during reindex operations.

Closes volcengine#1060

CLAassistant · 2026-03-28T16:17:15Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

github-actions · 2026-03-28T16:18:02Z

Failed to generate code suggestions for PR

github-project-automation bot added this to OpenViking project Mar 28, 2026

github-project-automation bot moved this to Backlog in OpenViking project Mar 28, 2026

evaldass mentioned this pull request Mar 29, 2026

fix(reindex): use get_context_type_for_uri for correct memory/skill classification #1071 (Closed)


muxiaomu001 wants to merge 1 commit into volcengine:main from muxiaomu001:fix/reindex-context-type-classification
