fix: correct context_type classification for memories during reindex #1061
Open
muxiaomu001 wants to merge 1 commit into volcengine:main from
When reindexing via `/api/v1/content/reindex`, all memories under
`viking://user/*/memories/*` and `viking://agent/*/memories/*` were
incorrectly classified as `context_type="resource"` instead of
`"memory"`. This caused downstream consumers (e.g. OpenClaw auto-recall)
that filter by `context_type=="memory"` to miss all reindexed memories.
Two root causes:
1. `summarizer.py` checked `uri.startswith("viking://memory/")`, which
never matches real memory URIs (actual paths contain `/memories/` as an
inner path segment, not `memory/` directly after the scheme). Fixed by
using the substring match `"/memories" in uri`, consistent with
`core/directories.get_context_type_for_uri()`.
2. `semantic_dag.py` propagated the root URI's `context_type` to all
child nodes during recursive traversal. When reindexing from
`viking://`, the root has no `/memories/` segment, so all children
inherited `"resource"`. Fixed by calling `get_context_type_for_uri()`
per node.
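A minimal sketch of both root causes, for illustration only. The function names, the tree layout, and the example URIs below are hypothetical stand-ins and not the actual OpenViking code; `classify_by_segment` only imitates the `"/memories" in uri` check described above.

```python
from typing import Dict, Iterator, List

# Hypothetical tree representation: parent URI -> list of child URIs.
Tree = Dict[str, List[str]]


def classify_by_prefix(uri: str) -> str:
    """Old-style check: only matches URIs that start with viking://memory/."""
    return "memory" if uri.startswith("viking://memory/") else "resource"


def classify_by_segment(uri: str) -> str:
    """Substring check in the spirit of the fix (stand-in, not the real helper)."""
    return "memory" if "/memories" in uri else "resource"


def iter_uris(tree: Tree, root: str) -> Iterator[str]:
    """Depth-first walk over the root and all of its descendants."""
    yield root
    for child in tree.get(root, []):
        yield from iter_uris(tree, child)


def classify_inherited(tree: Tree, root: str) -> Dict[str, str]:
    """Buggy pattern: classify the root once and reuse it for every node."""
    root_type = classify_by_segment(root)
    return {uri: root_type for uri in iter_uris(tree, root)}


def classify_per_node(tree: Tree, root: str) -> Dict[str, str]:
    """Fixed pattern: classify each node from its own URI."""
    return {uri: classify_by_segment(uri) for uri in iter_uris(tree, root)}


# Made-up URIs that follow the viking://user/*/memories/* shape.
EXAMPLE_TREE: Tree = {
    "viking://": ["viking://user/alice", "viking://resources/doc.md"],
    "viking://user/alice": ["viking://user/alice/memories"],
    "viking://user/alice/memories": ["viking://user/alice/memories/m1.md"],
}
```

With this example tree, `classify_inherited(EXAMPLE_TREE, "viking://")` labels every node as `"resource"`, while `classify_per_node` labels the nodes under `/memories` as `"memory"`.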
Normal memory creation via auto-capture is unaffected (it uses
`memory_updater.py` which hardcodes `"memory"`), so this bug only
manifests during reindex operations.
Closes volcengine#1060
Summary
Fixes #1060
When reindexing via `/api/v1/content/reindex`, all memories under `viking://user/*/memories/*` and `viking://agent/*/memories/*` are incorrectly classified as `context_type="resource"` instead of `"memory"`. This causes downstream consumers (e.g. OpenClaw's auto-recall plugin) that filter by `context_type == "memory"` to miss all reindexed memories.

Root Cause
Two bugs in the reindex code path:
Bug 1 (`summarizer.py`): wrong URI prefix check

The check was `uri.startswith("viking://memory/")`, which never matches real memory URIs, where `/memories/` appears as an inner path segment. The fix uses the same substring matching logic as `core/directories.get_context_type_for_uri()`.

Bug 2 (`semantic_dag.py`): root context_type propagated to all children

`SemanticDagExecutor` stored the root URI's `context_type` as `self._context_type` and used it for every child node during recursive traversal. When reindexing from `viking://` (root), the root has no `/memories/` segment, so `context_type` defaults to `"resource"` and propagates to all descendants.

Fixed by calling `get_context_type_for_uri()` per node instead of inheriting from root.

Why this was hidden
Normal memory creation via `auto-capture` uses a completely different code path (`memory_updater.py` → embedding queue, which hardcodes `"memory"`). It never touches `Summarizer` or `SemanticDagExecutor`. This bug only manifests during `reindex` operations, typically after switching embedding models.

Changes
1. `openviking/utils/summarizer.py`: Fix URI matching to use `"/memories" in uri` and `"/skills" in uri`
2. `openviking/storage/queuefs/semantic_dag.py`: Use `get_context_type_for_uri()` per node instead of `self._context_type`

Test Plan
1. `POST /api/v1/content/reindex` with `{"uri": "viking://", "regenerate": true}`
2. Verify via `GET /api/v1/debug/vector/scroll` that records under `/memories/` have `context_type="memory"`
3. Verify that downstream consumers (filtering on `context_type=="memory"`) can find reindexed memories
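
The second step could be checked with a small helper like the sketch below. The record shape (dicts with `uri` and `context_type` keys) is an assumption, since the response format of `/api/v1/debug/vector/scroll` is not shown in this PR; `find_misclassified` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def find_misclassified(records: Iterable[Dict[str, str]]) -> List[str]:
    """Return URIs under /memories/ whose context_type is not "memory".

    Assumes each record is a dict with "uri" and "context_type" keys;
    the real scroll response may be shaped differently.
    """
    return [
        record["uri"]
        for record in records
        if "/memories/" in record.get("uri", "")
        and record.get("context_type") != "memory"
    ]


# Made-up sample records for illustration.
SAMPLE_RECORDS = [
    {"uri": "viking://user/alice/memories/m1.md", "context_type": "memory"},
    {"uri": "viking://agent/bot/memories/m2.md", "context_type": "resource"},
    {"uri": "viking://resources/doc.md", "context_type": "resource"},
]
```

An empty result from `find_misclassified` would indicate that every scrolled record under `/memories/` carries the expected `context_type`.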