fix(reindex): use get_context_type_for_uri for correct memory/skill classification#1071
Closed
evaldass wants to merge 1 commit intovolcengine:mainfrom
Closed
fix(reindex): use get_context_type_for_uri for correct memory/skill classification#1071evaldass wants to merge 1 commit intovolcengine:mainfrom
evaldass wants to merge 1 commit intovolcengine:mainfrom
Conversation
…lassification
During reindex, memories under viking://user/memories/ and
viking://agent/*/memories/ were misclassified as 'resource' due to two
bugs:
1. summarizer.py used uri.startswith('viking://memory/') which never
matches real memory URIs (viking://user/memories/... or
viking://agent/<id>/memories/...). Replaced with the canonical
get_context_type_for_uri() from core/directories.py.
2. semantic_dag.py propagated the root URI's context_type to all child
nodes via self._context_type. When reindexing from viking:// (root),
this defaulted to 'resource' for everything. Now each node determines
its own context_type via get_context_type_for_uri().
Both fixes reuse the existing get_context_type_for_uri() which already
handles all URI patterns correctly.
Fixes volcengine#1060
|
Failed to generate code suggestions for PR |
Contributor
Author
|
Closing as duplicate — PR #1061 by @muxiaomu001 already addresses the same two bugs (summarizer.py + semantic_dag.py context_type misclassification). Both PRs use the same approach (get_context_type_for_uri). Apologies for the overlap! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
During
reindex(via/api/v1/content/reindex), all memory files underviking://user/memories/andviking://agent/*/memories/are incorrectly tagged withcontext_type = "resource"instead of"memory".This causes downstream consumers (e.g. OpenClaw's auto-recall plugin) that filter by
context_type == "memory"to miss all reindexed memories.Root Cause
Two bugs in the reindex code path:
Bug 1:
summarizer.py(line 61)Real memory URIs look like
viking://user/memories/entities/mem_xxx.mdorviking://agent/<id>/memories/cases/mem_xxx.md— none start withviking://memory/.Bug 2:
semantic_dag.py(lines 443, 563)SemanticDagExecutorreceivescontext_typefrom the root URI and propagates it to all child nodes viaself._context_type. When reindexing fromviking://(root), the root has no/memoriessubstring, socontext_typedefaults to"resource"for everything — including actual memories.Fix
Both files now use the existing
get_context_type_for_uri()fromcore/directories.py, which already handles all URI patterns correctly:This is consistent with how
DirectoryInitializer(line 269 indirectories.py) already determines context types.Changes
summarizer.pystartswithchecks withget_context_type_for_uri(uri)semantic_dag.pyself._context_typewith per-nodeget_context_type_for_uri(file_path/dir_uri)Impact
memory_updater.py)Fixes #1060