fix(reindex): use get_context_type_for_uri for correct memory/skill classification #1071

Closed
evaldass wants to merge 1 commit into volcengine:main from evaldass:fix/reindex-context-type-misclassification

@evaldass
Contributor

Problem

During reindex (via /api/v1/content/reindex), all memory files under viking://user/memories/ and viking://agent/*/memories/ are incorrectly tagged with context_type = "resource" instead of "memory".

This causes downstream consumers (e.g. OpenClaw's auto-recall plugin) that filter by context_type == "memory" to miss all reindexed memories.
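
A minimal sketch of why such a filter comes back empty. The record layout and the `recall_memories` helper are hypothetical and only illustrate the effect; they are not OpenClaw's or OpenViking's actual API, and the second URI is an invented example.

```python
# Hypothetical index records as they might look after a buggy reindex.
# Field names and the second URI are assumptions for illustration only.
records = [
    # A real memory URI pattern from this PR, mis-tagged as "resource".
    {"uri": "viking://user/memories/entities/mem_xxx.md", "context_type": "resource"},
    # An ordinary resource (hypothetical URI), tagged correctly.
    {"uri": "viking://resources/example.md", "context_type": "resource"},
]


def recall_memories(items):
    """Keep only the records a consumer filtering on context_type would see."""
    return [r for r in items if r["context_type"] == "memory"]


missed = recall_memories(records)  # the mis-tagged memory is filtered out
```

Under these assumptions the filter returns nothing, even though one of the records is a memory by its URI.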

Root Cause

Two bugs in the reindex code path:

Bug 1: summarizer.py (line 61)

# Broken — 'viking://memory/' never matches real URIs
if uri.startswith("viking://memory/"):
    context_type = "memory"

Real memory URIs look like viking://user/memories/entities/mem_xxx.md or viking://agent/<id>/memories/cases/mem_xxx.md — none start with viking://memory/.
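
The mismatch can be checked directly against the two URI shapes quoted above. In this sketch, `old_check` mirrors the broken prefix test; the `"resource"` fallback is an assumption based on the behavior described in this PR, not a copy of summarizer.py.

```python
# The two real URI shapes quoted in this description.
real_memory_uris = [
    "viking://user/memories/entities/mem_xxx.md",
    "viking://agent/<id>/memories/cases/mem_xxx.md",
]


def old_check(uri: str) -> str:
    """Mirror of the broken prefix test; the fallback value is assumed."""
    if uri.startswith("viking://memory/"):
        return "memory"
    return "resource"


# The prefix test never fires for real memory URIs...
old_results = [old_check(u) for u in real_memory_uris]

# ...while a substring test on "/memories" matches both shapes.
substring_results = ["/memories" in u for u in real_memory_uris]
```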

Bug 2: semantic_dag.py (lines 443, 563)

SemanticDagExecutor receives context_type from the root URI and propagates it to all child nodes via self._context_type. When reindexing from viking:// (root), the root has no /memories substring, so context_type defaults to "resource" for everything — including actual memories.
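
The difference between propagating the root's type and classifying each node can be sketched as follows. `classify_propagated` and `classify_per_node` are hypothetical stand-ins, not methods of SemanticDagExecutor; `get_context_type_for_uri` is reproduced from the Fix section of this description so the sketch runs on its own.

```python
def get_context_type_for_uri(uri: str) -> str:
    # Reproduced from the Fix section of this PR description.
    if "/memories" in uri:
        return "memory"
    elif "/resources" in uri:
        return "resource"
    elif "/skills" in uri:
        return "skill"
    elif uri.startswith("viking://session"):
        return "memory"
    return "resource"


def classify_propagated(root_uri, child_uris):
    """Old behavior (sketch): every child inherits the root's context type."""
    root_type = get_context_type_for_uri(root_uri)
    return {uri: root_type for uri in child_uris}


def classify_per_node(child_uris):
    """Fixed behavior (sketch): each node is classified from its own URI."""
    return {uri: get_context_type_for_uri(uri) for uri in child_uris}


children = [
    "viking://user/memories/entities/mem_xxx.md",
    "viking://agent/<id>/memories/cases/mem_xxx.md",
]

before = classify_propagated("viking://", children)  # all "resource"
after = classify_per_node(children)                  # all "memory"
```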

Fix

Both files now use the existing get_context_type_for_uri() from core/directories.py, which classifies the real URI layouts by substring rather than by a fixed prefix:

def get_context_type_for_uri(uri: str) -> str:
    if "/memories" in uri:
        return "memory"
    elif "/resources" in uri:
        return "resource"
    elif "/skills" in uri:
        return "skill"
    elif uri.startswith("viking://session"):
        return "memory"
    return "resource"

This is consistent with how DirectoryInitializer (line 269 in directories.py) already determines context types.
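
As a quick check, here is the helper applied to a handful of URIs. The memory URIs come from this description; the skill, resource, and session URIs are hypothetical examples chosen only to exercise each branch.

```python
def get_context_type_for_uri(uri: str) -> str:
    # Same logic as the helper quoted above.
    if "/memories" in uri:
        return "memory"
    elif "/resources" in uri:
        return "resource"
    elif "/skills" in uri:
        return "skill"
    elif uri.startswith("viking://session"):
        return "memory"
    return "resource"


samples = [
    "viking://user/memories/entities/mem_xxx.md",     # from this PR
    "viking://agent/<id>/memories/cases/mem_xxx.md",  # from this PR
    "viking://agent/<id>/skills/example.md",          # hypothetical
    "viking://resources/example.md",                  # hypothetical
    "viking://session/example",                       # hypothetical
    "viking://",                                      # root, falls through
]

results = {uri: get_context_type_for_uri(uri) for uri in samples}

for uri, context_type in results.items():
    print(f"{context_type:<9} {uri}")
```

One property worth noting: the branches are checked in order, so a URI that contained both "/memories" and "/resources" would be classified as "memory".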

Changes

| File | Change |
| --- | --- |
| summarizer.py | Replace broken startswith checks with get_context_type_for_uri(uri) |
| semantic_dag.py | Replace self._context_type with a per-node call: get_context_type_for_uri(file_path) for files, get_context_type_for_uri(dir_uri) for directories |

Impact

  • Auto-capture memories are unaffected (different code path via memory_updater.py)
  • Reindex operations will now correctly classify memories, skills, and resources
  • No breaking changes — reindex simply becomes correct

Fixes #1060

…lassification

During reindex, memories under viking://user/memories/ and
viking://agent/*/memories/ were misclassified as 'resource' due to two
bugs:

1. summarizer.py used uri.startswith('viking://memory/') which never
   matches real memory URIs (viking://user/memories/... or
   viking://agent/<id>/memories/...). Replaced with the canonical
   get_context_type_for_uri() from core/directories.py.

2. semantic_dag.py propagated the root URI's context_type to all child
   nodes via self._context_type. When reindexing from viking:// (root),
   this defaulted to 'resource' for everything. Now each node determines
   its own context_type via get_context_type_for_uri().

Both fixes reuse the existing get_context_type_for_uri() which already
handles all URI patterns correctly.

Fixes volcengine#1060
@CLAassistant

CLAassistant commented Mar 29, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions

Failed to generate code suggestions for PR

@evaldass
Contributor Author

Closing as duplicate — PR #1061 by @muxiaomu001 already addresses the same two bugs (summarizer.py + semantic_dag.py context_type misclassification). Both PRs use the same approach (get_context_type_for_uri). Apologies for the overlap!

@evaldass evaldass closed this Mar 29, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 29, 2026
Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Bug: summarizer.py context_type misclassifies memories as resources during reindex

2 participants