fix(memory): add exact-duplicate dedup to SemanticStore.store()#126

Closed
kagura-agent wants to merge 1 commit into ghostwright:main from kagura-agent:fix/memory-dedup-exact-duplicate
Conversation

@kagura-agent

Problem

Closes #125.

extractFactsFromSession generates a new crypto.randomUUID() for each fact on every consolidation run. Since findContradictions() explicitly excludes same-object facts (existingObject !== newFact.object), identical-text facts accumulate as separate Qdrant points across sessions.
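The exclusion can be illustrated with a simplified sketch of the contradiction check (the interface and function name here are hypothetical, not the project's actual code; only the `existingObject !== newFact.object` condition is taken from the description above):

```typescript
// Hypothetical, simplified shape of the contradiction filter described
// above: facts whose object matches the incoming fact are excluded, so
// an exact-duplicate fact is never flagged as a contradiction.
interface Fact {
  id: string;
  subject: string;
  object: string;
}

function isContradictionCandidate(existing: Fact, incoming: Fact): boolean {
  // Same subject with a *different* object may contradict; an identical
  // object is excluded -- which is why exact duplicates slip through.
  return existing.subject === incoming.subject &&
         existing.object !== incoming.object;
}

const stored: Fact = { id: "a", subject: "user", object: "prefers dark mode" };
const duplicate: Fact = { id: "b", subject: "user", object: "prefers dark mode" };

// The duplicate is not a contradiction candidate, so nothing stops a
// second point with identical text from being upserted.
console.log(isContradictionCandidate(stored, duplicate)); // false
```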

This leads to the system prompt's Known Facts section containing multiple copies of the same fact (e.g. four copies of "No let's not worry about being a repeat contributor...").

Fix

Add findExactDuplicate() to SemanticStore that scrolls Qdrant for an existing valid fact with the same subject + object (both keyword-indexed). When a duplicate exists, store() merges source_episode_ids into the existing fact via updatePayload() and returns early — skipping the upsert of a new point.
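The merge step can be sketched as follows. This is a minimal, standalone illustration of merging `source_episode_ids` (the field name comes from the description above; the helper itself is hypothetical and elides the Qdrant calls):

```typescript
// Minimal sketch of the episode-id merge performed when an exact
// duplicate is found. The real store() would then write the merged
// list back to the existing point via updatePayload() instead of
// upserting a new one.
function mergeEpisodeIds(existing: string[], incoming: string[]): string[] {
  // Preserve the order of existing ids; append only unseen incoming ids.
  const seen = new Set(existing);
  const merged = [...existing];
  for (const id of incoming) {
    if (!seen.has(id)) {
      seen.add(id);
      merged.push(id);
    }
  }
  return merged;
}

console.log(mergeEpisodeIds(["ep-1", "ep-2"], ["ep-2", "ep-3"]));
// -> ["ep-1", "ep-2", "ep-3"]
```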

Changes

  • src/memory/semantic.ts: Add object keyword payload index, add findExactDuplicate() method, add dedup check in store() between contradiction resolution and upsert
  • src/memory/__tests__/semantic.test.ts: Add tests for dedup-and-merge path and different-object-creates-new-point path; update existing store test to mock the scroll endpoint
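As a rough illustration of the dedup-and-merge test path, the branch logic can be exercised against an in-memory stand-in for the Qdrant client (the client shape, method names, and simplified store() below are assumptions for the sketch, not the project's actual interfaces):

```typescript
// Hypothetical in-memory stand-in for the Qdrant client, recording
// which endpoint the store flow would hit.
type Call = "updatePayload" | "upsert";

class FakeQdrant {
  calls: Call[] = [];
  existing: { subject: string; object: string } | null = null;

  // Stands in for the scroll-based exact-duplicate lookup.
  scroll(subject: string, object: string) {
    return this.existing &&
           this.existing.subject === subject &&
           this.existing.object === object
      ? [this.existing]
      : [];
  }
  updatePayload() { this.calls.push("updatePayload"); }
  upsert() { this.calls.push("upsert"); }
}

// Simplified store(): merge into an existing duplicate, else upsert.
function store(client: FakeQdrant, fact: { subject: string; object: string }) {
  const dupes = client.scroll(fact.subject, fact.object);
  if (dupes.length > 0) {
    client.updatePayload(); // dedup-and-merge path
    return;
  }
  client.upsert(); // different-object-creates-new-point path
}

const client = new FakeQdrant();
client.existing = { subject: "user", object: "prefers dark mode" };

store(client, { subject: "user", object: "prefers dark mode" }); // duplicate
store(client, { subject: "user", object: "likes tea" });         // new fact

console.log(client.calls); // ["updatePayload", "upsert"]
```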

Testing

bun test src/memory/__tests__/semantic.test.ts
# 7 pass, 0 fail

Full suite: 2368 pass, 7 fail (pre-existing, same on main).

…twright#125)

Before this change, extractFactsFromSession generated a new
crypto.randomUUID() for each fact on every consolidation run.
Since findContradictions() explicitly excludes same-object facts,
identical-text facts accumulated as separate Qdrant points across
sessions.

Add findExactDuplicate() that scrolls Qdrant for an existing valid
fact with the same subject + object (both keyword-indexed). When a
duplicate exists, store() merges source_episode_ids into the
existing fact via updatePayload() and returns early — skipping the
upsert of a new point.

Also adds an object keyword payload index to enable efficient exact
match filtering.

Includes tests for both the dedup-and-merge path and the
different-object-creates-new-point path.
@kagura-agent
Author

Closing — based on maintainer merge patterns, external PRs aren't being merged here. The fix is available in the branch if useful. Thanks!

Development

memory: identical-text facts accumulate across sessions; SemanticStore has no exact-duplicate dedup on store
