Skip to content

Conversation

@sanketkedia
Copy link
Contributor

@sanketkedia sanketkedia commented Nov 18, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • We now delete a head and its associated PL once all of its points are outdated after reassign. Previously, we'd only do it during garbage collection.
    • This should reduce the size of hnsw index, reduce peak memory usage of compactor, and make compactions faster.
    • The same_head logic during splitting is also adjusted to account for negative distances that can happen for cosine and IP
  • New functionality
    • ...

Test plan

How are these changes tested?
Ran the local repro that used to cause a stack overflow. Now it does so with very low prob. Previously it would stack overflow 100% of the time

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

None

Observability plan

None

Documentation Changes

None

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@sanketkedia sanketkedia marked this pull request as ready for review November 18, 2025 19:31
@propel-code-bot
Copy link
Contributor

propel-code-bot bot commented Nov 18, 2025

Eager deletion of stale SPANN posting lists and heads

This PR teaches the SPANN index writer to drop heads and their posting lists immediately once every entry is outdated, instead of waiting for a later garbage-collection pass. A new helper (try_delete_posting_list) performs the check under the partitioned mutex, deletes the head from the HNSW graph, and removes the posting list atomically. The change is wired into split reassignment, neighbor reassignment, and merge/GC code paths so that empty centers are consistently removed, and the split logic now tolerates negative distances by comparing absolute values.

The test suite gains test_reassign_and_delete_center, which exercises reassignment across three centers and asserts that heads with only outdated points are deleted. Existing tests are updated to reflect the new head/posting-list lifecycle.

Key Changes

• Added try_delete_posting_list to detect fully outdated posting lists, delete their heads from hnsw_index, and remove their blockfile entries
• Invoked try_delete_posting_list after split-based reassignments, neighbor reassignments, and merge/GC flows so empty heads are pruned immediately
• Updated split-distance comparison to use abs() to handle negative cosine/IP distances
• Extended collect_and_reassign_split_points/collect_and_reassign_nearby_points and merge logic to account for the new eager deletion flow
• Added regression test test_reassign_and_delete_center to validate reassignment and head deletion behavior

Affected Areas

rust/index/src/spann/types.rs

This summary was automatically generated by @propel-code-bot

@blacksmith-sh

This comment has been minimized.

@sanketkedia sanketkedia force-pushed the 11-18-_enh_spann_-_delete_empty_pls branch from e121e5c to c168c2c Compare November 18, 2025 19:35
@blacksmith-sh

This comment has been minimized.

@sanketkedia sanketkedia force-pushed the 11-18-_enh_spann_-_delete_empty_pls branch from c168c2c to ff609f7 Compare December 5, 2025 16:46
@sanketkedia sanketkedia enabled auto-merge (squash) December 5, 2025 16:46
Comment on lines +782 to +786
for (doc_offset_id, doc_version) in doc_offset_ids.iter().zip(doc_versions.iter()) {
if self.is_outdated(*doc_offset_id, *doc_version).await? {
outdated_count += 1;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommended

[Performance] The loop starting on this line calls self.is_outdated for each item, which acquires a read lock on self.versions_map in every iteration. For large posting lists, this can be inefficient due to repeated lock acquisition overhead.

To optimize this, consider acquiring the read lock once before the loop and performing the checks within that single locked scope. This would reduce lock contention and improve performance.

This would involve replacing lines 781-786 with something like:

let version_map_guard = self.versions_map.read().await;
let mut outdated_count = 0;
for (doc_offset_id, doc_version) in doc_offset_ids.iter().zip(doc_versions.iter()) {
    let current_version = version_map_guard
        .versions_map
        .get(doc_offset_id)
        .ok_or(SpannIndexWriterError::VersionNotFound)?;
    if Self::is_deleted(*current_version) || *doc_version < *current_version {
        outdated_count += 1;
    }
}
Context for Agents
The loop starting on this line calls `self.is_outdated` for each item, which acquires a read lock on `self.versions_map` in every iteration. For large posting lists, this can be inefficient due to repeated lock acquisition overhead.

To optimize this, consider acquiring the read lock once before the loop and performing the checks within that single locked scope. This would reduce lock contention and improve performance.

This would involve replacing lines 781-786 with something like:
```rust
let version_map_guard = self.versions_map.read().await;
let mut outdated_count = 0;
for (doc_offset_id, doc_version) in doc_offset_ids.iter().zip(doc_versions.iter()) {
    let current_version = version_map_guard
        .versions_map
        .get(doc_offset_id)
        .ok_or(SpannIndexWriterError::VersionNotFound)?;
    if Self::is_deleted(*current_version) || *doc_version < *current_version {
        outdated_count += 1;
    }
}
```

File: rust/index/src/spann/types.rs
Line: 786

@sanketkedia sanketkedia merged commit e481836 into main Dec 5, 2025
63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants