[ENH]: SPANN - Delete empty PLs #5882
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
• Testing, Bugs, Errors, Logs, Documentation
• System Compatibility
• Quality
Eager deletion of stale SPANN posting lists and heads
This PR teaches the SPANN index writer to drop heads and their posting lists immediately once every entry is outdated, instead of waiting for a later garbage-collection pass. A new helper (… The test suite gains …
Key Changes
• Added …
Affected Areas
• …
This summary was automatically generated by @propel-code-bot
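To make the idea in the summary concrete, here is a minimal sketch of "drop the head and its posting list once every entry is outdated". It is not the code from this PR: `Index`, `Entry`, `delete_if_all_outdated`, the use of version `0` as a deletion marker, and treating unknown documents as outdated are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical posting-list entry: a document id plus the version it was written at.
struct Entry {
    doc_id: u32,
    version: u32,
}

/// Hypothetical, heavily simplified index; not the real SPANN writer.
struct Index {
    /// doc id -> current version. Assumption: version 0 means "deleted".
    current_versions: HashMap<u32, u32>,
    /// head id -> posting list for that head.
    posting_lists: HashMap<u32, Vec<Entry>>,
}

impl Index {
    /// An entry is outdated if its document was deleted or has a newer version.
    fn is_outdated(&self, entry: &Entry) -> bool {
        match self.current_versions.get(&entry.doc_id) {
            Some(&current) => current == 0 || entry.version < current,
            // Assumption for this sketch only: unknown documents count as outdated.
            None => true,
        }
    }

    /// Removes the head and its posting list right away when every entry is
    /// outdated (an empty list qualifies). Returns true if something was removed.
    fn delete_if_all_outdated(&mut self, head_id: u32) -> bool {
        let all_outdated = match self.posting_lists.get(&head_id) {
            Some(list) => list.iter().all(|entry| self.is_outdated(entry)),
            None => return false,
        };
        if all_outdated {
            self.posting_lists.remove(&head_id);
        }
        all_outdated
    }
}
```

In this sketch the check and the removal happen in the same call, which is the "eager" part; a deferred design would only mark the list and leave removal to a later garbage-collection pass.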
for (doc_offset_id, doc_version) in doc_offset_ids.iter().zip(doc_versions.iter()) {
    if self.is_outdated(*doc_offset_id, *doc_version).await? {
        outdated_count += 1;
    }
}
[Performance] The loop starting on this line calls self.is_outdated for each item, which acquires a read lock on self.versions_map in every iteration. For large posting lists, this can be inefficient due to repeated lock acquisition overhead.
To optimize this, consider acquiring the read lock once before the loop and performing the checks within that single locked scope. This would reduce lock contention and improve performance.
This would involve replacing lines 781-786 with something like:
```rust
let version_map_guard = self.versions_map.read().await;
let mut outdated_count = 0;
for (doc_offset_id, doc_version) in doc_offset_ids.iter().zip(doc_versions.iter()) {
    let current_version = version_map_guard
        .versions_map
        .get(doc_offset_id)
        .ok_or(SpannIndexWriterError::VersionNotFound)?;
    if Self::is_deleted(*current_version) || *doc_version < *current_version {
        outdated_count += 1;
    }
}
```
File: rust/index/src/spann/types.rs
Line: 786
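For readers who want to try the "lock once, then scan" pattern outside the codebase, the following is a small synchronous sketch. It swaps in `std::sync::RwLock` for the async lock used in the PR, and `VersionsMap`, `CountError`, `count_outdated`, and the convention that version `0` means deleted are hypothetical stand-ins rather than the real types in `rust/index/src/spann/types.rs`.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the writer's version map; not the real type.
struct VersionsMap {
    versions: HashMap<u32, u32>,
}

#[derive(Debug, PartialEq)]
enum CountError {
    VersionNotFound,
}

/// Assumption for this sketch: version 0 marks a deleted document.
fn is_deleted(version: u32) -> bool {
    version == 0
}

/// Counts outdated entries while holding one read guard for the whole scan,
/// instead of re-acquiring the lock for every entry.
fn count_outdated(
    map: &RwLock<VersionsMap>,
    doc_offset_ids: &[u32],
    doc_versions: &[u32],
) -> Result<usize, CountError> {
    // Acquire the read lock once, before the loop.
    let guard = map.read().expect("versions map lock poisoned");
    let mut outdated_count = 0;
    for (doc_offset_id, doc_version) in doc_offset_ids.iter().zip(doc_versions.iter()) {
        let current_version = *guard
            .versions
            .get(doc_offset_id)
            .ok_or(CountError::VersionNotFound)?;
        if is_deleted(current_version) || *doc_version < current_version {
            outdated_count += 1;
        }
    }
    // The guard is released here, when it goes out of scope.
    Ok(outdated_count)
}
```

One trade-off worth weighing: a single read guard held for the whole scan avoids repeated acquisition, but it can also make writers wait for the full length of a large posting list.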
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
Ran the local repro that previously caused a stack overflow 100% of the time. With this change it still overflows, but only with very low probability.
`pytest` for python, `yarn test` for js, `cargo test` for rust
Migration plan
None
Observability plan
None
Documentation Changes
None