[ENH]: More concurrent blockfilewriter #4889
This stack of pull requests is managed by Graphite.

Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
Enable High-Concurrency for Arrow BlockfileWriter with Partitioned Mutexes

This PR overhauls concurrency control in the Arrow blockfile writer. It replaces the single global mutex that serialized all write operations with a partitioned per-block locking scheme (using AsyncPartitionedMutex) and a hybrid of Optimistic and Pessimistic Concurrency Control.

Major API changes are applied to the set, delete, and get_owned operations to enable per-block parallelism, with retries and atomic operations to avoid race conditions and keep block splits and sparse index updates correct. The SparseIndexWriter gains an 'apply_updates' method for atomic multi-block updates, and concurrency testing receives configuration improvements.

This summary was automatically generated by @propel-code-bot
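To illustrate the partitioned locking idea described in the summary, here is a minimal sketch using only the standard library. It is not the chroma_cache AsyncPartitionedMutex, whose actual API is not shown in this PR excerpt: the `PartitionedMutex` type, its method names, and the use of synchronous `std::sync::Mutex` instead of an async mutex are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical, synchronous stand-in for a partitioned mutex.
/// Keys are hashed onto a fixed set of locks, so operations on keys in
/// different partitions do not block each other.
pub struct PartitionedMutex {
    partitions: Vec<Mutex<()>>,
}

impl PartitionedMutex {
    pub fn new(num_partitions: usize) -> Self {
        assert!(num_partitions > 0, "need at least one partition");
        Self {
            partitions: (0..num_partitions).map(|_| Mutex::new(())).collect(),
        }
    }

    /// Map a key (for example a block id) to its partition index by hashing.
    pub fn partition_of<K: Hash>(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.partitions.len()
    }

    /// Lock only the partition that owns `key`; other partitions stay free.
    pub fn lock<K: Hash>(&self, key: &K) -> MutexGuard<'_, ()> {
        self.partitions[self.partition_of(key)].lock().unwrap()
    }
}
```

Two keys that hash to the same partition still serialize against each other, so the partition count trades memory for contention. A single global mutex is the degenerate case of one partition.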
@@ -11,6 +11,7 @@ use crate::arrow::root::CURRENT_VERSION;
 use crate::arrow::sparse_index::SparseIndexWriter;
 use crate::key::CompositeKey;
 use crate::key::KeyWrapper;
+use chroma_cache::AysncPartitionedMutex;
[BestPractice]
There's a typo in the import name: AysncPartitionedMutex should be AsyncPartitionedMutex (the 's' and 'y' in Async are transposed). This is present in several places in your code.
@@ -31,7 +32,7 @@ pub struct ArrowUnorderedBlockfileWriter {
     block_deltas: Arc<Mutex<HashMap<Uuid, UnorderedBlockDelta>>>,
     root: RootWriter,
     id: Uuid,
-    write_mutex: Arc<tokio::sync::Mutex<()>>,
+    deltas_mutex: Arc<AysncPartitionedMutex<Uuid>>,
[BestPractice]
Great work implementing the partitioned mutex approach! This will significantly improve concurrency by allowing multiple operations on different blocks to proceed in parallel. One minor suggestion: consider adding a brief comment explaining the OCC/PCC approach near where the mutex is declared to help future developers understand the implementation.
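As a rough illustration of what such an explanatory comment could describe, the sketch below shows one common way to combine optimistic and pessimistic control: find the target block without holding its lock, take the per-block lock, then re-validate and retry if the mapping changed. Everything here is hypothetical (`SketchWriter`, `block_for`, the integer keys and block ids) and is not taken from the PR; the real writer uses async locks, UUID block ids, and block splits that this sketch omits.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::{Mutex, RwLock};

/// Hypothetical writer. The sparse index maps each block's start key to a
/// block id; one lock per block stands in for the partitioned mutex.
pub struct SketchWriter {
    sparse_index: RwLock<BTreeMap<u32, u64>>,
    blocks: HashMap<u64, Mutex<Vec<(u32, String)>>>,
}

impl SketchWriter {
    /// Build a writer from (start_key, block_id) pairs.
    pub fn new(starts: &[(u32, u64)]) -> Self {
        let index: BTreeMap<u32, u64> = starts.iter().copied().collect();
        let blocks = starts
            .iter()
            .map(|&(_, id)| (id, Mutex::new(Vec::new())))
            .collect();
        Self { sparse_index: RwLock::new(index), blocks }
    }

    /// Block owning `key`: the last block whose start key is <= key.
    fn block_for(&self, key: u32) -> u64 {
        let index = self.sparse_index.read().unwrap();
        let id = *index.range(..=key).next_back().expect("index covers key").1;
        id
    }

    /// Insert a key and return how many attempts were needed.
    pub fn set(&self, key: u32, value: String) -> usize {
        let mut attempts = 0;
        loop {
            attempts += 1;
            // Optimistic step: pick the target block without holding its lock.
            let target = self.block_for(key);
            // Pessimistic step: lock only that block.
            let mut block = self.blocks[&target].lock().unwrap();
            // Validate: if a concurrent split moved the key, drop the lock and retry.
            if self.block_for(key) != target {
                continue;
            }
            block.push((key, value));
            return attempts;
        }
    }

    pub fn block_len(&self, id: u64) -> usize {
        self.blocks[&id].lock().unwrap().len()
    }
}
```

With no concurrent splits, every `set` succeeds on the first attempt. The retry path only matters when another writer changes the sparse index between the unlocked lookup and the validation under the block lock.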
01938f5 to c872b26
c872b26 to a61d65a
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest for python, yarn test for js, cargo test for rust
Documentation Changes
None