Skip to content

[ENH]: More concurrent blockfilewriter #4889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 26, 2025

Conversation

sanketkedia
Copy link
Contributor

@sanketkedia sanketkedia commented Jun 19, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Blockfilewriter was effectively single-threaded since it took a big fat mutex in the beginning of operations (set/delete/get_owned)
    • This PR makes it more performant through a combination of Optimistic Concurrency Control and Pessimistic Concurrency Control
    • The fat mutex is replaced by a partitioned mutex. The writer retries if any state changes concurrently. The entire set of Sparse index operations (splits and conversion to delta) are applied atomically now. The other details of the algorithm are deferred to reading of code
  • New functionality
    • ...

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

None

Copy link
Contributor Author

sanketkedia commented Jun 19, 2025

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@sanketkedia sanketkedia marked this pull request as ready for review June 19, 2025 05:29
@sanketkedia sanketkedia changed the title [ENH]: More concurrent blockfile [ENH]: More concurrent blockfilewriter Jun 19, 2025
Copy link
Contributor

propel-code-bot bot commented Jun 19, 2025

Enable High-Concurrency for Arrow BlockfileWriter with Partitioned Mutexes

This PR overhauls concurrency control in the Arrow blockfile writer, replacing a single global mutex that serialized all write operations with a partitioned per-block locking scheme (using AsyncPartitionedMutex) and a hybrid of Optimistic and Pessimistic Concurrency Control. Major API changes are applied to set, delete, and get_owned operations to enable parallelism per block, with retries and atomic operations to avoid race conditions and ensure the correctness of block splits and sparse index updates. The SparseIndexWriter gains an 'apply_updates' method for atomic multi-block updates, and concurrency testing receives configuration improvements.

Key Changes:
• Replaces a single global (fat) write mutex in ArrowUnorderedBlockfileWriter with a partitioned async mutex keyed by block ID for finer-grained concurrent access.
• Rewrites set, delete, and get_owned methods to use lock-per-block and implement OCC/PCC: operation retries if concurrent index changes are detected.
• Introduces atomic multi-block sparse index updates via SparseIndexWriter::apply_updates, ensuring safe block splits and block replacements.
• Updates concurrency and unit tests to improve test coverage and confirm high-concurrency scenarios, including more aggressive block splitting.
• Refactors imports and test helpers to support the new locking primitives.

Affected Areas:
• rust/blockstore/src/arrow/blockfile.rs
• rust/blockstore/src/arrow/sparse_index.rs
• rust/blockstore/src/arrow/concurrency_test.rs

This summary was automatically generated by @propel-code-bot

@@ -11,6 +11,7 @@ use crate::arrow::root::CURRENT_VERSION;
use crate::arrow::sparse_index::SparseIndexWriter;
use crate::key::CompositeKey;
use crate::key::KeyWrapper;
use chroma_cache::AysncPartitionedMutex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

There's a typo in the import name: AysncPartitionedMutex should be AsyncPartitionedMutex (missing the first 'n' in Async). This is present in several places in your code.

@@ -31,7 +32,7 @@ pub struct ArrowUnorderedBlockfileWriter {
block_deltas: Arc<Mutex<HashMap<Uuid, UnorderedBlockDelta>>>,
root: RootWriter,
id: Uuid,
write_mutex: Arc<tokio::sync::Mutex<()>>,
deltas_mutex: Arc<AysncPartitionedMutex<Uuid>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

Great work implementing the partitioned mutex approach! This will significantly improve concurrency by allowing multiple operations on different blocks to proceed in parallel. One minor suggestion: consider adding a brief comment explaining the OCC/PCC approach near where the mutex is declared to help future developers understand the implementation.

@sanketkedia sanketkedia force-pushed the 06-18-_enh_more_concurrent_blockfile branch from 01938f5 to c872b26 Compare June 24, 2025 05:56
@sanketkedia sanketkedia merged commit 8240d5c into main Jun 26, 2025
113 of 115 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants