Skip to content

[ENH] Make roll dirty log always converge to coalesce everything. #4927

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jun 25, 2025

Conversation

rescrv
Copy link
Contributor

@rescrv rescrv commented Jun 24, 2025

Description of changes

The dirty log was not guaranteed to converge even in the absence of
further writes. I confirmed this empirically with local testing and
inspection of the dirty log. Post-patch it always converges within one
rollup.

Test plan

CI

  • Tests pass locally with pytest for python, yarn test for js, cargo nextest for rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link
Contributor

propel-code-bot bot commented Jun 24, 2025

Make Dirty Log Rollup Deterministically Converge and Refactor Log Compaction Path

This PR overhauls the dirty log rollup and compaction mechanics in the Rust log service, making dirty log coalescence deterministic: after a single rollup, the dirty log always converges regardless of further write activity. The changes extend to multi-shard rollup concurrency, cache management for coalesced cursor witnesses, test improvements, and several minor refactors and parameter tunings.

Key Changes:
• Redesigns dirty log rollup and compaction to ensure convergence after a single rollup, regardless of further incoming markers
• Refactors DirtyMarker::coalesce_markers API to operate on external mutable rollups and forget sets, improving testability and clarity of state mutation
• Ensures consistent detection and removal of purged collections from rollups across concurrent streams during rollup
• Extends Rollup, RollupPerCollection, and transient rollup processing to handle concurrent dirty marker reduction (buffered parallelism)
• Improves cache integration for cursor witness objects, correctly serializing/deserializing witness state to accelerate rollup and compaction checks
• Fixes and parameterizes related tests and utilities (Python property-based add test for minimal add, log offsets test for nonintermittent CI behavior)
• Bumps relevant operational tuning parameters (e.g., increases compactor concurrency and I/O queue limits to reduce compaction latency)
• Updates the CursorStore and Witness types in wal3 for easier, explicit witness/cursor access
• Miscellaneous documentation and error handling improvements

Affected Areas:
• rust/log-service/src/lib.rs (core rollup/convergence logic)
• rust/wal3/src/cursors.rs (witness/cursor struct refactor)
• rust/wal3/src/writer.rs (concurrency/gc for log writer)
• chromadb/test/property/test_add.py (property-based test tuning)
• rust/log/tests/log-offsets.rs (integration test deflake)
• rust/worker/tilt_config.yaml (operational param tuning)
• rust/log-service/src/bin/chroma-inspect-dirty-log.rs (debug tool buffer size)
• rust/log/src/grpc_log.rs (minor error string fix)

This summary was automatically generated by @propel-code-bot

{
selected_rollups.push((*collection_id, *rollup));
}
}
}
// Then allocate the collection ID strings outside the lock.
let mut all_collection_info = Vec::with_capacity(selected_rollups.len());
for (collection_id, rollup) in selected_rollups {
for (collection_id, rollup) in selected_rollups.into_iter().take(100) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

magic constant. Why do we take the first k (and in what order)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this in by mistake.

@rescrv rescrv force-pushed the rescrv/roll-dirty-log branch from e6ca1a1 to 27b7292 Compare June 24, 2025 22:32
Comment on lines +1090 to +1096
let mut forget = HashSet::default();
DirtyMarker::coalesce_markers(&records, &mut rollup.rollups, &mut forget)?;
rollup.forget.extend(forget);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: why not coalesce_markers(&records, &mut rollup.rollups, &mut rollup.forget) or maybe coalesce_marksers(&records, &mut rollup)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget needs to be maintained across futures. Will document.

@rescrv rescrv force-pushed the rescrv/roll-dirty-log branch 3 times, most recently from b414912 to 8695b78 Compare June 25, 2025 15:10
@rescrv rescrv force-pushed the rescrv/roll-dirty-log branch from 8695b78 to 4306673 Compare June 25, 2025 17:14
rescrv added 2 commits June 25, 2025 10:31
otherwise it gets compacted quicker than you can see it
@rescrv rescrv merged commit 236b62c into main Jun 25, 2025
57 checks passed
@rescrv rescrv deleted the rescrv/roll-dirty-log branch June 25, 2025 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants