[ENH] Batch get info from sysdb for compaction scheduler #4926
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking.
Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving.

- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
Refactor Compaction Scheduler to Batch Fetch from sysdb & Improve Offset Handling

This PR refactors the compaction scheduler's collection verification and enrichment to perform batch retrievals from sysdb instead of one-by-one queries, significantly improving efficiency. It also replaces scheduler panics on broken log offsets with error logging and skips such collections during compaction, enhancing service robustness and reliability.

This summary was automatically generated by @propel-code-bot
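A minimal sketch of the panic-to-skip behavior described above, with hypothetical types, an assumed consistency check, and a plain `eprintln!` standing in for the scheduler's structured logging; it is not the PR's actual code:

```rust
use std::collections::HashMap;

// Hypothetical per-collection log bookkeeping; field names are illustrative.
struct LogInfo {
    first_log_offset: i64,
    enumeration_offset: i64,
}

// Keep only collections whose offsets are consistent; log and skip the rest
// (real code would use structured logging such as tracing, not eprintln!).
fn filter_compactable(candidates: HashMap<String, LogInfo>) -> Vec<String> {
    candidates
        .into_iter()
        .filter_map(|(collection_id, info)| {
            if info.first_log_offset > info.enumeration_offset {
                // Previously a panic; now an error log plus skipping this collection.
                eprintln!(
                    "collection {collection_id}: broken log offset (first={}, enumeration={}); skipping compaction",
                    info.first_log_offset, info.enumeration_offset
                );
                None
            } else {
                Some(collection_id)
            }
        })
        .collect()
}
```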
Note: the payload of the batch response might be too large. Consider setting a limit for the batch size.
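If such a limit is added, one simple way to keep every request bounded is to chunk the id list client-side and merge the per-chunk results. A generic sketch, where the helper name and the limit value are made up:

```rust
use std::collections::HashMap;
use std::hash::Hash;

const MAX_BATCH_SIZE: usize = 1000; // illustrative, not a real config value

// Split the id list into bounded chunks and merge the per-chunk results;
// `fetch` stands in for one batched sysdb round trip.
fn fetch_chunked<K: Eq + Hash, V>(
    ids: &[K],
    mut fetch: impl FnMut(&[K]) -> HashMap<K, V>,
) -> HashMap<K, V> {
    let mut merged = HashMap::new();
    for chunk in ids.chunks(MAX_BATCH_SIZE) {
        merged.extend(fetch(chunk));
    }
    merged
}
```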
```rust
collection_ids: Some(collection_id_to_log_info.keys().cloned().collect()),
..Default::default()
})
```
[BestPractice]

In addition to avoiding the panic in the first_log_offset case, consider setting a limit for the batch size, as suggested in the PR comments. A batch size limit would prevent potential payload-size issues when fetching info for many collections at once.
Suggested change:

```diff
 collection_ids: Some(collection_id_to_log_info.keys().cloned().collect()),
+limit: Some(1000), // Add a reasonable batch limit to prevent oversized payloads
 ..Default::default()
 })
```
Description of changes

Summarize the changes made by this PR.

- Refactor the `verify_and_enrich` function in the compaction scheduler so that it gets information from sysdb in batch (vs. one by one previously); a sketch of the batched shape appears below.
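A minimal sketch of that batched shape, using stand-in types rather than the actual sysdb client API (`GetCollectionsOptions`, `CollectionInfo`, and the `SysDb` struct here are hypothetical and mirror only the fields visible in the review snippet):

```rust
use std::collections::HashMap;

// Stand-in request and record types; the actual sysdb client API differs.
#[derive(Default)]
struct GetCollectionsOptions {
    collection_ids: Option<Vec<String>>,
}

#[derive(Clone)]
struct CollectionInfo {
    log_position: i64,
}

struct SysDb {
    store: HashMap<String, CollectionInfo>,
}

impl SysDb {
    // One request resolves every id, replacing a get-per-collection loop.
    fn get_collections(&self, opts: GetCollectionsOptions) -> HashMap<String, CollectionInfo> {
        opts.collection_ids
            .unwrap_or_default()
            .into_iter()
            .filter_map(|id| self.store.get(&id).cloned().map(|info| (id, info)))
            .collect()
    }
}

// Batched lookup driven by the scheduler's id-to-log-offset map.
fn verify_and_enrich(
    sysdb: &SysDb,
    collection_id_to_log_info: &HashMap<String, i64>,
) -> HashMap<String, CollectionInfo> {
    sysdb.get_collections(GetCollectionsOptions {
        collection_ids: Some(collection_id_to_log_info.keys().cloned().collect()),
        ..Default::default()
    })
}
```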
Test plan

How are these changes tested?

- `pytest` for python, `yarn test` for js, `cargo test` for rust

Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?