[ENH] Batch get info from sysdb for compaction scheduler #4926

Open
wants to merge 1 commit into
base: main
from

Conversation

Sicheng-Pan
Contributor

@Sicheng-Pan Sicheng-Pan commented Jun 24, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Refactors the verify_and_enrich function in the compaction scheduler so that it fetches information from sysdb in batch instead of one collection at a time
    • The compaction scheduler no longer panics on collections with broken offsets. Instead, it emits an error trace and excludes the collection from compaction
  • New functionality
    • N/A
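
For illustration, here is a minimal sketch of the batched lookup described above. Every name in it (`CollectionId`, `LogInfo`, `CollectionMeta`, `Enriched`, `enrich_in_batch`, and the `fetch_batch` closure standing in for a sysdb request) is hypothetical and is not taken from scheduler.rs; it only shows the shape of "collect the ids, make one call, join the results through a HashMap".

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the scheduler's real types.
type CollectionId = u64;

#[derive(Debug, Clone, PartialEq)]
struct LogInfo {
    first_log_offset: i64,
}

#[derive(Debug, Clone, PartialEq)]
struct CollectionMeta {
    id: CollectionId,
    tenant: String,
}

#[derive(Debug, PartialEq)]
struct Enriched {
    id: CollectionId,
    tenant: String,
    first_log_offset: i64,
}

/// Enrich all collections with one batched lookup instead of one call per id.
/// `fetch_batch` is an assumed stand-in for a single sysdb request that accepts many ids.
fn enrich_in_batch<F>(
    log_infos: &HashMap<CollectionId, LogInfo>,
    fetch_batch: F,
) -> Vec<Enriched>
where
    F: FnOnce(Vec<CollectionId>) -> Vec<CollectionMeta>,
{
    // Gather every id up front so that only one request is issued.
    let ids: Vec<CollectionId> = log_infos.keys().cloned().collect();

    let mut enriched: Vec<Enriched> = fetch_batch(ids)
        .into_iter()
        .filter_map(|meta| {
            // Join each returned record back to its log info by id.
            // Records for ids that were never requested are ignored.
            let info = log_infos.get(&meta.id)?;
            Some(Enriched {
                id: meta.id,
                tenant: meta.tenant,
                first_log_offset: info.first_log_offset,
            })
        })
        .collect();

    // HashMap iteration order is unspecified, so sort for a stable result.
    enriched.sort_by_key(|e| e.id);
    enriched
}
```

In this sketch, a collection that has log info but is missing from the batch response simply does not appear in the output, which matches the "ignore rather than panic" behaviour described in the second bullet.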

Test plan

How are these changes tested?

  • Tests pass locally with pytest for Python, yarn test for JS, cargo test for Rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

Contributor Author

Sicheng-Pan commented Jun 24, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@Sicheng-Pan Sicheng-Pan marked this pull request as ready for review June 24, 2025 17:46
Contributor

propel-code-bot bot commented Jun 24, 2025

Refactor Compaction Scheduler to Batch Fetch from sysdb & Improve Offset Handling

This PR refactors the compaction scheduler's collection verification and enrichment to retrieve data from sysdb in batch instead of issuing one query per collection, which reduces the number of sysdb round trips. It also replaces scheduler panics on broken log offsets with error logging and excludes such collections from compaction, so a single bad collection no longer takes down the scheduler.

Key Changes:
• verify_and_enrich_collections now fetches collection and tenant info in batches from sysdb using HashMaps for mapping.
• Broken offset scenarios no longer trigger panics; impacted collections are ignored with error-level tracing.
• Unit test updated: replaced #[should_panic]-based panic check with a test that asserts no jobs are scheduled for broken offsets.
• Minor improvements to tracing and error handling throughout the scheduler.

Affected Areas:
• rust/worker/src/compactor/scheduler.rs (core logic and related tests)
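
As a rough sketch of the "skip instead of panic" change and of the kind of test that replaces a #[should_panic] check: the code below assumes that a "broken" offset means the log's first available offset lies beyond the record immediately after the last compacted one. That rule, the `Candidate` struct, and both function names are assumptions made for illustration, not the actual logic in scheduler.rs, and `eprintln!` stands in for an error-level trace.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: String,
    /// First offset still available in the log (hypothetical field).
    first_log_offset: i64,
    /// Last offset recorded as compacted in sysdb (hypothetical field).
    compacted_offset: i64,
}

/// Returns the offset to start compacting from, or an error describing the gap.
fn start_offset(c: &Candidate) -> Result<i64, String> {
    // Assumed rule: the log must still contain the record right after
    // the last compacted one; otherwise records would be skipped.
    if c.first_log_offset > c.compacted_offset + 1 {
        return Err(format!(
            "collection {}: log starts at {} but sysdb compacted up to {}",
            c.name, c.first_log_offset, c.compacted_offset
        ));
    }
    Ok(c.compacted_offset + 1)
}

/// Keep schedulable collections; report and skip the rest instead of panicking.
fn schedulable(candidates: &[Candidate]) -> Vec<(String, i64)> {
    candidates
        .iter()
        .filter_map(|c| match start_offset(c) {
            Ok(offset) => Some((c.name.clone(), offset)),
            Err(reason) => {
                // Stand-in for an error-level trace.
                eprintln!("skipping compaction: {}", reason);
                None
            }
        })
        .collect()
}
```

A test in this style asserts that the broken collection yields no job, for example that `schedulable` returns an empty list when given only a candidate with a gap, rather than expecting the call to panic.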

This summary was automatically generated by @propel-code-bot

@Sicheng-Pan
Contributor Author

Note: the payload of the batch response might be too large. Consider setting a limit on the batch size.
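
One possible way to bound the payload, sketched under assumptions: split the ids into fixed-size chunks and issue one request per chunk. `fetch_in_chunks`, `max_batch`, and the `fetch` closure are hypothetical names; the closure stands in for whatever batched sysdb call is used.

```rust
/// Split `ids` into batches of at most `max_batch` and concatenate the responses.
/// `fetch` is an assumed stand-in for one batched request.
fn fetch_in_chunks<T, R, F>(ids: &[T], max_batch: usize, mut fetch: F) -> Vec<R>
where
    T: Clone,
    F: FnMut(Vec<T>) -> Vec<R>,
{
    let mut out = Vec::new();
    // `chunks` panics on a size of 0, so clamp the batch size to at least 1.
    for chunk in ids.chunks(max_batch.max(1)) {
        out.extend(fetch(chunk.to_vec()));
    }
    out
}
```

Unlike a result limit on a single query, chunking still retrieves every requested collection; it only caps how many are returned per response.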

@Sicheng-Pan Sicheng-Pan changed the base branch from sicheng/06-23-_enh_purge_dirty_log_in_background_at_the_end_of_scheduled_compaction to graphite-base/4926 June 24, 2025 21:23
@Sicheng-Pan Sicheng-Pan force-pushed the sicheng/06-24-_enh_batch_get_info_from_sysdb_for_compaction_scheduler branch from 92d1a07 to 95d7a36 Compare June 24, 2025 21:24
@graphite-app graphite-app bot changed the base branch from graphite-base/4926 to main June 24, 2025 21:24
@Sicheng-Pan Sicheng-Pan force-pushed the sicheng/06-24-_enh_batch_get_info_from_sysdb_for_compaction_scheduler branch from 95d7a36 to 16c9146 Compare June 24, 2025 21:24
Comment on lines +124 to +126
collection_ids: Some(collection_id_to_log_info.keys().cloned().collect()),
..Default::default()
})
Contributor

[BestPractice]

When we know the collection is empty and need to avoid panicking in the first_log_offset case, consider setting a limit for the batch size as suggested in the PR comments. Adding a batch size limit would prevent potential payload size issues when fetching collections with many records.

Suggested change
- collection_ids: Some(collection_id_to_log_info.keys().cloned().collect()),
- ..Default::default()
- })
+ collection_ids: Some(collection_id_to_log_info.keys().cloned().collect()),
+ limit: Some(1000), // Add reasonable batch limit to prevent oversized payloads
+ ..Default::default()

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
