
Add druid.coordinator.kill.maxInterval property. #17680

Merged
4 commits merged into apache:master on Jan 30, 2025

Conversation


@gianm gianm commented Jan 29, 2025

This property adds interval-based limiting of the segments killed per task, alongside the existing segment-count-based limiting. This is useful because the kill task retrieves the payloads of all active segments that fall within the kill interval, which can be a large number.

The new default is 30 days, which seems reasonable for common situations.

Release notes text:

Automatic kill tasks limited to 30 days by default

The Coordinator can optionally issue kill tasks to clean up unused segments. Starting with this release, each kill task is limited by default to processing at most 30 days' worth of segments. This improves the performance of individual kill tasks.

The previous behavior (no limit on interval per kill task) can be restored by setting druid.coordinator.kill.maxInterval = P0D.
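
For reference, a sketch of how the new property could sit alongside the existing Coordinator kill settings in runtime.properties. The other property names are the standard Coordinator kill options; all values shown are illustrative, not recommendations.

```properties
# Enable automatic kill of unused segments.
druid.coordinator.kill.on=true

# How often the kill duty runs, and how long recently-unused segments are retained.
druid.coordinator.kill.period=P1D
druid.coordinator.kill.durationToRetain=P90D

# Existing per-task limit on the number of segments killed.
druid.coordinator.kill.maxSegments=100

# New in this PR: per-task limit on the interval covered by a single kill task.
# Default is P30D; set to P0D to restore the previous unlimited behavior.
druid.coordinator.kill.maxInterval=P30D
```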


gianm commented Jan 29, 2025

Release notes text added because the 30-day default (if it sticks) is a behavior change.

@kfaraz kfaraz left a comment

Changes look good. It makes sense to limit the interval of a single kill task. It would help both with lock contention and with reducing the number of used-segment load specs fetched by each kill task.


As a follow-up, we will need some further improvements, since a few possible issues remain:

  • a user can configure a large maxKillInterval
  • the number of segments in the maxKillInterval can still be large (a single time chunk is allowed to have up to ~32k segments)
  • segments in the maxKillInterval can have large payloads

We can mitigate these problems by making the kill task not fetch the used segment load specs at all. It could look like this (a rough sketch of the filtering step follows the list):

  • Coordinator duty fetches segment payloads along with eligible intervals from the metadata store. This would not increase the duty time since we are already limiting the number of segments fetched.
  • Coordinator already has all used segments in memory. It can easily identify the unused segment IDs which share a load spec with an existing used segment.
  • The unused segment IDs which do not share a load spec with any used segment should then be passed over to the kill task.
  • the kill task need not fetch the used segment load specs from the Overlord; it can simply act on the IDs as provided.
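
For illustration, a minimal sketch of the load-spec filtering step described above, using simplified placeholder types (SegmentRecord, KillCandidateFilter) rather than Druid's actual DataSegment or kill-task APIs:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified stand-in for Druid's DataSegment: a segment ID plus its deep-storage load spec.
record SegmentRecord(String id, Map<String, Object> loadSpec) {}

public class KillCandidateFilter
{
  /**
   * Returns the IDs of unused segments whose load spec is not shared with any
   * used segment. Only these would be handed to the kill task for deep-storage
   * deletion; unused segments that share a load spec with a used segment must
   * keep their files.
   */
  public static Set<String> findDeletableUnusedSegmentIds(
      List<SegmentRecord> usedSegments,
      List<SegmentRecord> unusedSegments
  )
  {
    // The Coordinator already has all used segments in memory.
    final Set<Map<String, Object>> usedLoadSpecs = usedSegments.stream()
        .map(SegmentRecord::loadSpec)
        .collect(Collectors.toSet());

    final Set<String> deletableIds = new HashSet<>();
    for (SegmentRecord unused : unusedSegments) {
      if (!usedLoadSpecs.contains(unused.loadSpec())) {
        deletableIds.add(unused.id());
      }
    }
    return deletableIds;
  }
}
```

Presumably only the IDs returned by such a filter would be passed to the kill task, so it could delete deep-storage data without fetching used-segment load specs from the Overlord.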

@kfaraz kfaraz left a comment

LGTM

@kfaraz kfaraz merged commit 100d9ef into apache:master Jan 30, 2025
79 checks passed

kfaraz commented Jan 30, 2025

@gianm, could you please add a section for release notes in the PR description itself?

@gianm gianm deleted the kill-interval-limit branch January 30, 2025 16:52

gianm commented Jan 30, 2025

> @gianm, could you please add a section for release notes in the PR description itself?

I have added some text to the PR description.
