Add druid.coordinator.kill.maxInterval property. #17680
Conversation
This property allows interval-based limiting of the number of segments killed, as well as the already-existing segment-count-based limiting. This is useful because the kill task retrieves payloads of *all* active segments that fall within the kill interval, which can be a large number. The new default is 30 days, which seems reasonable for common situations.
This needs release notes because the 30-day default (if it sticks) is a behavior change.
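To illustrate the interval-based limit, here is a minimal sketch of capping a candidate kill interval to a configured maximum duration. The naming and structure are mine for illustration, not the actual Druid implementation; it just shows the intended semantics, including `P0D` meaning "no limit":

```java
import java.time.Duration;
import java.time.Instant;

public class KillIntervalCap {
    // Sketch: cap the candidate kill interval [start, end) to span at most
    // maxInterval, keeping the oldest portion so the oldest unused segments
    // are cleaned up first. A zero duration (P0D) disables the cap, which
    // corresponds to the previous unbounded behavior.
    static Instant cappedEnd(Instant start, Instant end, Duration maxInterval) {
        if (maxInterval.isZero()) {
            return end;
        }
        Instant capped = start.plus(maxInterval);
        return capped.isBefore(end) ? capped : end;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        Instant end = Instant.parse("2024-06-01T00:00:00Z");
        // With a 30-day cap, only the oldest 30 days are eligible for this task.
        System.out.println(cappedEnd(start, end, Duration.ofDays(30)));
        // prints 2024-01-31T00:00:00Z
        // With P0D the full interval stays eligible.
        System.out.println(cappedEnd(start, end, Duration.ZERO));
        // prints 2024-06-01T00:00:00Z
    }
}
```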
Changes look good. It makes sense to limit the interval of a single kill task. It would help both with lock contention and with reducing the number of used segment load specs fetched by each kill task.

As a follow-up, we would need some more improvements, as there are still some possible issues:
- a user can configure a large `maxKillInterval`
- the number of segments in the `maxKillInterval` can still be large (a single time chunk is allowed to have up to ~32k segments)
- segments in the `maxKillInterval` can have large payloads

We can mitigate these problems by making the kill task not fetch the used segment load specs at all. It could look like this:
- The Coordinator duty fetches segment payloads along with eligible intervals from the metadata store. This would not increase the duty time since we are already limiting the number of segments fetched.
- The Coordinator already has all used segments in memory. It can easily identify the unused segment IDs which share a load spec with an existing used segment.
- The unused segment IDs which do not share a load spec with any used segment should then be passed over to the `kill` task. The `kill` task need not fetch the used segment load specs from the Overlord; it can just filter out the IDs as provided.
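The filtering step proposed above can be sketched as follows. This is a hypothetical illustration (my own class and method names, with load specs reduced to opaque strings such as deep-storage paths), not the actual Druid code:

```java
import java.util.*;
import java.util.stream.*;

public class KillCandidateFilter {
    // Sketch: only unused segments whose load spec is NOT shared with any
    // used segment are safe to delete from deep storage. Load specs are
    // modeled here as opaque strings (e.g. deep-storage paths).
    static List<String> killableSegmentIds(
            Map<String, String> unusedIdToLoadSpec, // unused segment id -> load spec
            Set<String> usedLoadSpecs)              // load specs of all used segments
    {
        return unusedIdToLoadSpec.entrySet().stream()
                .filter(e -> !usedLoadSpecs.contains(e.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> unused = new HashMap<>();
        unused.put("seg1", "s3://bucket/a"); // load spec shared with a used segment: keep
        unused.put("seg2", "s3://bucket/b"); // not shared: safe to kill
        Set<String> used = Set.of("s3://bucket/a");
        System.out.println(killableSegmentIds(unused, used)); // prints [seg2]
    }
}
```

With this, the kill task receives a pre-filtered list of segment IDs and never needs to load the used segment payloads itself.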
Resolved review threads on:
- server/src/main/java/org/apache/druid/server/coordinator/duty/KillUnusedSegments.java
- server/src/main/java/org/apache/druid/server/coordinator/config/CoordinatorKillConfigs.java
LGTM
@gianm , could you please add a section for release notes in the PR description itself?
I have added some text to the PR description.
This property allows interval-based limiting of the number of segments killed, as well as the already-existing segment-count-based limiting. This is useful because the kill task retrieves payloads of all active segments that fall within the kill interval, which can be a large number.
The new default is 30 days, which seems reasonable for common situations.
Release notes text:
Automatic kill tasks limited to 30 days by default
The Coordinator can optionally issue kill tasks for cleaning up unused segments. Starting with this release, each kill task is by default limited to processing at most 30 days' worth of segments. This improves the performance of individual kill tasks.
The previous behavior (no limit on interval per kill task) can be restored by setting `druid.coordinator.kill.maxInterval=P0D`.
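For reference, a configuration sketch showing where the new property sits (the surrounding `druid.coordinator.kill.*` properties are existing Coordinator settings; only `maxInterval` is new in this PR):

```properties
# Coordinator runtime.properties (sketch)
druid.coordinator.kill.on=true
# New in this PR: each kill task covers at most 30 days of segments (default P30D).
druid.coordinator.kill.maxInterval=P30D
# To restore the previous unbounded behavior:
# druid.coordinator.kill.maxInterval=P0D
```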