-
Notifications
You must be signed in to change notification settings - Fork 14.7k
MINOR: Fix time comparison with appendLingerMs in CoordinatorRuntime#maybeFlushCurrentBatch #20739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…maybeFlushCurrentBatch
| private void maybeFlushCurrentBatch(long currentTimeMs) { | ||
| if (currentBatch != null) { | ||
| if (currentBatch.builder.isTransactional() || (currentBatch.appendTimeMs - currentTimeMs) >= appendLingerMs || !currentBatch.builder.hasRoomFor(0)) { | ||
| if (currentBatch.builder.isTransactional() || (currentTimeMs - currentBatch.appendTimeMs) >= appendLingerMs || !currentBatch.builder.hasRoomFor(0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I would prefer to remove this condition, as a time-based task already exists. This approach would also mean we are not changing the current behavior, since the condition was previously a no-op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you mentioned, time-based scheduled tasks will ensure that the current batch is flushed to the log once it has passed the append linger time.
Perhaps performing a conditional check here is intended to detect as promptly as possible when the current batch has passed the append linger time, so that flushing to the log can be carried out more promptly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a unit test here? Since the existing tests don't update the mock time, this bug was not disclosed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I think that we can keep the condition here. I agree with adding a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you all for your feedback and suggestions. I will add a unit test to cover this scenario.
|
Hello @AndrewJSchofield and @dajac , if you have time, could you please take a look at this issue? Thank you very much, and I look forward to your suggestions. |
This is much more in @dajac's area than mine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@majialoong thanks for this patch. overall LGTM.
| assertEquals(2, schedulerTimer.size()); | ||
|
|
||
| // Advance past the linger time. | ||
| clockTimer.advanceClock(11); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this to line#4921? It makes more sense there, since the goal of advancing the clockTimer is to ensure flushCurrentBatch is executed during writing #2, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the purpose of advancing the clockTimer is to ensure that flushCurrentBatch is executed when writing #2.
However, I think that after advancing the clockTimer, we should check the number of tasks in the schedulerTimer to ensure that the linger task still exists (has not been executed or canceled) before writing #2.
This ensures that flushCurrentBatch is executed when writing #2, rather than being triggered by the linger timer task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
I will merge this tomorrow if @dajac and @AndrewJSchofield have no objections. I will also backport it to 4.1 and 4.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix, @majialoong! I left a few nits. Otherwise, LGTM.
| MockPartitionWriter writer = new MockPartitionWriter(); | ||
|
|
||
| CoordinatorRuntime<MockCoordinatorShard, String> runtime = | ||
| new CoordinatorRuntime.Builder<MockCoordinatorShard, String>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We use four spaces to indent code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I’ve addressed this.
| "write#1", TP, Duration.ofMillis(20), | ||
| state -> new CoordinatorResult<>(List.of("record1"), "response1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
| "write#2", TP, Duration.ofMillis(20), | ||
| state -> new CoordinatorResult<>(List.of("record2"), "response2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks
…maybeFlushCurrentBatch (#20739) This PR fixed the time comparison logic in `CoordinatorRuntime#maybeFlushCurrentBatch` to ensure that the batch is flushed when the elapsed time since `appendTimeMs` exceeds the `appendLingerMs` parameter. This issue is also mentioned [here]( https://github.com/apache/kafka/pull/20653/files#r2442452104). Reviewers: David Jacot <[email protected]>, Chia-Ping Tsai <[email protected]>
|
@majialoong could you please file a PR for 4.0? I got following error during backport |
This PR fixed the time comparison logic in
CoordinatorRuntime#maybeFlushCurrentBatchto ensure that the batch isflushed when the elapsed time since
appendTimeMsexceeds theappendLingerMsparameter.This issue is also mentioned here.
Reviewers: David Jacot [email protected], Chia-Ping Tsai
[email protected]