Skip to content

CASSANDRA-20664: Fix endless loop on reading commitlogs when replay e… #4207

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: trunk
Choose a base branch
from

Conversation

hsahu-ksolves123
Copy link

@hsahu-ksolves123 hsahu-ksolves123 commented Jun 23, 2025

This PR addresses CASSANDRA-20664, which caused an endless loop during commit log replay when an error was encountered. The issue was due to improper error handling in CommitLogReadHandler.

Fix Summary:
1.Fixed loop logic in CommitLogReadHandler.java by properly handling replay errors to avoid infinite loops on unreadable entries.

2.Updated CommitLogReader.java to support graceful skipping of problematic logs and integrate the fix.

3.Added a unit test to reproduce and verify the bug scenario and ensure it’s resolved.

4.Verified manually by building from source and running both new and existing CommitLog-related tests.

5.Pushed changes to a feature branch (CASSANDRA-20664-fix) with clean commits, ready for review and backporting after merge.

@hsahu-ksolves123
Copy link
Author

Hi, this PR is ready for review. Please let me know if any changes or improvements are needed. Thanks!

@netudima
Copy link
Contributor

@hsahu-ksolves123 thank you for the PR.
Please find a set of comments below:

// 🔴 Fix: If unsuccessful, skip to next without retrying
if (!success) {
logger.info("File {} was not processed successfully. Skipping to next.", file.name());
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the idea of this continue logic at the end of the loop logic?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the question. The idea behind the continue here is to ensure that if a particular commit log file cannot be processed (e.g., due to corruption or unreadable format), we skip that file and move to the next one instead of retrying or entering an infinite loop.

Without this continue, the system might retry the same unreadable file endlessly, which was the original issue described in CASSANDRA-20664. The goal is to fail fast for that file and proceed with the rest of the logs safely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I did not get: continue statement is a kind of goto to the next iteration of the for loop ("for (File file : filteredLogs)" in our case) and we have "continue" here as a last statement of the loop code, so in any case we go to the next iteration of the loop after it, it does not matter if we have "continue" or not. Am I missing something?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right — since the continue is the last statement in the loop block, it’s redundant and doesn’t affect the flow. It can be safely removed. Thanks for pointing that out — I’ll update the code accordingly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @netudima,
I've pushed the latest changes:
1.Removed the redundant continue as discussed
2.Added a unit test to reproduce and validate the fix
3.Verified build and test locally

Please review and let me know if any further updates are needed. Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, but I do not see the changes in the PR diff

Copy link
Author

@hsahu-ksolves123 hsahu-ksolves123 Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recent code changes are now correctly reflected in this pull request. You can now see the updates in both CommitLogReader.java and CommitLogReaderTest.java. Kindly review the latest commits and let me know if any further changes or improvements are needed.

Thanks

@hsahu-ksolves123 hsahu-ksolves123 force-pushed the CASSANDRA-20664-fix branch 3 times, most recently from 7c7ef9d to 3c0ef8a Compare June 25, 2025 18:33
@belliottsmith belliottsmith force-pushed the trunk branch 2 times, most recently from df3eb40 to 54e39a9 Compare July 23, 2025 11:19
List<File> filteredLogs = filterCommitLogFiles(files);
int i = 0;
for (File file: filteredLogs)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please return brace on a new line as it was before

i++;
readCommitLogSegment(handler, file, minPosition, ALL_MUTATIONS, i == filteredLogs.size());
boolean success = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you need success for? If you went without having success, then you would do logger.info call in catch block.

Also the way it is coded right now is that handler.handleError in catch block might throw RuntimeException you are not catching nor acting upon anywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants