Skip to content

RATIS-998: shouldWithholdVotes() should be triggered for handling higher term #144

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -785,7 +785,7 @@ public CompletableFuture<RaftClientReply> setConfigurationAsync(SetConfiguration
}

private boolean shouldWithholdVotes(long candidateTerm) {
if (state.getCurrentTerm() < candidateTerm) {
if (state.getCurrentTerm() >= candidateTerm) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason behind the change? It seems to contradict the raft paper.

Copy link
Contributor Author

@GlenGeng-awx GlenGeng-awx Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does obey the paper.

4.2.3 Disruptive servers
Without additional mechanism, servers not in Cnew can disrupt the cluster. Once the cluster leader has created the Cnew entry, a server that is not in Cnew will no longer receive heartbeats, so it will time out and start new elections. Furthermore, it will not receive the Cnew entry or learn of that entry’s commitment, so it will not know that it has been removed from the cluster. The server will send RequestVote RPCs with new term numbers, and this will cause the current leader to revert to follower state. A new leader from Cnew will eventually be elected, but the disruptive server will time out again and the process will repeat, resulting in poor availability. If multiple servers have been removed from the cluster, the situation could degrade further.

Because of this scenario, we now believe that no solution based on comparing logs alone (such as the Pre-Vote check) will be sufficient to tell if an election will be disruptive.

Raft’s solution uses heartbeats to determine when a valid leader exists. In Raft, a leader is
considered active if it is able to maintain heartbeats to its followers (otherwise, another server will start an election). Thus, servers should not be able to disrupt a leader whose cluster is receiving heartbeats. We modify the RequestVote RPC to achieve this: if a server receives a RequestVote request within the minimum election timeout of hearing from a current leader, it does not update its term or grant its vote. It can either drop the request, reply with a vote denial, or delay the request; the result is essentially the same. This does not affect normal elections, where each server waits at least a minimum election timeout before starting an election. However, it helps avoid disruptions from servers not in Cnew: while a leader is able to get heartbeats to its cluster, it will not be deposed by larger term numbers.

BTW, shouldWithholdVotes is redundant now, the cases handled by shouldWithholdVotes() can also handled by recognizeCandidate(). if currentTerm is larger than candidateTerm, just reject the RequestVote request, no further handling is needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If (state.getCurrentTerm() >= candidateTerm), the server should not vote it. Why returning false?

Honestly, I don't understand what you are trying to fix. Could you add a unit test to show it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the key point here is, voting should only depends on voterFor and currentTerm, whose correctness is proved formally

截屏2020-08-08 下午1 59 32

Voting that depends on role and requestTime/responseTime is not a theoretically proved way.

return false;
} else if (isLeader()) {
return true;
Expand Down