pd-2.log:[2024/01/28 11:13:33.947 +08:00] [INFO] [raft.go:706] ["fef5444e2c4d3d9c became follower at term 4"]
pd-1.log:[2024/01/28 11:13:33.953 +08:00] [INFO] [raft.go:771] ["a22604cca51ee334 became leader at term 4"]
pd-2 lost etcd leadership at 11:13:33.947, and pd-1 was elected as the new etcd leader at 11:13:33.953. However, pd-2 did not resign PD leadership until 11:18:30.646, almost five minutes later, and only then was pd-1 elected as the new PD leader.
pd-2.log:[2024/01/28 11:18:30.646 +08:00] [INFO] [server.go:1687] ["etcd leader changed, resigns pd leadership"] [old-pd-leader-name=tc-pd-2]
pd-1.log:[2024/01/28 11:18:31.627 +08:00] [INFO] [server.go:1529] ["pd leader has changed, try to re-campaign a pd leader"]
The only check the PD leader uses to detect that the etcd leader has changed is:
pd/server/server.go, lines 1805 to 1809 in 1c54865
So it's reasonable to conclude that m.etcd.Server.Lead() may not return the latest etcd leader promptly. In the etcd code, there is a small detail inside the raft Ready preparation: https://github.com/etcd-io/etcd/blob/85b640cee793e25f3837c47200089d14a8392dc7/raft/node.go#L311-L322
So it is possible that the raw raft node has already finished the election while the etcd server above it has not yet received the latest soft state to apply, so the new etcd leader is only set much later. This is more likely when chaos such as an IO hang is injected into etcd.
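The suspected sequence can be illustrated with a toy model. This is only a sketch under stated assumptions: the types rawNode, server and ready and the function shouldResign are hypothetical stand-ins, not etcd or PD code, and the stalled flag simply assumes that a server stuck on an earlier batch cannot pick up the next one.

```go
package main

import "fmt"

// ready is a hypothetical, stripped-down stand-in for a raft Ready batch.
type ready struct {
	lead uint64 // leader ID carried as soft state
}

// rawNode models the raft state machine, which learns the election result first.
type rawNode struct {
	lead    uint64
	pending *ready // soft state not yet handed to the server
}

// becomeFollower records the new leader and queues the change for the server.
func (n *rawNode) becomeFollower(lead uint64) {
	n.lead = lead
	n.pending = &ready{lead: lead}
}

// server models the layer above raft. It caches the leader and refreshes
// the cache only when it consumes a ready batch.
type server struct {
	lead    uint64
	stalled bool // assumption: true while an earlier batch is stuck, e.g. on hung IO
}

// poll consumes the pending batch unless the server is stalled.
func (s *server) poll(n *rawNode) {
	if s.stalled || n.pending == nil {
		return
	}
	s.lead = n.pending.lead
	n.pending = nil
}

// shouldResign compares the server's cached leader with our own ID,
// which is the kind of check the PD leader relies on.
func shouldResign(s *server, selfID uint64) bool {
	return s.lead != selfID
}

func main() {
	const pd1, pd2 = 1, 2
	n := &rawNode{lead: pd2}
	s := &server{lead: pd2}

	// The election finishes at the raft level while the server is stalled.
	s.stalled = true
	n.becomeFollower(pd1)
	s.poll(n)
	fmt.Println(n.lead, s.lead, shouldResign(s, pd2)) // 1 2 false

	// Once the stall clears, the cached leader catches up and pd-2 resigns.
	s.stalled = false
	s.poll(n)
	fmt.Println(n.lead, s.lead, shouldResign(s, pd2)) // 1 1 true
}
```

In this model the raft node already knows the new leader, but the check keeps returning false until the server consumes the queued soft state, which matches the gap seen in the logs.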