Coordinator can't stop in time, because of background jobs are still running #5274
Comments
Thanks, @in-han. I think the coordinator running too long should not affect a follower becoming the leader; the follower should watch for the leader key to expire. Could you show more detailed logs?
Yes, the follower can watch for key expiration. But a follower can campaign for leader only under two conditions: a) the PD leader key has expired; b) the follower is the etcd leader. In this case, the old PD leader was still the etcd leader.
Why did the lease expire while the old PD is still the leader? Was there a re-election in which the old PD became leader again? BTW, 2000 TiKVs is the largest cluster size I have seen. Amazing! Here is an issue tracking improvements to the performance of the
The PD leader lease timed out because it hit a timeout when writing the key to the embedded etcd. You are right, in this case the old PD became leader again. Ah, after deployment, we also feel that this cluster is too large! I think there may be two ways to resolve this problem:
@nolouch Thanks for your reply.
/remove-type bug
Bug Report
I have posted about this on the forum; see https://asktug.com/t/topic/694191/3.
In short, the problem is that when the coordinator is stopping, the schedulers keep running until all of their jobs are finished.
The following figure shows that the coordinator takes a long time to close from start to finish, in this case more than 10 minutes.

I think the root cause is that schedulers and other background jobs have no way to receive an exit signal when the coordinator is stopping.
What did you do?
The PD leader lease timed out.

What did you expect to see?
A follower campaigns for leader in time.
What did you see instead?
The follower can't campaign for leader for 10 minutes, and keeps printing the following log.

What version of PD are you using (pd-server -V)?
5.3.0