Commit 0c3bd0a
Dev: sbd: Improve the process of leveraging maintenance mode (#1950)
## Problem
#1744 leverages maintenance mode when the cluster needs to be restarted,
but there are still some problems when resources are running:
#### Configuration is changed before the hint is shown, which might lead to inconsistency
```
# crm sbd configure watchdog-timeout=45
INFO: No 'msgwait-timeout=' specified in the command, use 2*watchdog timeout: 90
INFO: Configuring disk-based SBD
INFO: Initializing SBD device /dev/sda5
INFO: Update SBD_WATCHDOG_DEV in /etc/sysconfig/sbd: /dev/watchdog0
INFO: Sync file /etc/sysconfig/sbd to sle16-2
INFO: Already synced /etc/sysconfig/sbd to all nodes
INFO: Update SBD_DELAY_START in /etc/sysconfig/sbd: 131
INFO: Sync file /etc/sysconfig/sbd to sle16-2
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: "stonith-timeout" in crm_config is set to 119, it was 71
INFO: Sync directory /etc/systemd/system/sbd.service.d to sle16-2
WARNING: Resource is running, need to restart cluster service manually on each node
WARNING: Or, run with `crm -F` or `--force` option, the `sbd` subcommand will leverage maintenance mode for any changes that require restarting sbd.service
WARNING: Understand risks that running RA has no cluster protection while the cluster is in maintenance mode and restarting
# crm sbd purge
INFO: Stop sbd resource 'stonith-sbd'(stonith:fence_sbd)
INFO: Remove sbd resource 'stonith-sbd'
INFO: Disable sbd.service on node sle16-1
INFO: Disable sbd.service on node sle16-2
INFO: Move /etc/sysconfig/sbd to /etc/sysconfig/sbd.bak on all nodes
INFO: Delete cluster property "stonith-timeout" in crm_config
INFO: Delete cluster property "priority-fencing-delay" in crm_config
WARNING: "stonith-enabled" in crm_config is set to false, it was true
WARNING: Resource is running, need to restart cluster service manually on each node
WARNING: Or, run with `crm -F` or `--force` option, the `sbd` subcommand will leverage maintenance mode for any changes that require restarting sbd.service
WARNING: Understand risks that running RA has no cluster protection while the cluster is in maintenance mode and restarting
```
#### Pacemaker fatal exit when adding diskless sbd on a running cluster with resources running
```
# crm cluster init sbd -S -y
INFO: Loading "default" profile from /etc/crm/profiles.yml
INFO: Loading "knet-default" profile from /etc/crm/profiles.yml
INFO: Configuring diskless SBD
WARNING: Diskless SBD requires cluster with three or more nodes. If you want to use diskless SBD for 2-node cluster, should be combined with QDevice.
INFO: Update SBD_WATCHDOG_TIMEOUT in /etc/sysconfig/sbd: 15
INFO: Update SBD_WATCHDOG_DEV in /etc/sysconfig/sbd: /dev/watchdog0
INFO: Sync file /etc/sysconfig/sbd to sle16-2
INFO: Already synced /etc/sysconfig/sbd to all nodes
INFO: Enable sbd.service on node sle16-1
INFO: Enable sbd.service on node sle16-2
WARNING: Resource is running, need to restart cluster service manually on each node
WARNING: Or, run with `crm -F` or `--force` option, the `sbd` subcommand will leverage maintenance mode for any changes that require restarting sbd.service
WARNING: Understand risks that running RA has no cluster protection while the cluster is in maintenance mode and restarting
WARNING: "stonith-watchdog-timeout" in crm_config is set to 30, it was 0
Broadcast message from systemd-journald@sle16-1 (Thu 2025-10-23 10:54:11 CEST):
pacemaker-controld[5674]: emerg: Shutting down: stonith-watchdog-timeout configured (30) but SBD not active
Message from syslogd@sle16-1 at Oct 23 10:54:11 ...
pacemaker-controld[5674]: emerg: Shutting down: stonith-watchdog-timeout configured (30) but SBD not active
ERROR: cluster.init: Failed to run 'crm configure property stonith-watchdog-timeout=30': ERROR: Failed to run 'crm_mon -1rR': crm_mon: Connection to cluster failed: Connection refused
```
## Solution
- Drop the function `restart_cluster_if_possible`
- Introduce a new function `utils.able_to_restart_cluster` to check whether
the cluster can be restarted, and call it before changing any
configuration
- Leverage maintenance mode in the `sbd device remove` and `sbd purge`
commands
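
The "check first, change later" ordering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ClusterState`, `restart_decision` and `configure_sbd` are hypothetical names, and the real signature and behavior of `utils.able_to_restart_cluster` are not shown in this commit message.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"                      # nothing is running, a restart is safe
    PROCEED_IN_MAINTENANCE = "maintenance"   # --force given, use maintenance mode
    ABORT = "abort"                          # resources running and no --force


@dataclass
class ClusterState:
    """Hypothetical snapshot of what the pre-check needs to know."""
    resources_running: bool
    force: bool


def restart_decision(state: ClusterState) -> Decision:
    """Decide whether a restart is possible, before anything is modified."""
    if not state.resources_running:
        return Decision.PROCEED
    if state.force:
        return Decision.PROCEED_IN_MAINTENANCE
    return Decision.ABORT


def configure_sbd(state: ClusterState, apply_changes) -> bool:
    """Run the pre-check first; only call apply_changes() if it passes."""
    if restart_decision(state) is Decision.ABORT:
        # Nothing has been written yet, so the cluster stays consistent.
        return False
    apply_changes()
    return True
```

The point of the sketch is the ordering: the abort path returns before any file or property is touched, which is the behavior the new logs below show ("Aborting the configuration change attempt" with no preceding updates).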
#### Add sbd via sbd stage while resource is running
```
# crm cluster init sbd -S -y
INFO: Loading "default" profile from /etc/crm/profiles.yml
INFO: Loading "knet-default" profile from /etc/crm/profiles.yml
WARNING: Please stop all running resources and try again
WARNING: Or use 'crm -F/--force' option to leverage maintenance mode
WARNING: Understand risks that running RA has no cluster protection while the cluster is in maintenance mode and restarting
INFO: Aborting the configuration change attempt
INFO: Done (log saved to /var/log/crmsh/crmsh.log on sle16-1)
# Leverage maintenance mode
# crm -F cluster init sbd -S -y
INFO: Loading "default" profile from /etc/crm/profiles.yml
INFO: Loading "knet-default" profile from /etc/crm/profiles.yml
INFO: Set cluster to maintenance mode
WARNING: "maintenance-mode" in crm_config is set to true, it was false
INFO: Configuring diskless SBD
WARNING: Diskless SBD requires cluster with three or more nodes. If you want to use diskless SBD for 2-node cluster, should be combined with QDevice.
INFO: Update SBD_WATCHDOG_TIMEOUT in /etc/sysconfig/sbd: 15
INFO: Update SBD_WATCHDOG_DEV in /etc/sysconfig/sbd: /dev/watchdog0
INFO: Sync file /etc/sysconfig/sbd to sle16-2
INFO: Already synced /etc/sysconfig/sbd to all nodes
INFO: Enable sbd.service on node sle16-1
INFO: Enable sbd.service on node sle16-2
INFO: Restarting cluster service
INFO: BEGIN Waiting for cluster
...........
INFO: END Waiting for cluster
WARNING: "stonith-watchdog-timeout" in crm_config is set to 30, it was 0
WARNING: "stonith-enabled" in crm_config is set to true, it was false
INFO: Update SBD_DELAY_START in /etc/sysconfig/sbd: 41
INFO: Sync file /etc/sysconfig/sbd to sle16-2
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: "stonith-timeout" in crm_config is set to 71, it was 60s
INFO: Set cluster from maintenance mode to normal
INFO: Delete cluster property "maintenance-mode" in crm_config
INFO: Done (log saved to /var/log/crmsh/crmsh.log on sle16-1)
```
#### Purge sbd while resource is running
```
# crm sbd purge
WARNING: Please stop all running resources and try again
WARNING: Or use 'crm -F/--force' option to leverage maintenance mode
WARNING: Understand risks that running RA has no cluster protection while the cluster is in maintenance mode and restarting
INFO: Aborting the configuration change attempt
```
#### Add device
```
# crm sbd device add /dev/sda6
INFO: Configured sbd devices: /dev/sda5
INFO: Append devices: /dev/sda6
WARNING: Please stop all running resources and try again
WARNING: Or use 'crm -F/--force' option to leverage maintenance mode
WARNING: Understand risks that running RA has no cluster protection while the cluster is in maintenance mode and restarting
INFO: Aborting the configuration change attempt
```
#### Remove device
```
# crm sbd device remove /dev/sda6
INFO: Configured sbd devices: /dev/sda5;/dev/sda6
INFO: Remove devices: /dev/sda6
WARNING: Please stop all running resources and try again
WARNING: Or use 'crm -F/--force' option to leverage maintenance mode
WARNING: Understand risks that running RA has no cluster protection while the cluster is in maintenance mode and restarting
INFO: Aborting the configuration change attempt
```
#### Configure sbd while DLM is running
```
# crm sbd configure watchdog-timeout=40
INFO: No 'msgwait-timeout=' specified in the command, use 2*watchdog timeout: 80
WARNING: Please stop all running resources and try again
WARNING: Or use 'crm -F/--force' option to leverage maintenance mode
WARNING: Understand risks that running RA has no cluster protection while the cluster is in maintenance mode and restarting
INFO: Aborting the configuration change attempt
# Leverage maintenance mode
# crm -F sbd configure watchdog-timeout=40
INFO: No 'msgwait-timeout=' specified in the command, use 2*watchdog timeout: 80
INFO: Set cluster to maintenance mode
WARNING: "maintenance-mode" in crm_config is set to true, it was false
WARNING: Please stop DLM related resources (gfs2-clone) and try again
INFO: Set cluster from maintenance mode to normal
INFO: Delete cluster property "maintenance-mode" in crm_config
```
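
The last log shows maintenance mode being restored even though the change is refused because of a DLM resource. One way to get that guarantee is a context manager, sketched below. This is a minimal illustration, not crmsh's implementation: `FakeCluster`, `maintenance_mode` and `configure_with_force` are hypothetical names, and the cluster is an in-memory stand-in that only records messages.

```python
from contextlib import contextmanager


class FakeCluster:
    """Hypothetical in-memory stand-in for the cluster; it only records events."""

    def __init__(self, dlm_resources=()):
        self.maintenance = False
        self.dlm_resources = list(dlm_resources)
        self.log = []


@contextmanager
def maintenance_mode(cluster):
    """Enter maintenance mode and always leave it, even on an early return."""
    cluster.maintenance = True
    cluster.log.append("Set cluster to maintenance mode")
    try:
        yield cluster
    finally:
        cluster.maintenance = False
        cluster.log.append("Set cluster from maintenance mode to normal")


def configure_with_force(cluster):
    """Assumed flow for the forced path: refuse if DLM resources are present."""
    with maintenance_mode(cluster):
        if cluster.dlm_resources:
            names = ", ".join(cluster.dlm_resources)
            cluster.log.append(
                f"Please stop DLM related resources ({names}) and try again"
            )
            return False
        cluster.log.append("Restarting cluster service")
        return True
```

With `FakeCluster(["gfs2-clone"])`, the recorded messages follow the same order as the log above: maintenance mode is set, the DLM warning is issued, and maintenance mode is cleared again.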
7 files changed, +94 -89 lines changed:
- crmsh
- test
- features
- unittests