SSE health monitoring feature
Alan Cugler edited this page Oct 23, 2019 · 2 revisions
SSE needs a comprehensive portal that shows the status of minions in the cluster in increasing degrees of detail.
This page describes levels of increasing detail that could be added over possibly multiple releases of SaltStack's SSE product.
- This exists to some degree already: a health page showing each minion's status, color coded, with a timestamp of when it was last checked.
- For minions not in good standing, the GUI needs a drop-down option that shows a one-line error explaining why that minion is not in good standing. Examples: "minion is not responding", "proxy minion pillar credentials are returning permission denied", "minion keys accepted but cannot communicate; are ports 4505 and 4506 open?", etc.
- Minions with the same error share a color-coded indicator to show that they are grouped together. This is important for large cluster management: if there are 400 minion failures, 95% of them share the same error, and a few are failing uniquely, you should be able to identify that quickly.
- Statistics should be added for the current selection, such as the percentage of failed minions that share the same error and the exact number of minions failing with that error.
- Return the minion or master logs, depending on the error. This should be a new job run on request, not logs that are always copied to SSE. Constantly syncing logs would bog down the cluster, but fetching them on request from the health report section is a valuable way to get more detail quickly.
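
The grouping and statistics described in the list above could be sketched roughly as follows. This is only an illustration and not SSE code: `MinionStatus`, `summarize_failures`, the minion IDs, and the error strings are hypothetical names chosen for the example, and it assumes the health page already has a one-line error (or none) for each minion.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class MinionStatus:
    """Hypothetical record for one minion on the health page."""
    minion_id: str
    error: Optional[str]  # None means the minion is in good standing


def summarize_failures(
    statuses: List[MinionStatus],
) -> List[Dict[str, Union[str, int, float]]]:
    """Group failed minions by error, most common error first.

    Each entry holds the error text, the number of minions failing with
    it, and that number as a percentage of all failed minions.
    """
    failed = [s for s in statuses if s.error is not None]
    total_failed = len(failed)
    counts = Counter(s.error for s in failed)

    summary = []
    for error, count in counts.most_common():
        summary.append(
            {
                "error": error,
                "count": count,
                "percent": round(100 * count / total_failed, 1),
            }
        )
    return summary


if __name__ == "__main__":
    # Made-up data: three failing minions and one healthy one.
    demo = [
        MinionStatus("web1", "minion is not responding"),
        MinionStatus("web2", "minion is not responding"),
        MinionStatus("db1", "permission denied"),
        MinionStatus("db2", None),
    ]
    for row in summarize_failures(demo):
        print(row["error"], row["count"], row["percent"])
```

Sorting by count puts the dominant error first, so the few uniquely failing minions end up at the bottom of the summary where they are easy to spot.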
Author: Alan Cugler
Verbally discussed by Paul Bailey, Gary Richmond, and Alan Cugler
Notes:
- was