1.4 Monitor plugins
Health Monitor has a plugin-based monitor system that allows endpoints to be monitored over different protocols.
To use a given monitor type, its package has to be installed in the monitors directory, located in the same directory as HealthMonitoring.Monitors.SelfHost.exe.
You can check which monitor types are currently registered in Health Monitor by calling the GET /api/monitors API operation.
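A minimal C# sketch of calling this operation with HttpClient. It assumes Health Monitor listens at http://localhost:9000/ (the URL used in the push client example on this page). It returns the response body unparsed because this page does not describe the response shape. MonitorTypesQuery and its methods are hypothetical helpers.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: queries which monitor types Health Monitor has registered.
public static class MonitorTypesQuery
{
    // Builds the absolute URI of GET /api/monitors, tolerating a missing trailing slash.
    public static Uri BuildMonitorsUri(string healthMonitorUrl)
    {
        var baseUri = new Uri(healthMonitorUrl.EndsWith("/") ? healthMonitorUrl : healthMonitorUrl + "/");
        return new Uri(baseUri, "api/monitors");
    }

    // Sends the request and returns the raw body; the response format is not parsed here.
    public static async Task<string> GetRegisteredMonitorsAsync(HttpClient client, string healthMonitorUrl)
    {
        using var response = await client.GetAsync(BuildMonitorsUri(healthMonitorUrl));
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Usage: `var body = await MonitorTypesQuery.GetRegisteredMonitorsAsync(new HttpClient(), "http://localhost:9000/");`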
The supported monitor types are:
| Monitor type | Package |
|---|---|
| http | HealthMonitoring.Monitors.Http-deploy |
| http.json | HealthMonitoring.Monitors.Http-deploy |
| nsb3 | HealthMonitoring.Monitors.Nsb3-deploy |
| nsb5.msmq | HealthMonitoring.Monitors.Nsb5.Msmq-deploy |
| nsb5.rabbitmq | HealthMonitoring.Monitors.Nsb5.Rabbitmq-deploy |
| push | N/A, see HealthMonitoring.Integration.PushClient |
- Monitor Package: HealthMonitoring.Monitors.Http-deploy
- Monitor types: http, http.json
- Monitored address format: URL (http or https)
- Integration needed on monitored endpoint side: none for http, minimal for http.json
This monitor checks endpoint health over HTTP.
Registration examples:
```json
{
  "Name": "Google",
  "Address": "http://google.com",
  "MonitorType": "http",
  "Group": "External"
}
```
```json
{
  "Name": "My api",
  "Address": "http://myapiurl/health",
  "MonitorType": "http.json",
  "Group": "Internal"
}
```
The Address in the second example is only illustrative; it can be any URL.
The monitor periodically calls the URL specified in the endpoint address to retrieve its health status. The response status code is mapped to a health status as follows:
- 404 Not Found => NotExists
- 503 Service Unavailable => Offline
- 200 OK => Healthy
- anything else => Faulty
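The status-code mapping above can be expressed as a small C# sketch. HttpCheckStatus is a hypothetical stand-in for Health Monitor's own status type. The timeout and response-time rules from 1.1 Endpoint monitoring are deliberately left out.

```csharp
using System.Net;

// Hypothetical stand-in for Health Monitor's status values (illustration only).
public enum HttpCheckStatus { NotExists, Offline, Healthy, Faulty }

public static class HttpStatusMapping
{
    // 404 => NotExists, 503 => Offline, 200 => Healthy, anything else => Faulty.
    // TimedOut / Unhealthy rules from 1.1 Endpoint monitoring are not modelled.
    public static HttpCheckStatus Map(HttpStatusCode statusCode) => statusCode switch
    {
        HttpStatusCode.NotFound => HttpCheckStatus.NotExists,
        HttpStatusCode.ServiceUnavailable => HttpCheckStatus.Offline,
        HttpStatusCode.OK => HttpCheckStatus.Healthy,
        _ => HttpCheckStatus.Faulty,
    };
}
```

Note that, under this mapping, any other success code such as 204 No Content is also reported as Faulty.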
See 1.1 Endpoint monitoring for the additional conditions under which the TimedOut, Faulty and Unhealthy statuses are assigned.
The http and http.json monitor types differ as follows:
- http does not analyse the response body and does not return any endpoint details,
- http.json expects the response body to contain a dictionary of key-value string pairs, which are captured as endpoint details.
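For http.json, the monitored endpoint only needs to return a flat JSON object whose values are strings. Below is a minimal sketch using System.Text.Json (.NET Core 3.0 or later). It builds such a body and reads it back as the string-to-string dictionary that http.json expects. HttpJsonHealthBody is a hypothetical helper. Serving the body over HTTP is left to whatever web framework the endpoint already uses.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for the monitored side of an http.json endpoint.
public static class HttpJsonHealthBody
{
    // Serializes the details as a flat JSON object, e.g. {"Version":"1.0.0.0"}.
    public static string Build(Dictionary<string, string> details) =>
        JsonSerializer.Serialize(details);

    // Reads a body back as key-value string pairs; a non-string value throws JsonException.
    public static Dictionary<string, string> Parse(string body) =>
        JsonSerializer.Deserialize<Dictionary<string, string>>(body)
        ?? new Dictionary<string, string>();
}
```

Running Parse over a candidate response is a quick way to check that an endpoint's body has the shape http.json expects.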
Example of an expected response body for http.json:
```json
{
  "property one": "some details",
  "some other property": "some other details"
}
```
- Monitor Package: HealthMonitoring.Monitors.Nsb3-deploy
- Monitor types: nsb3
- Monitored address format: MSMQ queue address understandable by NServiceBus 3, usually in `queuename@host` format
- Integration needed on monitored endpoint side: the endpoint has to handle the GetStatusRequest message from the HealthMonitoring.Monitors.Nsb3.Messages package.
This monitor checks endpoint health over NServiceBus 3 with the MSMQ transport.
Registration example:
```json
{
  "Name": "My service",
  "Address": "my_queue@some_host",
  "MonitorType": "nsb3",
  "Group": "My group"
}
```
The monitor periodically sends a GetStatusRequest message (from the HealthMonitoring.Monitors.Nsb3.Messages package) to the queue specified in the endpoint address and expects a GetStatusResponse message in reply.
If the response is received, the endpoint status is Healthy or Unhealthy, depending on the response time.
If the timeout elapses before the response is received, the endpoint ends up in the Faulty state; see 1.1 Endpoint monitoring for details about timeouts.
If GetStatusResponse contains any details about the endpoint, they are captured as well.
GetStatusRequest has TimeToBeReceived set, so the message is discarded if it is not received within that time; as a result, the monitored endpoint's queue does not grow while the endpoint is down.
TimeToBeReceived defaults to 30 seconds but can be customized in the Monitor process app config (the example below changes it to 1 minute):
```xml
<appSettings>
  <add key="Monitor.Nsb3.MessageTimeout" value="00:01:00"/>
  ...
</appSettings>
```
Note that the monitor itself also uses this setting as the time to wait for the response. The endpoint health status becomes Faulty as soon as the first of the two timeouts elapses (Monitor.Nsb3.MessageTimeout or FailureTimeOut, described in 1.1 Endpoint monitoring).
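The interplay of the two timeouts can be sketched in C# as follows. This illustrates the behaviour described above and is not the monitor's actual code. The reply is modelled as a plain Task. unhealthyThreshold is a hypothetical stand-in for the response-time rules in 1.1 Endpoint monitoring. Treating a failed reply as Faulty is an assumption of this sketch.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public enum QueueCheckStatus { Healthy, Unhealthy, Faulty }

public static class QueueCheckTimeouts
{
    // Faulty is reported when the first of the two timeouts elapses,
    // so the effective wait is the shorter one.
    public static TimeSpan EffectiveTimeout(TimeSpan messageTimeout, TimeSpan failureTimeout) =>
        messageTimeout < failureTimeout ? messageTimeout : failureTimeout;

    // reply: completes when GetStatusResponse arrives (modelled as a plain Task).
    // unhealthyThreshold: hypothetical stand-in for the response-time rules in 1.1.
    public static async Task<QueueCheckStatus> AwaitReplyAsync(
        Task reply, TimeSpan messageTimeout, TimeSpan failureTimeout, TimeSpan unhealthyThreshold)
    {
        var stopwatch = Stopwatch.StartNew();
        using var cts = new CancellationTokenSource();
        var timeout = Task.Delay(EffectiveTimeout(messageTimeout, failureTimeout), cts.Token);
        var first = await Task.WhenAny(reply, timeout);
        cts.Cancel(); // stop the timer if the reply won the race
        stopwatch.Stop();
        // Assumption: a reply task that failed is treated like a missing reply.
        if (first != reply || reply.Status != TaskStatus.RanToCompletion)
            return QueueCheckStatus.Faulty;
        return stopwatch.Elapsed <= unhealthyThreshold ? QueueCheckStatus.Healthy : QueueCheckStatus.Unhealthy;
    }
}
```

The app setting value parses directly into a TimeSpan: `TimeSpan.Parse("00:01:00")` yields one minute.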
To use this monitor, the monitored service has to handle the GetStatusRequest message and reply with GetStatusResponse. Both messages are defined in the HealthMonitoring.Monitors.Nsb3.Messages package.
An example status handler:
```csharp
using System;
using System.Collections.Generic;
using NServiceBus;
// GetStatusRequest/GetStatusResponse: add a using for the HealthMonitoring.Monitors.Nsb3.Messages package namespace.
public class StatusHandler : IHandleMessages<GetStatusRequest>
{
    private readonly IBus _bus;
    public StatusHandler(IBus bus)
    {
        _bus = bus;
    }
    public void Handle(GetStatusRequest message)
    {
        // Optional endpoint details, captured by the monitor.
        var details = new Dictionary<string, string> { { "Machine", Environment.MachineName }, { "Version", GetType().Assembly.GetName().Version.ToString(4) } };
        // Echo the RequestId of the incoming request in the reply.
        _bus.Reply(new GetStatusResponse { RequestId = message.RequestId, Details = details });
    }
}
```
- Monitor Package: HealthMonitoring.Monitors.Nsb5.Msmq-deploy
- Monitor types: nsb5.msmq
- Monitored address format: MSMQ queue address understandable by NServiceBus 5, usually in `queuename@host` format
- Integration needed on monitored endpoint side: the endpoint has to handle the GetStatusRequest message from the HealthMonitoring.Monitors.Nsb5.Messages package.
This monitor checks endpoint health over NServiceBus 5 with the MSMQ transport.
Registration example:
```json
{
  "Name": "My service",
  "Address": "my_queue@some_host",
  "MonitorType": "nsb5.msmq",
  "Group": "My group"
}
```
The monitor behaves exactly like the HealthMonitoring.Monitors.Nsb3 monitor, with the following exceptions:
- It supports NServiceBus 5 with the MSMQ transport
- Monitor type is nsb5.msmq
- Message contract package is HealthMonitoring.Monitors.Nsb5.Messages
- Message timeout can be customized with the Monitor.Nsb5.Msmq.MessageTimeout app setting key
- Monitor Package: HealthMonitoring.Monitors.Nsb5.Rabbitmq-deploy
- Monitor types: nsb5.rabbitmq
- Monitored address format: RabbitMQ queue address understandable by NServiceBus 5, usually in `queuename` format
- Integration needed on monitored endpoint side: the endpoint has to handle the GetStatusRequest message from the HealthMonitoring.Monitors.Nsb5.Messages package.
This monitor checks endpoint health over NServiceBus 5 with the RabbitMQ transport.
Registration example:
```json
{
  "Name": "My service",
  "Address": "my_queue",
  "MonitorType": "nsb5.rabbitmq",
  "Group": "My group"
}
```
The monitor behaves exactly like the HealthMonitoring.Monitors.Nsb3 monitor, with the following exceptions:
- It supports NServiceBus 5 with the RabbitMQ transport
- It requires the RabbitMqConnectionString connection string (see below)
- Monitor type is nsb5.rabbitmq
- Message contract package is HealthMonitoring.Monitors.Nsb5.Messages
- Message timeout can be customized with the Monitor.Nsb5.Rabbitmq.MessageTimeout app setting key
- TimeToBeReceived works slightly differently than on MSMQ
To use this monitor, a RabbitMQ connection string has to be specified in the Monitor process app config file:
```xml
<connectionStrings>
  <add name="RabbitMqConnectionString" connectionString="host=localhost;username=guest;password=guest" />
  ...
</connectionStrings>
```
- Monitor Package: N/A, always available since version 3.4.0.0
- Client Package: HealthMonitoring.Integration.PushClient
- Monitor types: push
- Monitored address format: any; PushClient uses the `host:endpoint_name` format
- Integration needed on monitored endpoint side: a push monitor integration using the HealthMonitoring.Integration.PushClient package.
This integration method inverts the monitoring process: the monitored service periodically pushes its health to Health Monitor, as opposed to the other monitor types, which poll.
Push integration consists of two steps:
- implementing the monitoring logic by extending the AbstractHealthChecker class,
- starting the monitoring loop, using the HealthMonitorPushClient class, in the service process that should be monitored.
An example service using push integration can be found in the HealthMonitoring.Examples.ServiceWithPushIntegration sample program.
The class implementing AbstractHealthChecker is responsible for providing the health status of the endpoint.
The most trivial implementation could look as follows:
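```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
// AbstractHealthChecker/HealthStatus: add a using for the HealthMonitoring.Integration.PushClient package namespace.
class HealthChecker : AbstractHealthChecker
{
    protected override Task<HealthStatus> OnHealthCheckAsync(Dictionary<string, string> details, CancellationToken cancellationToken)
    {
        // Custom entries are uploaded together with the health status.
        details.Add("custom details", "all good!");
        // Nothing to await, so return an already completed task.
        return Task.FromResult(HealthStatus.Healthy);
    }
}
```
The OnHealthCheckAsync() method has a details dictionary parameter that can be used to pass more detailed health information; it is uploaded to the API and shown on the endpoint details page. By default, AbstractHealthChecker adds the following predefined details: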
- Version - version of the assembly extending AbstractHealthChecker,
- Host - host name where endpoint is running,
- Location - system path to the entry assembly executing the endpoint.
If the method needs to perform longer-running checks, it should be implemented asynchronously and use the cancellationToken parameter to cancel the health check when requested.
The method should return the health status of the endpoint; the supported statuses are:
- HealthStatus.Healthy - The target endpoint is fully operational,
- HealthStatus.Faulty - The target endpoint is broken and it is not able to function properly,
- HealthStatus.Unhealthy - The target endpoint is operational but has performance or other minor issues,
- HealthStatus.Offline - The target endpoint exists, but is not actively serving requests (it is offline / put into maintenance etc).
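The sketch below shows a longer-running check that honours the cancellation token. It is written as a standalone method rather than an AbstractHealthChecker override so that it compiles on its own. PushHealthStatus is a hypothetical stand-in for HealthStatus. The probe delegate, the slowThreshold parameter and the detail keys are illustrative assumptions. A real OnHealthCheckAsync override could use a similar body, with the probe replaced by the actual dependency check.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for HealthStatus from the push client package.
public enum PushHealthStatus { Healthy, Faulty, Unhealthy, Offline }

public static class DependencyHealthCheck
{
    // probe: hypothetical async dependency check (e.g. a database ping).
    // slowThreshold: hypothetical limit above which the endpoint is reported as Unhealthy.
    public static async Task<PushHealthStatus> CheckAsync(
        Func<CancellationToken, Task> probe, TimeSpan slowThreshold,
        Dictionary<string, string> details, CancellationToken cancellationToken)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            await probe(cancellationToken);
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            throw; // the caller asked to cancel: propagate instead of reporting Faulty
        }
        catch (Exception ex)
        {
            details["dependency error"] = ex.Message;
            return PushHealthStatus.Faulty;
        }
        details["dependency response ms"] = stopwatch.ElapsedMilliseconds.ToString();
        return stopwatch.Elapsed > slowThreshold ? PushHealthStatus.Unhealthy : PushHealthStatus.Healthy;
    }
}
```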
The monitoring loop should start on endpoint startup. The HealthMonitorPushClient class should be used to define the endpoint details as well as start the monitoring of the endpoint.
An example of the monitoring loop initialization:
```csharp
IDisposable notifier = HealthMonitorPushClient
    .UsingHealthMonitor("http://localhost:9000/")
    .DefineEndpoint(builder => builder
        .DefineGroup("Examples")
        .DefineName("Service With Push Integration")
        .DefineTags("example")
        .DefineAddress("ServiceWithPushIntegration_node1")
        .DefinePassword("12345678"))
    .WithHealthCheck(new HealthChecker())
    .WithBackOffStategy(new CustomBackOffStrategy()) // optional method-chain call
    .StartHealthNotifier();
```
The UsingHealthMonitor() method specifies the Health Monitor URL.
The DefineEndpoint() method specifies endpoint details such as group, name, tags, address and password, where:
- DefineAddress() defines the endpoint's unique address in the `host:endpointUniqueName` format, where the host part can be specified explicitly or inferred (there are 2 overloads of this method). The endpointUniqueName should be unique on a given machine; it could refer to the Windows service ID or some other unique identifier. Hard-coding the address is possible, but it does not allow the same endpoint to run more than once on one machine. A better approach is usually to read this value from app.config and have the deployment script set it to the Windows service ID;
- DefinePassword() requires a password (minimum 8 characters) that is used for endpoint registration and later to update the endpoint health;
- DefineTags() is optional.
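The rules above can be checked before the values reach DefineAddress() and DefinePassword(). The helper sketched below makes two checks: it builds an address in the `host:endpointUniqueName` format and verifies the 8-character password minimum. PushEndpointDefinition is hypothetical. Inferring the host from Environment.MachineName is this sketch's own choice; this page does not say how the real DefineAddress() overload infers the host.

```csharp
using System;

// Hypothetical helper; not part of HealthMonitoring.Integration.PushClient.
public static class PushEndpointDefinition
{
    // Builds "host:endpointUniqueName"; when host is omitted, the machine name is used (an assumption).
    public static string BuildAddress(string endpointUniqueName, string host = null)
    {
        if (string.IsNullOrWhiteSpace(endpointUniqueName))
            throw new ArgumentException("Endpoint unique name is required.", nameof(endpointUniqueName));
        return $"{host ?? Environment.MachineName}:{endpointUniqueName}";
    }

    // DefinePassword() requires at least 8 characters; checking early gives a clearer error.
    public static bool IsValidPassword(string password) => password != null && password.Length >= 8;
}
```

In a deployment that sets the unique name per Windows service, that configured value would be passed as endpointUniqueName.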
The WithHealthCheck() method specifies the instance responsible for providing the endpoint health, described above.
The WithBackOffStategy() method is optional. It injects the instance that defines the back-off strategy applied when the application encounters errors. You can use the RecommendedBackOffStrategy helper class and adjust its behaviour by overriding individual methods and/or extending the class as needed. Alternatively, you can create your own back-off strategy class and inject an instance of it instead.
Finally, StartHealthNotifier() starts the monitoring loop on a separate thread and returns an IDisposable notifier instance, which can be disposed on endpoint shutdown.
Internally, the monitoring loop works as follows:
- It calls Health Monitor to retrieve the Monitor.HealthCheckInterval setting and uses it as the interval between health checks. The setting is re-fetched every 10 minutes in case it has changed;
- It registers the endpoint in Health Monitor using the provided definition. If the endpoint is deleted while the loop is running, it is recreated on the next health update;
- After each interval, it calls the provided implementation to obtain the current health status of the endpoint and then uploads it to Health Monitor;
- If the monitoring loop fails to communicate with Health Monitor, it retries indefinitely until the connection is restored or the endpoint shuts down. Each retry delay is longer than the previous one, up to a maximum of 2 minutes between retries.
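Below is a C# sketch of a retry delay that grows on each attempt and is capped at 2 minutes. The page only states that each retry waits longer, up to 2 minutes. The doubling policy and the initial delay used here are assumptions. A custom back-off strategy passed to WithBackOffStategy() could apply a different curve.

```csharp
using System;

public static class RetryBackOff
{
    // Ceiling taken from the description above: at most 2 minutes between retries.
    static readonly TimeSpan MaxDelay = TimeSpan.FromMinutes(2);

    // Hypothetical policy: initialDelay doubled per failed attempt (attempt is 1-based), capped at MaxDelay.
    public static TimeSpan DelayForAttempt(int attempt, TimeSpan initialDelay)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
        // Clamp the exponent so the multiplication cannot overflow for very large attempt counts.
        double ms = initialDelay.TotalMilliseconds * Math.Pow(2, Math.Min(attempt - 1, 30));
        return ms >= MaxDelay.TotalMilliseconds ? MaxDelay : TimeSpan.FromMilliseconds(ms);
    }
}
```

With a 1-second initial delay, this policy waits 1 s, 2 s, 4 s and so on, reaching the 2-minute ceiling from the 8th attempt onward.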