12_Observability

Monitoring Services & Tools

| Service | Type | Open Source | Description |
|---|---|---|---|
| New Relic | Observability, Alerts | No | Monitor, debug, and improve your entire stack. |
| PagerDuty | Incident Response | No | Incident response and on-call management. |
| AWS CloudWatch | Observability, Alerts | No | Observe and monitor AWS resources and applications in the cloud and on premises. |
| Prometheus | Observability, Alerts | Yes | Power your metrics and alerting with the leading open-source monitoring solution. |
| Grafana | Visualization | Yes | Compose and scale observability with one or all pieces of the stack. |
| ELK | Observability | Yes | Log aggregation and monitoring (Elasticsearch, Logstash, Kibana). |
| Uptime Robot | Alerts | No | Uptime monitoring service. |
| Splunk | Observability, Alerts | No | Log aggregator (in contrast to New Relic, which focuses on web application performance profiling). |
| Sumo Logic | Observability, Alerts | No | Cloud-native SaaS analytics. |
| Source code profiling | Profiling | No | |
| Nagios | Observability | Yes | |
| Falcon LogScale | Observability | No | |

Terminologies

| Term | Description |
|---|---|
| MTTD (Mean Time to Detection) | The average time between the start of an issue and the moment teams become aware of it. It does not include time spent troubleshooting or fixing the issue. |
| MTTR (Mean Time to Resolution) | The average time between the detection of an issue and the point at which systems are fixed and operating normally again. Ideally, this includes both the time spent fixing the issue and the time spent implementing proactive steps to prevent it from happening again. |
| Telemetry Data | MELT: Metrics, Events, Logs, and Traces. |
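
The MTTD and MTTR definitions above reduce to simple averages over incident timestamps. The following is a minimal sketch of that arithmetic, not a reference implementation: the `Incident` record, its field names, and the sample timestamps are all hypothetical, and it assumes each incident has a known start, detection, and resolution time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record; field names are assumptions for illustration.
    started: datetime    # when the issue actually began
    detected: datetime   # when the team became aware of it
    resolved: datetime   # when systems were operating normally again


def mttd(incidents):
    """Mean time to detection: average of (detected - started)."""
    return timedelta(seconds=mean(
        (i.detected - i.started).total_seconds() for i in incidents))


def mttr(incidents):
    """Mean time to resolution: average of (resolved - detected)."""
    return timedelta(seconds=mean(
        (i.resolved - i.detected).total_seconds() for i in incidents))


# Made-up sample data: two incidents on consecutive days.
incidents = [
    Incident(datetime(2024, 1, 1, 10, 0),
             datetime(2024, 1, 1, 10, 5),
             datetime(2024, 1, 1, 10, 35)),
    Incident(datetime(2024, 1, 2, 14, 0),
             datetime(2024, 1, 2, 14, 15),
             datetime(2024, 1, 2, 15, 15)),
]

print("MTTD:", mttd(incidents))  # 0:10:00
print("MTTR:", mttr(incidents))  # 0:45:00
```

Note that MTTR here is measured from detection, matching the definition in the table; some teams measure it from the start of the issue instead, which would add the detection delay to every incident.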

References