
Monitoring Services & Tools

| Service | Type | Open Source | Description |
|---|---|---|---|
| ⭐ New Relic | Observability, Alerts | No | Monitor, debug, and improve your entire stack. |
| ⭐ PagerDuty | Incident Response | No | Incident response and on-call management. |
| ⭐ AWS CloudWatch | Observability, Alerts | No | Observe and monitor AWS resources and applications in the cloud and on premises. |
| Prometheus | Observability, Alerts | Yes | Power your metrics and alerting with the leading open-source monitoring solution. |
| Grafana | Visualization | Yes | Compose and scale observability with one or all pieces of the stack. |
| ELK | Observability | Yes | Log aggregation, monitoring, etc. |
| Uptime Robot | Alerts | No | Uptime monitoring service. |
| Splunk | Observability, Alerts | No | Log aggregation and analysis (whereas New Relic focuses on web application performance profiling). |
| SumoLogic | Observability, Alerts | No | Cloud-native SaaS analytics. |
| Source Code profiling | Profiling | No | |
| Nagios | Observability | Yes | |
| Falcon LogScale | Observability | No | |

| Term | Description |
|---|---|
| MTTD (Mean Time to Detection) | MTTD is the average amount of time between the start of an issue and when teams become aware of it. It does not include time spent troubleshooting or fixing the issue. |
| MTTR (Mean Time to Resolution) | MTTR is the average amount of time between when an issue is detected and when systems are fixed and operating normally again. Ideally, this includes both the time spent fixing the issue and the time spent implementing proactive steps to prevent it from happening again. |
| Telemetry Data | MELT: Metrics, Events, Logs, and Traces. |
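To make the MTTD and MTTR definitions concrete, here is a minimal sketch in Python that computes both from a list of incidents. It assumes each incident records three timestamps (when the issue started, when it was detected, and when it was resolved); the `Incident` class and its field names are hypothetical and not taken from any particular tool. Following the table, MTTR is measured from detection, not from the start of the issue. Definitions vary between teams, so adjust the subtraction if yours measures differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative assumptions.
    started: datetime   # when the issue actually began
    detected: datetime  # when the team became aware of it
    resolved: datetime  # when systems were operating normally again


def _mean(deltas: list[timedelta]) -> timedelta:
    # Average a list of durations; sum() needs a timedelta() start value.
    if not deltas:
        raise ValueError("need at least one incident")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time to detection: average of (detected - started)."""
    return _mean([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolution: average of (resolved - detected)."""
    return _mean([i.resolved - i.detected for i in incidents])


if __name__ == "__main__":
    incidents = [
        # Detected after 10 min, resolved 60 min after detection.
        Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 10),
                 datetime(2024, 1, 1, 11, 10)),
        # Detected after 20 min, resolved 30 min after detection.
        Incident(datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 9, 20),
                 datetime(2024, 1, 2, 9, 50)),
    ]
    print("MTTD:", mttd(incidents))  # MTTD: 0:15:00
    print("MTTR:", mttr(incidents))  # MTTR: 0:45:00
```

With the two sample incidents, detection took 10 and 20 minutes (MTTD of 15 minutes) and resolution took 60 and 30 minutes after detection (MTTR of 45 minutes).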