Releases: m-lab/prometheus-support
Add base recording rules for NDT early warning
This release adds recording rules for the NDT early warning dashboards.
This release updates the version of Alertmanager to hyperlink URLs in alert annotations.
This release updates the GitHub receiver, which now leaves issues open after resolution, so they must be closed manually.
Prometheus 1.8 and Grafana 4.6.2 upgrades
This release of prometheus-support includes minor version upgrades to the Prometheus and Grafana servers, as well as an update of the bigquery-exporter to v0.3.
This release also includes multiple bigquery-exporter query updates: IPv6 bias, NDT server metrics, and NDT test counts (previously run ad hoc in mlab-oti).
Resource changes:
- the blackbox exporter CPU allocation is now 1 CPU, to resolve suspected overload.
Alert changes:
- ParserDailyVolumeTooLow added to track the pipeline daily volume.
- DownloaderDownOrMissing added to report if the downloader is not running at all. Coincides with the production deployment of downloader.
- ScraperMostRecentArchivedFileTimeIsTooOld now fires only after 56 hours (instead of 36) to allow for rsyncd config updates. The longer window reduces redundancy with the ParserDailyVolumeTooLow alert and lowers the frequency of this scraper alert, which is currently our most common one and has a reputation for auto-closing with no further action.
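The 56-hour threshold above can be illustrated with a minimal Python sketch. This is not the actual alert rule, which these notes do not show; the function and constant names are hypothetical, and only the 56-hour value comes from the release note.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the release note; the constant name is hypothetical.
MAX_ARCHIVE_AGE = timedelta(hours=56)


def archive_is_too_old(most_recent_archive: datetime, now: datetime) -> bool:
    """Return True when the newest archived file is older than the threshold."""
    return now - most_recent_archive > MAX_ARCHIVE_AGE


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # An archive from 40 hours ago would have fired under the old 36-hour
    # threshold but does not under the new 56-hour one.
    print(archive_is_too_old(now - timedelta(hours=40), now))
```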
Lameduck alerts, BigQuery Exporter, and Experiment monitoring
This release includes several significant enhancements to the production prometheus configuration.
- nodes that are in lame-duck mode are excluded from alerts.
- the bigquery exporter is now available for production dashboards
- the blackbox exporter supports health checks used by mlab-ns today
- the gcp-service-discovery side-car service for Prometheus downloads all the new target files, including those for the inotify exporter.
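As a rough illustration of the lame-duck exclusion in the first item above, the idea can be sketched in Python as a filter applied before alerting. The production configuration is not shown in these notes, so the `lame_duck` field and the node names here are assumptions made purely for the example.

```python
def nodes_eligible_for_alerts(nodes):
    """Keep only nodes that are not in lame-duck mode.

    Each node is a dict; the "lame_duck" key is a hypothetical field name.
    Nodes without the key are treated as being in normal service.
    """
    return [n for n in nodes if not n.get("lame_duck", False)]


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    nodes = [
        {"name": "node-a", "lame_duck": True},
        {"name": "node-b", "lame_duck": False},
        {"name": "node-c"},
    ]
    print([n["name"] for n in nodes_eligible_for_alerts(nodes)])
```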
Templating and snmp_exporter improvements.
This release includes several refinements and improvements to the new templating system. It also incorporates a number of incremental changes to the snmp_exporter configurations.
Prometheus now monitors the snmp_exporter service itself, and triggers an alert when the service has been down for at least 10 minutes.
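The "down for at least 10 minutes" condition can be sketched in Python as follows. This is a simplified model of a hold period before an alert fires, not how Prometheus evaluates rules internally, and the sample format and function name are assumptions for illustration.

```python
def down_for_at_least(samples, now, hold_seconds=600):
    """Return True if the target has been continuously down for hold_seconds.

    samples: list of (unix_timestamp, up) pairs sorted by timestamp, where
    up is 1 when the scrape succeeded and 0 when it failed (hypothetical
    format). hold_seconds defaults to the 10 minutes from the release note.
    """
    down_since = None
    for ts, up in samples:
        if up == 0:
            # Remember when the current outage started.
            if down_since is None:
                down_since = ts
        else:
            # Any successful scrape resets the outage.
            down_since = None
    return down_since is not None and now - down_since >= hold_seconds
```

A brief recovery resets the timer, so a service that flaps every few minutes would not trigger the alert under this model.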
Unify k8s configuration across GCP projects
This release marks the consolidation of per-project k8s configurations into a single k8s config that supports per-project template variables.
This should make it much easier to track changes from sandbox to staging to production and easier to manage the k8s configurations.
This change also updates the nagios exporter port from 5000 to the registered port 9267.
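The idea of one shared config with per-project template variables can be sketched in Python using the standard library's string.Template. The repository's real templating mechanism is not described in these notes; the template text, variable names, and CPU values below are hypothetical, and only the project names and port 9267 come from the release notes.

```python
from string import Template

# One shared template for every project (hypothetical contents).
K8S_TEMPLATE = Template(
    "project: $project\n"
    "nagios_exporter_port: $nagios_port\n"
    "cpu_limit: $cpu_limit\n"
)

# Per-project variables; the cpu_limit values are illustrative only.
PROJECT_VARS = {
    "mlab-staging": {"nagios_port": 9267, "cpu_limit": "1"},
    "mlab-oti": {"nagios_port": 9267, "cpu_limit": "2"},
}


def render(project: str) -> str:
    """Render the shared template with the variables for one project."""
    return K8S_TEMPLATE.substitute(project=project, **PROJECT_VARS[project])


if __name__ == "__main__":
    for name in PROJECT_VARS:
        print(render(name))
```

Because every project renders from the same template, a diff between two rendered outputs shows only the variables that differ.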
Use OAuth2 for Grafana login authentication
This release includes an update to Grafana v4.5.2. With this change we are also using OAuth login exclusively.
This release includes alert changes from #88
This release includes support for snmp-exporter.
Update resource limits for mlab-staging & mlab-oti
Merge pull request #72 from stephen-soltesz/master: Add notes for upgrading resource limits
Alertmanager Github Receiver
Includes the v0.0 github receiver running in mlab-oti.
Initial Release
This is the first version to support autodeploy to mlab-oti from tags.