Releases: m-lab/prometheus-support

One-off bug fix release

18 Sep 17:04
0cb0fba

This release fixes a bug, introduced in the previous release, that caused floods of spurious LameDuckMetricMissingForNode alerts to fire.

Weekly release: 2018-09-10 to 2018-09-18

18 Sep 16:03
db47ce2

This release introduces a new k8s deployment, service and ingress for the Github Maintenance Exporter.

Weekly release: 2018-08-28 to 2018-09-10

10 Sep 16:51
82fab1b

This release features:

  • A number of improvements to alerting, including fixes that make some existing alerts less noisy, plus some new alerts.
  • An update of data-processing-cluster's Prometheus instance to v2.3.2.
  • A new BQ exporter query to check for completeness of NDT test annotations.

Weekly release: 2018-08-23 to 2018-08-28

28 Aug 13:54
49ea91f

This release includes a typo fix for the ParserDailyVolumeTooLow dashboard and increases the timeout for the SnmpScrapingDownAtSite alert to 60m.

Weekly release: 2018-08-14 to 2018-08-23

23 Aug 18:22
0625ff0

This release increases the default RAM allocated to Prometheus in mlab-oti and increases the cache index flag parameters to improve interactive query support.

In addition, the ParserDailyVolumeTooLow alert is now built on a recording rule, which should make its evaluation much more efficient.

33 new commits with various new features, improvements and bug fixes

14 Aug 18:05
80e8980

Overview

  • Repos prometheus-snmp-exporter and prometheus-script-exporter were renamed to snmp-exporter-support and script-exporter-support, respectively.
  • A major bug was fixed in a couple of AlertManager inhibit rules. It went undiscovered because AM was never reloaded to read in the new configmap that introduced the breakage. This is fixed now.
  • Grafana data sources as code! Data sources are now managed through YAML files, similar to how dashboards are managed through JSON files.
  • Ad-hoc monitoring for new platform k8s cluster nodes mlab3.lga03 and mlab3.lax02.
  • Ad-hoc monitoring for the new U.S. ndt-cloud VMs.
  • ConfigMap reloader added to the AlertManager pod.
  • Automatic Google Cloud DNS entries created for properly configured k8s services.
  • New monitoring/metrics for "fast sidestream" and "tcp-info".

Client-geohash query + metric_relabel_configs for sidestream

23 Jul 20:59
8e55f1f

This release contains three main changes:

  • A new bq_exporter query to aggregate NDT geospatially. This will allow us to create and maybe publish heatmaps for NDT performance.

  • Label rewrite rules for legacy targets to set the experiment index label correctly. The index numbers will be used to associate Sidestream aggregate traffic with each experiment.

  • We now scrape node_exporters running on dns.measurementlab.net and mirror.measurementlab.net, and alert on them too.
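
The geospatial aggregation in the first item can be pictured with a small sketch. The actual bq_exporter query is not shown in these notes, so the Python below only illustrates the general idea of bucketing measurements by geohash cell and averaging per cell. The function names, the `(lat, lon, mbps)` sample layout, and the default precision are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Standard geohash base32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=4):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)


def aggregate_by_cell(samples, precision=4):
    """Group hypothetical (lat, lon, mbps) samples by geohash cell and average them."""
    cells = defaultdict(list)
    for lat, lon, mbps in samples:
        cells[geohash_encode(lat, lon, precision)].append(mbps)
    return {cell: mean(values) for cell, values in cells.items()}
```

A shorter geohash gives larger cells, so the precision chosen trades map resolution against the number of distinct series a heatmap has to display.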

Auto-deploy cloud VM targets + AM inhibit rules

19 Jul 21:02
50a8ba7

This release contains three main changes:

  • Prometheus targets for ndt-cloud GCE VMs are now stored as static files in this repository, and are deployed to k8s persistent storage automatically by Travis builds. From here forward, if ndt-cloud targets need to change, edit them in this repo.

  • A couple of new AlertManager inhibit rules were put in place to prevent cascading failure alerts related to the snmp-exporter and script-exporter.

  • Three Grafana JSON dashboards were edited to auto-refresh various template variables every time the dashboard is reloaded. Previously, the variables were set to never refresh, so they were static lists that got out of date.
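
To make the inhibit-rule item above more concrete, here is a rough Python model of how an inhibit rule suppresses a dependent alert while its cause is firing. It is a simplified sketch, not Alertmanager's implementation and not the rules from this repository. The source alert name `SnmpExporterDown` and the `cluster` label are hypothetical, and only exact-match label comparison is modeled.

```python
def matches(labels, matchers):
    """Return True if every matcher key has exactly the given value in labels."""
    return all(labels.get(key) == value for key, value in matchers.items())


def is_inhibited(target, firing_alerts, rule):
    """Decide whether `target` is muted by `rule`, given the currently firing alerts.

    A target is inhibited when it matches `target_match`, some other firing
    alert matches `source_match`, and both carry equal values for every
    label listed under `equal`.
    """
    if not matches(target, rule["target_match"]):
        return False
    for source in firing_alerts:
        if source is target:
            continue  # simplification: an alert never inhibits itself
        if not matches(source, rule["source_match"]):
            continue
        if all(source.get(name) == target.get(name) for name in rule["equal"]):
            return True
    return False


# Hypothetical rule: while the exporter itself is down, mute per-site alerts
# that would otherwise fire as a consequence in the same cluster.
EXAMPLE_RULE = {
    "source_match": {"alertname": "SnmpExporterDown"},        # hypothetical name
    "target_match": {"alertname": "SnmpScrapingDownAtSite"},
    "equal": ["cluster"],                                     # hypothetical label
}
```

With this rule, a per-site alert stays silent only when an exporter-down alert is firing with the same `cluster` value, which is how a single root-cause alert can stand in for many downstream ones.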

Grafana 5.1.4 + auto configmap reloads + Grafana Worldmap plugin

02 Jul 16:28
614ef4b

This release has 3 main components, along with a number of smaller changes and bugfix commits:

  1. We now use Grafana 5.1.4. The interface has roughly the same feel, but has many new features and better panel layout options. It also uses the new Grafana provisioning feature for dashboards, instead of the deprecated dashboards.json feature we were using before.
  2. Configmaps should now get reloaded by their applications when there are changes. Previously, we had to manually reload, for example, Prometheus when its configuration changed. With this new feature, services should reload their configmaps automatically.
  3. Grafana now has the Worldmap plugin, which, when tied to some data from the bigquery_exporter, can display very interesting mapped data.

Weekly release: 2018-06-11 to 2018-06-25

25 Jun 19:41
aa40372

This release adds a number of significant changes and new additions to monitoring:

Additions:

  • new Ops_PodOverview dashboard.
  • new mlabns stackdriver metric collection and accompanying dashboard.
  • node exporter is a standard deployment for prometheus clusters.

Changes:

  • SNMP exporter and Script exporter are scraped using their private VPC network DNS names
  • Prometheus RAM & CPU allocs are increased for all projects so RAM == 2x HEAP size.
  • Github Receiver runs with "inmemory" mode for sandbox and staging. Previously it was only run in mlab-oti.
  • Github Receiver supports native prometheus metrics, for error rates and available API rate limits.
  • Enhancements to Prometheus SelfMetrics dashboard
  • kube-state-metrics is the latest version and as useful as possible.
  • blackbox exporters run redundantly