Releases · m-lab/prometheus-support

18 Sep 17:04

nkinkade

0cb0fba

One-off bug fix release

This release fixes a bug introduced in the previous release that caused floods of spurious LameDuckMetricMissingForNode alerts to fire.

18 Sep 16:03

nkinkade

production/1.37

db47ce2

Weekly release: 2018-09-10 to 2018-09-18

This release introduces a new k8s deployment, service and ingress for the Github Maintenance Exporter.

10 Sep 16:51

nkinkade

production/1.36

82fab1b

Weekly release: 2018-08-28 to 2018-09-10

This release features:

A number of improvements to alerting, including fixes for some existing alerts to make them less noisy, plus some new alerts.
Updates data-processing-cluster's Prometheus instance to v2.3.2.
Adds a new BQ exporter query to check for completeness of NDT test annotations.

28 Aug 13:54

stephen-soltesz

production/1.35

49ea91f

Weekly release: 2018-08-23 to 2018-08-28

Including a typo fix for the ParserDailyVolumeTooLow dashboard.
Increasing the timeout for the SnmpScrapingDownAtSite alert to 60m.

23 Aug 18:22

stephen-soltesz

production/1.34

0625ff0

Weekly release: 2018-08-14 to 2018-08-23

This release increases the default RAM allocated to prometheus in mlab-oti and increases the cache index flag parameters to improve interactive query support.

In addition, the ParserDailyVolumeTooLow alert is now built on a recording rule, which should make its evaluation much more efficient.

14 Aug 18:05

nkinkade

production/1.33

80e8980

33 new commits with various new features, improvements and bug fixes

Overview

Repos prometheus-snmp-exporter and prometheus-script-exporter were renamed to snmp-exporter-support and script-exporter-support, respectively.
A major bug in a couple of AlertManager inhibit rules was fixed. It went undiscovered because AM was never reloaded to read in the new configmap that introduced the breakage.
Grafana data sources as code! Data sources are now managed through YAML files, similar to how dashboards are managed through JSON files.
Ad-hoc monitoring for new platform k8s cluster nodes mlab3.lga03 and mlab3.lax02.
Ad-hoc monitoring for the new U.S. ndt-cloud VMs.
ConfigMap reloader added to the AlertManager pod.
Automatic Google Cloud DNS entries created for properly configured k8s services.
New monitoring/metrics for "fast sidestream" and "tcp-info".

23 Jul 20:59

nkinkade

production/1.32

8e55f1f

Client-geohash query + metric_relabel_configs for sidestream

This release contains three main changes:

A new bq_exporter query to aggregate NDT geospatially. This will allow us to create and maybe publish heatmaps for NDT performance.
Label rewrite rules for legacy targets to set the experiment index label correctly. The index numbers will be used to associate aggregate Sidestream traffic with each experiment.
We now scrape node_exporters running on dns.measurementlab.net and mirror.measurementlab.net, and alert on them too.
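
The label rewrite rules above are Prometheus relabel rules. As a rough illustration of how a `replace` action behaves, here is a minimal Python sketch. The function name, the label names, and the index value are hypothetical and are not taken from this repository's configuration. Prometheus itself uses RE2 regular expressions and also drops a label whose replacement is empty, which this sketch does not model.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement, separator=";"):
    """Simplified model of a relabel rule with action 'replace'.

    The values of source_labels are joined with the separator, the
    regex must match the whole joined string, and on a match the
    target label is set to the expanded replacement.
    """
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        # No match: the label set is returned unchanged.
        return dict(labels)
    # Convert Prometheus-style $1 / ${1} references to Python's \g<1>.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    relabeled = dict(labels)
    relabeled[target_label] = match.expand(template)
    return relabeled


# Hypothetical example: assign a fixed index to one experiment.
sample = {"experiment": "ndt.iupui", "machine": "mlab3.lga03"}
with_index = relabel_replace(
    sample, ["experiment"], r"ndt\.iupui", "index", "2")

# Hypothetical example: derive a site label from a capture group.
with_site = relabel_replace(
    sample, ["machine"], r"(mlab\d)\.(\w+)", "site", "$2")
```

A rule whose regex does not match leaves the labels untouched, so rules of this kind can be stacked, one per experiment.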

19 Jul 21:02

nkinkade

production/1.31

50a8ba7

Auto-deploy cloud VM targets + AM inhibit rules

This release contains three main changes:

Prometheus targets for ndt-cloud GCE VMs are now stored as static files in this repository, and are deployed to k8s persistent storage automatically by Travis builds. From here forward, if ndt-cloud targets need to change, edit them in this repo.
A couple new AlertManager inhibit rules were put in place to prevent cascading failure alerts related to the snmp-exporter and script-exporter.
Three Grafana JSON dashboards were edited to auto-refresh various template variables every time the dashboard is reloaded. Previously, the variables were set to never refresh, leaving a static list that went out of date.
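
To illustrate how an inhibit rule prevents cascading alerts, the following Python sketch models the matching logic in simplified form: a target alert is muted while a matching source alert is firing and both agree on a set of labels. The class, the function names, the `SnmpExporterDown` alert name, and the `cluster` label are assumptions made for the example. They are not the rules added in this release.

```python
from dataclasses import dataclass, field


@dataclass
class InhibitRule:
    source_match: dict   # labels the inhibiting (source) alert must carry
    target_match: dict   # labels the alert to be muted must carry
    equal: list = field(default_factory=list)  # labels both must share


def matches(labels, matchers):
    """True if every matcher label is present with the same value."""
    return all(labels.get(key) == value for key, value in matchers.items())


def is_inhibited(target, firing, rules):
    """True if any firing alert inhibits the target under any rule."""
    for rule in rules:
        if not matches(target, rule.target_match):
            continue
        for source in firing:
            if source is target:
                continue  # simplification: an alert never inhibits itself
            same = all(source.get(l) == target.get(l) for l in rule.equal)
            if same and matches(source, rule.source_match):
                return True
    return False


# Hypothetical rule: when the exporter itself is down, mute the
# per-site scraping alert that would otherwise fire as a consequence.
rule = InhibitRule(
    source_match={"alertname": "SnmpExporterDown"},
    target_match={"alertname": "SnmpScrapingDownAtSite"},
    equal=["cluster"],
)
exporter_down = {"alertname": "SnmpExporterDown", "cluster": "a"}
site_down = {"alertname": "SnmpScrapingDownAtSite", "cluster": "a",
             "site": "lga03"}
```

With both alerts firing, `is_inhibited(site_down, [exporter_down, site_down], [rule])` is true, so only the root-cause alert would notify.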

02 Jul 16:28

nkinkade

production/1.30

614ef4b

Grafana 5.1.4 + auto configmap reloads + Grafana Worldmap plugin

This release has 3 main components, along with a number of smaller changes and bugfix commits:

We now use Grafana 5.1.4. The interface has roughly the same feel, but has many new features and better panel layout options. It also uses the new Grafana provisioning feature for dashboards, instead of the deprecated dashboards.json feature we were using before.
Configmaps should now get reloaded by their applications when there are changes. Previously, we had to manually reload, for example, Prometheus when its configuration changed. With this new feature, services should reload their configmaps automatically.
Grafana now has the Worldmap plugin, which when tied to some data from the bigquery_exporter, can display very interesting mapped data.
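
The automatic configmap reload described above can be pictured as a small watcher that compares content digests and triggers a reload only on change. The Python sketch below is illustrative only. The release notes do not say which reloader is used, and the `ConfigWatcher` class and its callback are hypothetical. In a real deployment the callback might, for example, send an HTTP POST to Prometheus's `/-/reload` endpoint, which is available when the lifecycle API is enabled.

```python
import hashlib


class ConfigWatcher:
    """Calls reload_callback whenever observed config content changes."""

    def __init__(self, reload_callback):
        self._digest = None
        self._reload = reload_callback

    def observe(self, content: bytes) -> bool:
        """Record the current content; return True if a reload fired."""
        digest = hashlib.sha256(content).hexdigest()
        if self._digest is None:
            # First observation only establishes the baseline.
            self._digest = digest
            return False
        if digest == self._digest:
            return False
        self._digest = digest
        self._reload()
        return True


# Usage: count reloads instead of contacting a real service.
reloads = []
watcher = ConfigWatcher(lambda: reloads.append("reload"))
first = watcher.observe(b"scrape_interval: 60s\n")
repeat = watcher.observe(b"scrape_interval: 60s\n")
changed = watcher.observe(b"scrape_interval: 30s\n")
```

Comparing digests rather than modification times avoids spurious reloads when a mounted configmap is rewritten with identical content.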

25 Jun 19:41

stephen-soltesz

production/1.29

aa40372

Weekly release: 2018-06-11 to 2018-06-25

This release adds a number of significant changes and new additions to monitoring:

Additions:

new Ops_PodOverview dashboard.
new mlabns stackdriver metric collection and accompanying dashboard.
node exporter is a standard deployment for prometheus clusters.

Changes:

SNMP exporter and Script exporter are scraped using their private VPC network DNS names
Prometheus RAM & CPU allocs are increased for all projects so RAM == 2x HEAP size.
Github Receiver runs with "inmemory" mode for sandbox and staging. Previously it was only run in mlab-oti.
Github Receiver supports native prometheus metrics, for error rates and available API rate limits.
Enhancements to Prometheus SelfMetrics dashboard
kube-state-metrics is the latest version and as useful as possible.
blackbox exporters run redundantly
