|  | 
| 2 | 2 | 
 | 
| 3 | 3 | [](https://github.com/kubernetes-monitoring/kubernetes-mixin/actions/workflows/ci.yaml) | 
| 4 | 4 | 
 | 
| 5 |  | -> NOTE: This project is *pre-release* stage. Flags, configuration, behaviour and design may change significantly in following releases. | 
| 6 |  | -
 | 
| 7 | 5 | A set of Grafana dashboards and Prometheus alerts for Kubernetes. | 
| 8 | 6 | 
 | 
|  | 7 | +## Local development | 
|  | 8 | + | 
|  | 9 | +Run the following command to set up a local [kind](https://kind.sigs.k8s.io) cluster: | 
|  | 10 | + | 
|  | 11 | +```shell | 
|  | 12 | +make dev | 
|  | 13 | +``` | 
|  | 14 | + | 
|  | 15 | +You should see the following output if successful: | 
|  | 16 | + | 
|  | 17 | +```shell | 
|  | 18 | +╔═══════════════════════════════════════════════════════════════╗ | 
|  | 19 | +║             🚀 Development Environment Ready! 🚀              ║ | 
|  | 20 | +║                                                               ║ | 
|  | 21 | +║   Run `make dev-port-forward`                                 ║ | 
|  | 22 | +║   Grafana will be available at http://localhost:3000          ║ | 
|  | 23 | +║                                                               ║ | 
|  | 24 | +║   Data will be available in a few minutes.                    ║ | 
|  | 25 | +║                                                               ║ | 
|  | 26 | +║   Dashboards will refresh every 10s, run `make generate`      ║ | 
|  | 27 | +║   and refresh your browser to see the changes.                ║ | 
|  | 28 | +║                                                               ║ | 
|  | 29 | +║   Alert and recording rules require `make dev-reload`.        ║ | 
|  | 30 | +║                                                               ║ | 
|  | 31 | +╚═══════════════════════════════════════════════════════════════╝ | 
|  | 32 | +``` | 
|  | 33 | + | 
|  | 34 | +To delete the cluster, run the following: | 
|  | 35 | + | 
|  | 36 | +```shell | 
|  | 37 | +make dev-down | 
|  | 38 | +``` | 
|  | 39 | + | 
| 9 | 40 | ## Releases | 
| 10 | 41 | 
 | 
| 11 | 42 | > Note: Releases up until `release-0.12` are changes in their own branches. Changelogs are included in releases starting from [version-0.13.0](https://github.com/kubernetes-monitoring/kubernetes-mixin/releases/tag/version-0.13.0). | 
| @@ -33,7 +64,7 @@ Some alerts now use Prometheus filters made available in Prometheus 2.11.0, whic | 
| 33 | 64 | 
 | 
| 34 | 65 | Warning: This compatibility matrix was initially created based on experience. We do not guarantee compatibility, and the matrix may be updated based on new learnings. | 
| 35 | 66 | 
 | 
| 36 |  | -Warning: By default the expressions will generate *grafana 7.2+* compatible rules using the *$__rate_interval* variable for rate functions. If you need backward compatible rules please set *grafana72: false* in your *_config* | 
|  | 67 | +Warning: By default the expressions will generate *grafana 7.2+* compatible rules using the *$\_\_rate_interval* variable for rate functions. If you need backward compatible rules please set *grafana72: false* in your *\_config* | 
| 37 | 68 | 
 | 
| 38 | 69 | ### Release steps | 
| 39 | 70 | 
 | 
| @@ -75,6 +106,7 @@ node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate5m | 
| 75 | 106 | This mixin is designed to be vendored into the repo with your infrastructure config. To do this, use [jsonnet-bundler](https://github.com/jsonnet-bundler/jsonnet-bundler): | 
| 76 | 107 | 
 | 
| 77 | 108 | You then have three options for deploying your dashboards: | 
|  | 109 | + | 
| 78 | 110 | 1. Generate the config files and deploy them yourself | 
| 79 | 111 | 2. Use ksonnet to deploy this mixin along with Prometheus and Grafana | 
| 80 | 112 | 3. Use prometheus-operator to deploy this mixin (TODO) | 
| @@ -109,11 +141,12 @@ The `prometheus_alerts.yaml` and `prometheus_rules.yaml` file then need to passe | 
| 109 | 141 | ### Dashboards for Windows Nodes | 
| 110 | 142 | 
 | 
| 111 | 143 | There are separate dashboards for Windows resources: | 
| 112 |  | -1) Compute Resources / Cluster(Windows) | 
| 113 |  | -2) Compute Resources / Namespace(Windows) | 
| 114 |  | -3) Compute Resources / Pod(Windows) | 
| 115 |  | -4) USE Method / Cluster(Windows) | 
| 116 |  | -5) USE Method / Node(Windows) | 
|  | 144 | + | 
|  | 145 | +1. Compute Resources / Cluster(Windows) | 
|  | 146 | +2. Compute Resources / Namespace(Windows) | 
|  | 147 | +3. Compute Resources / Pod(Windows) | 
|  | 148 | +4. USE Method / Cluster(Windows) | 
|  | 149 | +5. USE Method / Node(Windows) | 
| 117 | 150 | 
 | 
| 118 | 151 | These dashboards are based on metrics populated by [windows-exporter](https://github.com/prometheus-community/windows_exporter) from each Windows node. | 
| 119 | 152 | 
 | 
| @@ -270,14 +303,14 @@ Same result can be achieved by modyfying the existing `config.libsonnet` with th | 
| 270 | 303 | 
 | 
| 271 | 304 | While the community has not yet fully agreed on alert severities and how they are to be used, this repository assumes the following paradigms when setting the severities: | 
| 272 | 305 | 
 | 
| 273 |  | -* Critical: An issue, that needs to page a person to take instant action | 
| 274 |  | -* Warning: An issue, that needs to be worked on but in the regular work queue or for during office hours rather than paging the oncall | 
| 275 |  | -* Info: Is meant to support a trouble shooting process by informing about a non-normal situation for one or more systems but not worth a page or ticket on its own. | 
|  | 306 | +- Critical: An issue that requires paging a person to take immediate action. | 
|  | 307 | +- Warning: An issue that needs to be worked on, but through the regular work queue or during office hours rather than by paging the on-call. | 
|  | 308 | +- Info: Meant to support a troubleshooting process by informing about a non-normal situation for one or more systems, but not worth a page or ticket on its own. | 
| 276 | 309 | 
 | 
| 277 | 310 | ### Architecture and Technical Decisions | 
| 278 | 311 | 
 | 
| 279 |  | -* For more motivation, see "[The RED Method: How to instrument your services](https://kccncna17.sched.com/event/CU8K/the-red-method-how-to-instrument-your-services-b-tom-wilkie-kausal?iframe=no&w=100%&sidebar=yes&bg=no)" talk from CloudNativeCon Austin. | 
| 280 |  | -* For more information about monitoring mixins, see this [design doc](DESIGN.md). | 
|  | 312 | +- For more motivation, see "[The RED Method: How to instrument your services](https://kccncna17.sched.com/event/CU8K/the-red-method-how-to-instrument-your-services-b-tom-wilkie-kausal?iframe=no&w=100%&sidebar=yes&bg=no)" talk from CloudNativeCon Austin. | 
|  | 313 | +- For more information about monitoring mixins, see this [design doc](DESIGN.md). | 
| 281 | 314 | 
 | 
| 282 | 315 | ## Note | 
| 283 | 316 | 
 | 
|  | 