feat(application-template): add PrometheusRule CR feature (#17)
* fix: render service monitor CR only the CRD installed

* feat: add prometheus rule CR

* feat: change rules to alerting_rules

* feat: add prometheus rule to worker and scheduler

* feat: trigger actions

* feat: update chart version

* feat: update readme
atobaum authored Apr 5, 2024
1 parent 8a42bea commit d8f9e8f
Showing 7 changed files with 295 additions and 5 deletions.
2 changes: 1 addition & 1 deletion charts/application-template/Chart.yaml
@@ -9,4 +9,4 @@ maintainers:
- name: modusign
url: https://github.com/modusign
name: application-template
-version: 1.4.2
+version: 1.5.0
12 changes: 9 additions & 3 deletions charts/application-template/README.md
@@ -1,6 +1,6 @@
# application-template

-![Version: 1.3.2](https://img.shields.io/badge/Version-1.3.2-informational?style=flat-square) ![AppVersion: v1.0.0](https://img.shields.io/badge/AppVersion-v1.0.0-informational?style=flat-square)
+![Version: 1.5.0](https://img.shields.io/badge/Version-1.5.0-informational?style=flat-square) ![AppVersion: v1.0.0](https://img.shields.io/badge/AppVersion-v1.0.0-informational?style=flat-square)

A Helm chart for Modusign Applications

@@ -33,6 +33,8 @@ Kubernetes: `>=1.23`
| global.minReadySeconds | int | `60` | optional field that specifies the minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing |
| global.nodeSelector | object | `{}` | Default node selector for all components |
| global.observability.datadog | object | `{"admissionController":{"enabled":false}}` | inject datadog admission controller env label |
| global.observability.prometheus | object | `{"serviceMonitor":{"enabled":false,"path":"/metrics","portName":"metrics"}}` | set up additional service port and setup |
| global.observability.prometheus.serviceMonitor | object | `{"enabled":false,"path":"/metrics","portName":"metrics"}` | create Prometheus Operator ServiceMonitor CR |
| global.podAnnotations | object | `{}` | Annotations for all deployed pods |
| global.podLabels | object | `{}` | Labels for all deployed pods |
| global.revisionHistoryLimit | int | `3` | Number of old deployment ReplicaSets to retain. The rest will be garbage collected. |
@@ -73,6 +75,8 @@ Kubernetes: `>=1.23`
| scheduler.istio.virtualServices | list | `[]` | virtualService configuration |
| scheduler.lifecycle | object | `{}` | Specify postStart and preStop lifecycle hooks for your container |
| scheduler.nodeSelector | object | `{}` (defaults to global.nodeSelector) | [Node selector] |
| scheduler.observability.prometheus.alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for service container |
| scheduler.observability.prometheus.istio_alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for istio proxy container |
| scheduler.pdb.annotations | object | `{}` | Annotations to be added to scheduler pdb |
| scheduler.pdb.enabled | bool | `false` | Deploy a [PodDisruptionBudget] for the scheduler |
| scheduler.pdb.labels | object | `{}` | Labels to be added to scheduler pdb |
@@ -128,6 +132,8 @@ Kubernetes: `>=1.23`
| server.istio.virtualServices | list | `[]` | virtualService configuration |
| server.lifecycle | object | `{}` | Specify postStart and preStop lifecycle hooks for your container |
| server.nodeSelector | object | `{}` (defaults to global.nodeSelector) | [Node selector] |
| server.observability.prometheus.alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for service container |
| server.observability.prometheus.istio_alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for istio proxy container |
| server.pdb.annotations | object | `{}` | Annotations to be added to server pdb |
| server.pdb.enabled | bool | `true` | Deploy a [PodDisruptionBudget] for the server |
| server.pdb.labels | object | `{}` | Labels to be added to server pdb |
@@ -181,6 +187,8 @@ Kubernetes: `>=1.23`
| worker.istio.virtualServices | list | `[]` | virtualService configuration |
| worker.lifecycle | object | `{}` | Specify postStart and preStop lifecycle hooks for your container |
| worker.nodeSelector | object | `{}` (defaults to global.nodeSelector) | [Node selector] |
| worker.observability.prometheus.alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for service container |
| worker.observability.prometheus.istio_alerting_rules | object | `{"enabled":false,"highCpuUsageThreshold":70,"highMemoryUsageThreshold":70}` | create Prometheus Operator PrometheusRule CR for istio proxy container |
| worker.pdb.annotations | object | `{}` | Annotations to be added to worker pdb |
| worker.pdb.enabled | bool | `false` | Deploy a [PodDisruptionBudget] for the worker |
| worker.pdb.labels | object | `{}` | Labels to be added to worker pdb |
@@ -207,5 +215,3 @@ Kubernetes: `>=1.23`
| worker.volumes | list | `[]` | Additional volumes to the application worker pod |
| worker.workload | string | `"deployment"` | set deployment kind to Rollouts rollout: enabled : false |

----------------------------------------------
Autogenerated from chart metadata using [helm-docs v1.11.3](https://github.com/norwoodj/helm-docs/releases/v1.11.3)
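The values documented in the README table above can be combined into an override file. The following is a minimal sketch, not taken from the chart itself: the file name `values-alerts.yaml` and the threshold numbers are illustrative assumptions, while the keys are the ones listed in the table. Judging from the rule expressions in the templates of this commit, the thresholds are percentages of the container's resource limit.

```yaml
# values-alerts.yaml (hypothetical file name; thresholds are example numbers)
global:
  observability:
    prometheus:
      serviceMonitor:
        enabled: true        # chart default: false
        path: /metrics       # chart default
        portName: metrics    # chart default

server:
  observability:
    prometheus:
      alerting_rules:
        enabled: true
        highCpuUsageThreshold: 80      # percent of the CPU limit
        highMemoryUsageThreshold: 85   # percent of the memory limit
      istio_alerting_rules:
        enabled: true                  # thresholds keep the chart default of 70

scheduler:
  observability:
    prometheus:
      alerting_rules:
        enabled: true                  # thresholds keep the chart default of 70
```

Because Helm merges user-supplied maps with the chart defaults, keys that are left out (such as the istio thresholds here) should keep their documented default of `70`.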
@@ -0,0 +1,81 @@
{{- if .Capabilities.APIVersions.Has "monitoring.coreos.com/v1/PrometheusRule" }}
{{- if and .Values.scheduler.enabled (or .Values.scheduler.observability.prometheus.alerting_rules.enabled .Values.scheduler.observability.prometheus.istio_alerting_rules.enabled) }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ template "application.scheduler.name" . }}
  namespace: {{ .Release.Namespace }}
spec:
  groups:
  {{- if .Values.scheduler.observability.prometheus.alerting_rules.enabled }}
  - name: ServiceContainerResourceUsage
    rules:
    - alert: "HighServiceContainerCPUUsage"
      expr: |
        avg(
          rate(container_cpu_usage_seconds_total{ container={{ .Values.scheduler.name | quote }} }[2m]) * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.scheduler.name" . | quote }} }
          / on(pod)
          (kube_pod_container_resource_limits{ resource="cpu", container={{ .Values.scheduler.name | quote }} })
        )
        * 100
        > {{ .Values.scheduler.observability.prometheus.alerting_rules.highCpuUsageThreshold }}
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "[{{ include "application.scheduler.name" . | title }}] High CPU usage"
        description: "[{{ include "application.scheduler.name" . | title }}] Recent CPU usage of the service has reached {{ .Values.scheduler.observability.prometheus.alerting_rules.highCpuUsageThreshold }}% or higher. Current value: {{`{{ .Value | humanize }}`}}%"

    - alert: HighServiceContainerMemoryUsage
      expr: |
        avg(
          (container_memory_rss{ container={{ .Values.scheduler.name | quote }} } * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.scheduler.name" . | quote }} })
          / on(pod)
          (kube_pod_container_resource_limits{ resource="memory", container={{ .Values.scheduler.name | quote }} })
        ) * 100
        > {{ .Values.scheduler.observability.prometheus.alerting_rules.highMemoryUsageThreshold }}
      for: 5m
      labels:
        service: {{ include "application.scheduler.name" . | quote }}
        severity: critical
      annotations:
        summary: "[{{ include "application.scheduler.name" . | title }}] High memory usage"
        description: "[{{ include "application.scheduler.name" . | title }}] Recent memory usage of the service has reached {{ .Values.scheduler.observability.prometheus.alerting_rules.highMemoryUsageThreshold }}% or higher. Current value: {{`{{ .Value | humanize }}`}}%"
  {{- end }}
  {{- if .Values.scheduler.observability.prometheus.istio_alerting_rules.enabled }}
  - name: IstioContainerResourceUsage
    rules:
    - alert: "HighIstioContainerCPUUsage"
      expr: |
        avg(
          rate(container_cpu_usage_seconds_total{ container="istio-proxy" }[2m]) * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.scheduler.name" . | quote }} }
          / on(pod)
          (kube_pod_container_resource_limits{ resource="cpu", container="istio-proxy" })
        )
        * 100
        > {{ .Values.scheduler.observability.prometheus.istio_alerting_rules.highCpuUsageThreshold }}
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "[{{ include "application.scheduler.name" . | title }}][istio-proxy] High CPU usage"
        description: "[{{ include "application.scheduler.name" . | title }}][istio-proxy] Recent CPU usage of the service has reached {{ .Values.scheduler.observability.prometheus.istio_alerting_rules.highCpuUsageThreshold }}% or higher. Current value: {{`{{ .Value | humanize }}`}}%"

    - alert: HighIstioContainerMemoryUsage
      expr: |
        avg(
          (container_memory_rss{ container="istio-proxy" } * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.scheduler.name" . | quote }} })
          / on(pod)
          (kube_pod_container_resource_limits{ resource="memory", container="istio-proxy" })
        ) * 100
        > {{ .Values.scheduler.observability.prometheus.istio_alerting_rules.highMemoryUsageThreshold }}
      for: 5m
      labels:
        service: {{ include "application.scheduler.name" . | quote }}
        severity: critical
      annotations:
        summary: "[{{ include "application.scheduler.name" . | title }}][istio-proxy] High memory usage"
        description: "[{{ include "application.scheduler.name" . | title }}][istio-proxy] Recent memory usage of the service has reached {{ .Values.scheduler.observability.prometheus.istio_alerting_rules.highMemoryUsageThreshold }}% or higher. Current value: {{`{{ .Value | humanize }}`}}%"
  {{- end }}
{{- end }}
{{- end }}
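To make the scheduler template easier to review, here is a rough sketch of what the CPU rule might render to. It rests on assumptions that are not in the commit: that the `application.scheduler.name` helper resolves to `my-app-scheduler`, that `.Values.scheduler.name` is `scheduler`, that the release namespace is `default`, and that the default threshold of `70` is used. Note that the PrometheusRule CRD expects each group to list its alerts under the `rules` key.

```yaml
# Hypothetical rendered output; all names and the namespace are assumed for illustration.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-scheduler
  namespace: default
spec:
  groups:
    - name: ServiceContainerResourceUsage
      rules:
        - alert: HighServiceContainerCPUUsage
          # CPU usage rate over 2m, restricted to pods of this app by the join
          # with kube_pod_labels, divided by the CPU limit, as a percentage.
          expr: |
            avg(
              rate(container_cpu_usage_seconds_total{ container="scheduler" }[2m]) * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name="my-app-scheduler" }
              / on(pod)
              (kube_pod_container_resource_limits{ resource="cpu", container="scheduler" })
            )
            * 100
            > 70
          for: 5m
          labels:
            severity: critical
          # annotations omitted for brevity
```

The alert fires only after the averaged ratio has stayed above the threshold for the full `for: 5m` window, so short spikes should not trigger it.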
81 changes: 81 additions & 0 deletions charts/application-template/templates/server/prometheus-rule.yaml
@@ -0,0 +1,81 @@
{{- if .Capabilities.APIVersions.Has "monitoring.coreos.com/v1/PrometheusRule" }}
{{- if and .Values.server.enabled (or .Values.server.observability.prometheus.alerting_rules.enabled .Values.server.observability.prometheus.istio_alerting_rules.enabled) }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ template "application.server.name" . }}
  namespace: {{ .Release.Namespace }}
spec:
  groups:
  {{- if .Values.server.observability.prometheus.alerting_rules.enabled }}
  - name: ServiceContainerResourceUsage
    rules:
    - alert: "HighServiceContainerCPUUsage"
      expr: |
        avg(
          rate(container_cpu_usage_seconds_total{ container={{ .Values.server.name | quote }} }[2m]) * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.server.name" . | quote }} }
          / on(pod)
          (kube_pod_container_resource_limits{ resource="cpu", container={{ .Values.server.name | quote }} })
        )
        * 100
        > {{ .Values.server.observability.prometheus.alerting_rules.highCpuUsageThreshold }}
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "[{{ include "application.server.name" . | title }}] High CPU usage"
        description: "[{{ include "application.server.name" . | title }}] Recent CPU usage of the service has reached {{ .Values.server.observability.prometheus.alerting_rules.highCpuUsageThreshold }}% or higher. Current value: {{`{{ .Value | humanize }}`}}%"

    - alert: HighServiceContainerMemoryUsage
      expr: |
        avg(
          (container_memory_rss{ container={{ .Values.server.name | quote }} } * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.server.name" . | quote }} })
          / on(pod)
          (kube_pod_container_resource_limits{ resource="memory", container={{ .Values.server.name | quote }} })
        ) * 100
        > {{ .Values.server.observability.prometheus.alerting_rules.highMemoryUsageThreshold }}
      for: 5m
      labels:
        service: {{ include "application.server.name" . | quote }}
        severity: critical
      annotations:
        summary: "[{{ include "application.server.name" . | title }}] High memory usage"
        description: "[{{ include "application.server.name" . | title }}] Recent memory usage of the service has reached {{ .Values.server.observability.prometheus.alerting_rules.highMemoryUsageThreshold }}% or higher. Current value: {{`{{ .Value | humanize }}`}}%"
  {{- end }}
  {{- if .Values.server.observability.prometheus.istio_alerting_rules.enabled }}
  - name: IstioContainerResourceUsage
    rules:
    - alert: "HighIstioContainerCPUUsage"
      expr: |
        avg(
          rate(container_cpu_usage_seconds_total{ container="istio-proxy" }[2m]) * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.server.name" . | quote }} }
          / on(pod)
          (kube_pod_container_resource_limits{ resource="cpu", container="istio-proxy" })
        )
        * 100
        > {{ .Values.server.observability.prometheus.istio_alerting_rules.highCpuUsageThreshold }}
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "[{{ include "application.server.name" . | title }}][istio-proxy] High CPU usage"
        description: "[{{ include "application.server.name" . | title }}][istio-proxy] Recent CPU usage of the service has reached {{ .Values.server.observability.prometheus.istio_alerting_rules.highCpuUsageThreshold }}% or higher. Current value: {{`{{ .Value | humanize }}`}}%"

    - alert: HighIstioContainerMemoryUsage
      expr: |
        avg(
          (container_memory_rss{ container="istio-proxy" } * on(pod) group_left kube_pod_labels{ label_app_kubernetes_io_name={{ include "application.server.name" . | quote }} })
          / on(pod)
          (kube_pod_container_resource_limits{ resource="memory", container="istio-proxy" })
        ) * 100
        > {{ .Values.server.observability.prometheus.istio_alerting_rules.highMemoryUsageThreshold }}
      for: 5m
      labels:
        service: {{ include "application.server.name" . | quote }}
        severity: critical
      annotations:
        summary: "[{{ include "application.server.name" . | title }}][istio-proxy] High memory usage"
        description: "[{{ include "application.server.name" . | title }}][istio-proxy] Recent memory usage of the service has reached {{ .Values.server.observability.prometheus.istio_alerting_rules.highMemoryUsageThreshold }}% or higher. Current value: {{`{{ .Value | humanize }}`}}%"
  {{- end }}
{{- end }}
{{- end }}
2 changes: 1 addition & 1 deletion charts/application-template/templates/service_monitor.yaml
@@ -1,4 +1,4 @@
-{{- if .Values.global.observability.prometheus.serviceMonitor.enabled }}
+{{- if and .Values.global.observability.prometheus.serviceMonitor.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1/ServiceMonitor") }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata: