116 changes: 96 additions & 20 deletions prometheus/README.md
@@ -56,7 +56,7 @@ Note: Bucket data for a given `<HISTOGRAM_METRIC_NAME>` Prometheus histogram met

### Events

Prometheus Alertmanager alerts are automatically sent to your Datadog event stream following the webhook configuration. See the [Prometheus Alertmanager](#prometheus-alertmanager) section for setup instructions.

### Service Checks

@@ -66,27 +66,101 @@ The Prometheus check does not include any service checks.
Send Prometheus Alertmanager alerts to the event stream. By default, Alertmanager sends all alerts simultaneously to the configured webhook. To see alerts in Datadog, you must configure your instance of Alertmanager to send alerts one at a time. You can add a `group_by` parameter under `route` to group alerts by the name of the alert rule.
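
To make the effect of grouping concrete, here is a minimal Python sketch of how alerts that share the same values for the grouping labels end up in the same group. It is an illustration only, not Alertmanager's implementation; the `group_alerts` helper and the sample alerts are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Group alerts by the values of the labels listed in group_by."""
    groups = defaultdict(list)
    for alert in alerts:
        # Assumption for this sketch: a missing label counts as an empty string.
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts, for illustration only.
alerts = [
    {"labels": {"alertname": "HighCPU", "instance": "web-1"}},
    {"labels": {"alertname": "HighCPU", "instance": "web-2"}},
    {"labels": {"alertname": "DiskFull", "instance": "db-1"}},
]

groups = group_alerts(alerts, ["alertname"])
for key, members in groups.items():
    print(key, len(members))
```

With `group_by: ['alertname']`, the two `HighCPU` alerts fall into one group and the `DiskFull` alert into another.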

### Setup

<!-- xxx tabs xxx -->
<!-- xxx tab "V2 (preferred)" xxx -->

1. Edit the `alertmanager.yml` configuration file to include the following:

```yaml
receivers:
  - name: datadog
    webhook_configs:
      - send_resolved: true
        url: https://event-management-intake.datadoghq.com/api/v2/events/webhook?dd-api-key=<DATADOG_API_KEY>&integration_id=prometheus
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5m
  receiver: datadog
  repeat_interval: 3h
```

<div class="alert alert-info">
<ul>
<li> The <code>group_by</code> parameter determines how alerts are grouped together when sent to Datadog. Alerts with matching values for the specified labels are combined into a single notification. For details on routing configuration, see the <a href="https://prometheus.io/docs/alerting/latest/configuration/">Prometheus Alertmanager documentation</a>.</li>
<li>This endpoint accepts only one event in the payload at a time.</li>
</ul>
</div>

2. (Optional) Use matchers to redirect specific alerts to different receivers. Matchers allow routing based on any alert label. For syntax details, see the [Alertmanager matcher documentation][12].

The V2 webhook supports additional query parameters. For example, use the `oncall_team` parameter to integrate with [Datadog On-Call][11] and redirect pages to different teams:

```yaml
receivers:
  - name: datadog-ops
    webhook_configs:
      - send_resolved: true
        url: https://event-management-intake.datadoghq.com/api/v2/events/webhook?dd-api-key=<DATADOG_API_KEY>&integration_id=prometheus&oncall_team=ops
  - name: datadog-db
    webhook_configs:
      - send_resolved: true
        url: https://event-management-intake.datadoghq.com/api/v2/events/webhook?dd-api-key=<DATADOG_API_KEY>&integration_id=prometheus&oncall_team=database

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5m
  receiver: datadog-ops
  repeat_interval: 3h
  routes:
    - matchers:
        - team="database"
      receiver: datadog-db
```

<div class="alert alert-tip">
Setting <code>send_resolved: true</code> (the default value) enables Alertmanager to send notifications when alerts are resolved in Prometheus. This is particularly important when using the <code>oncall_team</code> parameter to ensure that pages are marked as resolved. Note that resolved notifications may be delayed until the next <code>group_interval</code>.
</div>

3. Restart the Prometheus and Alertmanager services.

```shell
sudo systemctl restart prometheus.service alertmanager.service
```
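
The routing in step 2 can be reasoned about with a small Python sketch: it picks a receiver from an alert's labels and builds the matching V2 webhook URL. This is a simplified model under stated assumptions, not Alertmanager's code. It handles only equality matchers (Alertmanager also supports `!=`, `=~`, and `!~`), and the `pick_receiver` and `webhook_url` helpers and the `EXAMPLE_KEY` value are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint taken from the V2 configuration in step 1.
V2_WEBHOOK = "https://event-management-intake.datadoghq.com/api/v2/events/webhook"


def webhook_url(api_key, oncall_team=None):
    """Build the V2 webhook URL with its query parameters."""
    params = {"dd-api-key": api_key, "integration_id": "prometheus"}
    if oncall_team:
        params["oncall_team"] = oncall_team
    return f"{V2_WEBHOOK}?{urlencode(params)}"


def pick_receiver(labels, routes, default):
    """Return the receiver of the first child route whose matchers all match."""
    for route in routes:
        # Simplification: equality matchers only.
        if all(labels.get(k) == v for k, v in route["matchers"].items()):
            return route["receiver"]
    return default


# Mirrors the example configuration: database alerts go to datadog-db.
routes = [{"matchers": {"team": "database"}, "receiver": "datadog-db"}]
oncall_teams = {"datadog-ops": "ops", "datadog-db": "database"}

receiver = pick_receiver(
    {"alertname": "SlowQuery", "team": "database"}, routes, "datadog-ops"
)
print(receiver, webhook_url("EXAMPLE_KEY", oncall_teams[receiver]))
```

Alerts without a `team="database"` label fall through to the default receiver, `datadog-ops`, in this model.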

<!-- xxz tab xxx -->
<!-- xxx tab "V1" xxx -->

1. Edit the `alertmanager.yml` configuration file to include the following:

```yaml
receivers:
  - name: datadog
    webhook_configs:
      - send_resolved: true
        url: https://app.datadoghq.com/intake/webhook/prometheus?api_key=<DATADOG_API_KEY>
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5m
  receiver: datadog
  repeat_interval: 3h
```

<div class="alert alert-info">
This endpoint accepts only one event in the payload at a time.
</div>

2. Restart the Prometheus and Alertmanager services.

```shell
sudo systemctl restart prometheus.service alertmanager.service
```

<!-- xxz tab xxx -->
<!-- xxz tabs xxx -->

## Troubleshooting

@@ -108,3 +182,5 @@ Need help? Contact [Datadog support][7].
[8]: https://www.datadoghq.com/blog/monitor-prometheus-metrics
[9]: https://docs.datadoghq.com/agent/prometheus/
[10]: https://docs.datadoghq.com/developers/prometheus/
[11]: https://docs.datadoghq.com/service_management/on-call/
[12]: https://prometheus.io/docs/alerting/latest/configuration/#matcher