New Patroni Integration #2575

Draft
wants to merge 8 commits into
base: master
8 changes: 8 additions & 0 deletions patroni/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# CHANGELOG - Patroni

## 1.0.0 / 2024-07-12

***Added***:

* Initial Release

69 changes: 69 additions & 0 deletions patroni/README.md
@@ -0,0 +1,69 @@
# Patroni

## Overview

This check monitors [Patroni][1] and collects key metrics from your Patroni-managed PostgreSQL clusters, including cluster state, replication status, and database health. It also monitors your cluster's Distributed Configuration Store (DCS) and provides insight into failover and synchronization activity.

## Setup

### Installation

If you are using Agent v7.21+ / v6.21+, follow the instructions below to install the Patroni check on your host. See the dedicated Agent guide for [installing community integrations][3] to install checks with an [Agent earlier than v7.21 / v6.21][4] or the [Docker Agent][5]:

1. [Download and launch the Datadog Agent][2].
2. Run the following command to install the integrations wheel with the Agent:

```bash
datadog-agent integration install -t datadog-patroni==<INTEGRATION_VERSION>
```
You can find the latest version on the [Datadog Integrations Release Page][12].

**Note**: If necessary, prepend `sudo -u dd-agent` to the install command.

3. Configure your integration like [any other packaged integration][6].

### Configuration

Copy the [sample configuration][4] and update the required sections to collect data from your Patroni cluster:

```yaml
instances:

- openmetrics_endpoint: "http://127.0.0.1:8008/metrics"

```
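Before pointing the Agent at the endpoint, it can help to confirm that it actually serves Prometheus/OpenMetrics text. The following is a minimal sketch, not part of the integration: the helper names `metric_names` and `fetch_metric_names` are hypothetical, and the sample payload (including its `scope` and `name` labels) is an assumed illustration of what Patroni might return, not captured output.

```python
from urllib.request import urlopen


def metric_names(exposition_text: str) -> set[str]:
    """Return the metric names found in Prometheus/OpenMetrics text."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first "{" (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metric_names(url: str = "http://127.0.0.1:8008/metrics",
                       timeout: float = 5.0) -> set[str]:
    """Fetch the endpoint (hypothetical helper) and list the metric names."""
    with urlopen(url, timeout=timeout) as response:
        return metric_names(response.read().decode("utf-8"))


# Assumed sample payload; real output depends on your Patroni version.
SAMPLE = """\
# HELP patroni_primary Value is 1 if this node is the leader, 0 otherwise.
# TYPE patroni_primary gauge
patroni_primary{scope="demo",name="node1"} 1
patroni_postgres_running{scope="demo",name="node1"} 1
"""

print(sorted(metric_names(SAMPLE)))
```

Calling `fetch_metric_names()` against a live node should return a non-empty set if the `openmetrics_endpoint` value is reachable from the Agent host.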

### Validation

Run the [Agent's status subcommand][6] and look for `patroni` under the Checks section.

## Data Collected

### Metrics

See [metadata.csv][7] for a list of metrics provided by this integration.

### Service Checks

See [service_checks.json][8] for a list of service checks provided by this integration.

### Events

The Patroni check emits an event when it detects that a failover has occurred:

- Patroni Failover Detected
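
The check's detection logic is not shown in this part of the diff. As a rough sketch of how leader-change detection could work, assuming the check remembers the leader seen on the previous run: the function `detect_failover` and the `old_leader` tag are hypothetical, while the event title and the `new_leader` tag mirror names used in the README and dashboard.

```python
from typing import Optional


def detect_failover(previous_leader: Optional[str],
                    current_leader: Optional[str]) -> Optional[dict]:
    """Return an event payload when the leader changes, else None."""
    # No baseline yet, or no leader currently known: nothing to report.
    if previous_leader is None or current_leader is None:
        return None
    if previous_leader == current_leader:
        return None
    return {
        "msg_title": "Patroni Failover Detected",
        "msg_text": f"Leader changed from {previous_leader} to {current_leader}",
        # `old_leader` is an assumed tag name; `new_leader` matches the dashboard query.
        "tags": [f"old_leader:{previous_leader}", f"new_leader:{current_leader}"],
    }


event = detect_failover("node1", "node2")
print(event["msg_title"], event["tags"])
```

Returning `None` on the first run avoids reporting a failover simply because no previous leader was recorded.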

## Troubleshooting

Need help? Contact [Datadog support][9].

[1]: **LINK_TO_INTEGRATION_SITE**
[2]: https://app.datadoghq.com/account/settings/agent/latest
[3]: https://docs.datadoghq.com/agent/kubernetes/integrations/
[4]: https://github.com/DataDog/integrations-extras/blob/master/patroni/datadog_checks/patroni/data/conf.yaml.example
[5]: https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent
[6]: https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information
[7]: https://github.com/DataDog/integrations-extras/blob/master/patroni/metadata.csv
[8]: https://github.com/DataDog/integrations-extras/blob/master/patroni/assets/service_checks.json
[9]: https://docs.datadoghq.com/help/

31 changes: 31 additions & 0 deletions patroni/assets/configuration/spec.yaml
@@ -0,0 +1,31 @@
name: Patroni
files:
- name: patroni.yaml
  options:
  - template: init_config
    options:
    - template: init_config/default
  - template: instances
    options:
    - name: openmetrics_endpoint
      required: true
      description: |
        The URL endpoint for the Patroni metrics. This should point to the `/metrics` endpoint exposed by Patroni's OpenMetrics integration.
        Example: `http://127.0.0.1:8008/metrics`
      value:
        example: http://127.0.0.1:8008/metrics
        type: string
    - name: tags
      description: |
        A list of tags to associate with metrics collected by this integration. For example, `tags: ["cluster:prod", "scope:demo"]`.
      value:
        example: ["cluster:prod", "scope:demo"]
        type: list
    - template: instances/default

  - template: logs
    example:
    - type: file
      path: /var/log/patroni/patroni.log
      source: patroni
      service: patroni
1 change: 1 addition & 0 deletions patroni/assets/dashboards/patroni_overview.json
@@ -0,0 +1 @@
{"title":"patroni","description":null,"widgets":[{"id":7430822203011234,"definition":{"type":"image","url":"https://image.nostr.build/7263995b464a3eb3e24124fa8b9b741e71d8b6b4c8308b078a3d59b01796cc88.png","sizing":"contain","margin":"md","has_background":false,"has_border":false,"vertical_align":"center","horizontal_align":"center"},"layout":{"x":0,"y":0,"width":4,"height":2}},{"id":2539920249723822,"definition":{"type":"note","content":"The Patroni Datadog Integration provides comprehensive monitoring for PostgreSQL clusters managed by Patroni. It tracks key metrics like node health, leader status, replication lag, and failover events, ensuring high availability and visibility into your cluster's performance. The integration enables real-time alerts for failovers and replication issues, while also collecting configuration metrics to monitor cluster settings, helping maintain optimal performance and stability.","background_color":"vivid_blue","font_size":"14","text_align":"left","vertical_align":"center","show_tick":true,"tick_pos":"50%","tick_edge":"left","has_padding":true},"layout":{"x":4,"y":0,"width":4,"height":2}},{"id":8442496911374558,"definition":{"title":"Service checks","background_color":"vivid_green","show_title":true,"type":"group","layout_type":"ordered","widgets":[{"id":67078799847168,"definition":{"title":"Config check","title_size":"16","title_align":"left","type":"check_status","check":"patroni.openmetrics.health","grouping":"cluster","group_by":["endpoint"],"tags":[]},"layout":{"x":0,"y":0,"width":2,"height":2}}]},"layout":{"x":0,"y":2,"width":12,"height":3}},{"id":8864854196702302,"definition":{"title":"Leader status","background_color":"vivid_green","show_title":true,"type":"group","layout_type":"ordered","widgets":[{"id":684217196972856,"definition":{"title":"Patroni leader","title_size":"16","title_align":"left","type":"query_table","requests":[{"queries":[{"data_source":"metrics","name":"query1","query":"sum:patroni.primary{*} by 
{name}.fill(zero).rollup(sum, 15)","aggregator":"last"}],"response_format":"scalar","sort":{"count":500,"order_by":[{"type":"formula","index":0,"order":"desc"}]},"formulas":[{"cell_display_mode":"number","conditional_formats":[{"comparator":">","value":0,"palette":"white_on_green"},{"comparator":"<","value":1,"palette":"yellow_on_white"}],"formula":"cutoff_min(query1, 1)"}]}],"has_search_bar":"auto"},"layout":{"x":0,"y":0,"width":4,"height":2}},{"id":2945986351214600,"definition":{"title":"Patroni leader","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"name":"query1","data_source":"metrics","query":"sum:patroni.primary{*} by {name}.fill(null)"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","order_reverse":false,"line_type":"solid","line_width":"normal"},"display_type":"bars"}]},"layout":{"x":4,"y":0,"width":4,"height":2}},{"id":7223400988412302,"definition":{"title":"Leader Running","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"cutoff_min(query1, 1)"}],"queries":[{"data_source":"metrics","name":"query1","query":"sum:patroni.cluster.role.leader{state:running} by {state,name}.fill(null)"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"bars"}]},"layout":{"x":8,"y":0,"width":4,"height":2}}]},"layout":{"x":0,"y":5,"width":12,"height":3}},{"id":2022008751468964,"definition":{"title":"Replica status","background_color":"vivid_green","show_title":true,"type":"group","layout_type":"ordered","widgets":[{"id":1869417711830344,"definition":{"title":"Patroni 
replica","title_size":"16","title_align":"left","type":"query_table","requests":[{"queries":[{"data_source":"metrics","name":"query1","query":"max:patroni.replica{*} by {name,endpoint}","aggregator":"last"}],"response_format":"scalar","sort":{"count":500,"order_by":[{"type":"formula","index":0,"order":"desc"}]},"formulas":[{"cell_display_mode":"number","conditional_formats":[{"comparator":">","value":0,"palette":"white_on_green"},{"comparator":"=","value":0,"palette":"yellow_on_white"}],"formula":"query1"}]}],"has_search_bar":"auto"},"layout":{"x":0,"y":0,"width":4,"height":3}},{"id":4015738104725812,"definition":{"title":"Replica streaming","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"data_source":"metrics","name":"query1","query":"sum:patroni.postgres.streaming{*} by {name}"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"bars"}]},"layout":{"x":4,"y":0,"width":4,"height":2}}]},"layout":{"x":0,"y":8,"width":12,"height":4}},{"id":8942710091957508,"definition":{"title":"Failover","background_color":"vivid_green","show_title":true,"type":"group","layout_type":"ordered","widgets":[{"id":3173153223024496,"definition":{"title":"Patroni failover 
events","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"data_source":"events","name":"query1","indexes":["*"],"compute":{"aggregation":"count"},"group_by":[{"facet":"new_leader","limit":10,"sort":{"order":"desc","aggregation":"count"}}],"search":{"query":"new_leader:*"}}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"bars"}]},"layout":{"x":0,"y":0,"width":4,"height":3}},{"id":2495153286114712,"definition":{"title":"patroni failover monitors","type":"manage_status","display_format":"countsAndList","color_preference":"text","hide_zero_counts":true,"show_status":true,"last_triggered_format":"relative","query":"patroni","sort":"status,asc","count":50,"start":0,"summary_type":"monitors","show_priority":false,"show_last_triggered":false},"layout":{"x":4,"y":0,"width":4,"height":3}},{"id":4621706728375622,"definition":{"title":"Stopped nodes","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"data_source":"metrics","name":"query1","query":"avg:patroni.failsafe_mode.active{*}"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"bars"}]},"layout":{"x":8,"y":0,"width":4,"height":3}}]},"layout":{"x":0,"y":12,"width":12,"height":4,"is_column_break":true}},{"id":6433497573026708,"definition":{"title":"Lag","background_color":"vivid_orange","show_title":true,"type":"group","layout_type":"ordered","widgets":[{"id":4773747351793264,"definition":{"title":"Cluster Member 
Lag","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"data_source":"metrics","name":"query1","query":"avg:patroni.cluster.member.lag{*} by {name}"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"line"}]},"layout":{"x":0,"y":0,"width":4,"height":2}},{"id":7701145506832840,"definition":{"title":"DCS last seen","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"autosmooth(exclude_null(query1))"}],"queries":[{"data_source":"metrics","name":"query1","query":"avg:patroni.dcs_last_seen_diff{*} by {name}"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"line"}]},"layout":{"x":4,"y":0,"width":4,"height":2}}]},"layout":{"x":0,"y":16,"width":12,"height":3}},{"id":5960610231551056,"definition":{"title":"Postgres Running","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"data_source":"metrics","name":"query1","query":"avg:patroni.postgres.running{*} by {name}"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"bars"}]},"layout":{"x":0,"y":0,"width":4,"height":2}},{"id":8576429742463436,"definition":{"title":"Postgres 
Streaming","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"data_source":"metrics","name":"query1","query":"max:patroni.postgres.streaming{*} by {name}"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"bars"}]},"layout":{"x":4,"y":0,"width":4,"height":2}},{"id":1242201152000370,"definition":{"title":"Patroni Paused","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"data_source":"metrics","name":"query1","query":"avg:patroni.is_paused{*} by {name}"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"line"}]},"layout":{"x":8,"y":0,"width":4,"height":2}},{"id":3262012141778306,"definition":{"title":"Patroni Cluster Unlocked","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"data_source":"metrics","name":"query1","query":"avg:patroni.cluster.unlocked{*} by {name}"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"line"}]},"layout":{"x":0,"y":2,"width":4,"height":2}},{"id":4585094656248780,"definition":{"title":"Patroni Quorum 
Standby","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"data_source":"metrics","name":"query1","query":"avg:patroni.quorum_standby{*} by {name}"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"line"}]},"layout":{"x":4,"y":2,"width":4,"height":2}},{"id":264924338949946,"definition":{"title":"Timeline increase","title_size":"16","title_align":"left","show_legend":true,"legend_layout":"auto","legend_columns":["avg","min","max","value","sum"],"type":"timeseries","requests":[{"formulas":[{"formula":"query1"}],"queries":[{"data_source":"metrics","name":"query1","query":"sum:patroni.postgres.timeline.count{*} by {name}.as_count()"}],"response_format":"timeseries","style":{"palette":"dog_classic","order_by":"values","line_type":"solid","line_width":"normal"},"display_type":"bars"}]},"layout":{"x":8,"y":2,"width":4,"height":2}}],"template_variables":[{"name":"node","prefix":"node","available_values":[],"default":"*"},{"name":"state","prefix":"state","available_values":[],"default":"*"}],"layout_type":"ordered","notify_list":[],"reflow_type":"fixed"}
1 change: 1 addition & 0 deletions patroni/assets/service_checks.json
@@ -0,0 +1 @@
[]
2 changes: 2 additions & 0 deletions patroni/datadog_checks/__init__.py
@@ -0,0 +1,2 @@

__path__ = __import__('pkgutil').extend_path(__path__, __name__) # type: ignore
1 change: 1 addition & 0 deletions patroni/datadog_checks/patroni/__about__.py
@@ -0,0 +1 @@
__version__ = "1.0.0"
4 changes: 4 additions & 0 deletions patroni/datadog_checks/patroni/__init__.py
@@ -0,0 +1,4 @@
from .__about__ import __version__
from .patroni import PatroniCheck

__all__ = ["__version__", "PatroniCheck"]
21 changes: 21 additions & 0 deletions patroni/datadog_checks/patroni/config_models/__init__.py
@@ -0,0 +1,21 @@
# This file is autogenerated.
# To change this file you should edit assets/configuration/spec.yaml and then run the following commands:
# ddev -x validate config -s <INTEGRATION_NAME>
# ddev -x validate models -s <INTEGRATION_NAME>


from .instance import InstanceConfig
from .shared import SharedConfig


class ConfigMixin:
    _config_model_instance: InstanceConfig
    _config_model_shared: SharedConfig

    @property
    def config(self) -> InstanceConfig:
        return self._config_model_instance

    @property
    def shared_config(self) -> SharedConfig:
        return self._config_model_shared
12 changes: 12 additions & 0 deletions patroni/datadog_checks/patroni/config_models/defaults.py
@@ -0,0 +1,12 @@
# This file is autogenerated.
# To change this file you should edit assets/configuration/spec.yaml and then run the following commands:
# ddev -x validate config -s <INTEGRATION_NAME>
# ddev -x validate models -s <INTEGRATION_NAME>


def instance_empty_default_hostname():
    return False


def instance_min_collection_interval():
    return 15
53 changes: 53 additions & 0 deletions patroni/datadog_checks/patroni/config_models/instance.py
@@ -0,0 +1,53 @@
# This file is autogenerated.
# To change this file you should edit assets/configuration/spec.yaml and then run the following commands:
# ddev -x validate config -s <INTEGRATION_NAME>
# ddev -x validate models -s <INTEGRATION_NAME>


from __future__ import annotations

from typing import Optional

from pydantic import BaseModel, ConfigDict, field_validator, model_validator

from datadog_checks.base.utils.functions import identity
from datadog_checks.base.utils.models import validation

from . import defaults, validators


class InstanceConfig(BaseModel):
    model_config = ConfigDict(
        validate_default=True,
        arbitrary_types_allowed=True,
        frozen=True,
    )
    empty_default_hostname: Optional[bool] = None
    min_collection_interval: Optional[float] = None
    service: Optional[str] = None
    tags: Optional[tuple[str, ...]] = None

    @model_validator(mode="before")
    def _initial_validation(cls, values):
        return validation.core.initialize_config(
            getattr(validators, "initialize_instance", identity)(values)
        )

    @field_validator("*", mode="before")
    def _validate(cls, value, info):
        field = cls.model_fields[info.field_name]
        field_name = field.alias or info.field_name
        if field_name in info.context["configured_fields"]:
            value = getattr(validators, f"instance_{info.field_name}", identity)(
                value, field=field
            )
        else:
            value = getattr(defaults, f"instance_{info.field_name}", lambda: value)()

        return validation.utils.make_immutable(value)

    @model_validator(mode="after")
    def _final_validation(cls, model):
        return validation.core.check_model(
            getattr(validators, "check_instance", identity)(model)
        )
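
The `_validate` hook above picks a per-field validator when the user configured the field, and otherwise falls back to a function in `defaults`. A stripped-down sketch of that `getattr` lookup, with no pydantic or Datadog imports, might look like this; `resolve` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the generated `defaults` and `validators` modules.

```python
from types import SimpleNamespace

# Stand-ins for the generated modules (assumptions for illustration only).
defaults = SimpleNamespace(instance_min_collection_interval=lambda: 15)
validators = SimpleNamespace()  # no custom validators defined


def identity(value, **kwargs):
    """Pass-through used when no custom validator exists."""
    return value


def resolve(field_name, value, configured_fields):
    """Mimic the lookup order used by the generated field validator."""
    if field_name in configured_fields:
        # User supplied the field: run a custom validator if one exists.
        return getattr(validators, f"instance_{field_name}", identity)(value, field=None)
    # Field omitted: use the default function, or keep the incoming value.
    return getattr(defaults, f"instance_{field_name}", lambda: value)()


print(resolve("min_collection_interval", None, set()))
print(resolve("min_collection_interval", 30, {"min_collection_interval"}))
```

Fields without an entry in `defaults`, such as `service`, simply keep whatever value was passed in.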