Commit 166d8f0

Gopi-eng2202, SuperQ, and beorn7 authored
Updating "Best practices: Naming lables" #2469 (#2549)
Best practices: Refine metric naming recommendations

---------

Signed-off-by: Gopi-eng2202 <[email protected]>
Signed-off-by: gopi <[email protected]>
Signed-off-by: beorn7 <[email protected]>
Co-authored-by: Ben Kochie <[email protected]>
Co-authored-by: beorn7 <[email protected]>
1 parent 0696969 commit 166d8f0

1 file changed

+32
-21
lines changed

content/docs/practices/naming.md

Lines changed: 32 additions & 21 deletions
@@ -20,30 +20,41 @@ A metric name...
2020
client libraries. For metrics specific to an application, the prefix is
2121
usually the application name itself. Sometimes, however, metrics are more
2222
generic, like standardized metrics exported by client libraries. Examples:
23-
* <code><b>prometheus</b>\_notifications\_total</code>
24-
(specific to the Prometheus server)
25-
* <code><b>process</b>\_cpu\_seconds\_total</code>
26-
(exported by many client libraries)
27-
* <code><b>http</b>\_request\_duration\_seconds</code>
28-
(for all HTTP requests)
23+
* <code><b>prometheus</b>\_notifications\_total</code>
24+
(specific to the Prometheus server)
25+
* <code><b>process</b>\_cpu\_seconds\_total</code>
26+
(exported by many client libraries)
27+
* <code><b>http</b>\_request\_duration\_seconds</code>
28+
(for all HTTP requests)
2929
* ...must have a single unit (i.e. do not mix seconds with milliseconds, or seconds with bytes).
30-
* ...should use base units (e.g. seconds, bytes, meters - not milliseconds, megabytes, kilometers). See below for a list of base units.
31-
* ...should have a suffix describing the unit, in plural form. Note that an accumulating count has `total` as a suffix, in addition to the unit if applicable.
32-
* <code>http\_request\_duration\_<b>seconds</b></code>
33-
* <code>node\_memory\_usage\_<b>bytes</b></code>
34-
* <code>http\_requests\_<b>total</b></code>
35-
(for a unit-less accumulating count)
36-
* <code>process\_cpu\_<b>seconds\_total</b></code>
37-
(for an accumulating count with unit)
38-
* <code>foobar_build<b>\_info</b></code>
39-
(for a pseudo-metric that provides [metadata](https://www.robustperception.io/exposing-the-software-version-to-prometheus) about the running binary)
40-
* <code>data\_pipeline\_last\_record\_processed\_<b>timestamp_seconds</b></code>
41-
(for a timestamp that tracks the time of the latest record processed in a data processing pipeline)
30+
* ...should use base units (e.g. seconds, bytes, meters - not milliseconds, megabytes, kilometers). See below for a list of base units.
31+
* ...should have a suffix describing the unit, in plural form. Note that an accumulating count has `total` as a suffix, in addition to the unit if applicable. Also note that this applies to units in the narrow sense (like the units in the table below), but not to countable things in general. For example, <code>connections</code> or <code>notifications</code> are not considered units for this rule and do not have to be at the end of the metric name. (See also examples in the next paragraph.)
32+
* <code>http\_request\_duration\_<b>seconds</b></code>
33+
* <code>node\_memory\_usage\_<b>bytes</b></code>
34+
* <code>http\_requests\_<b>total</b></code>
35+
(for a unit-less accumulating count)
36+
* <code>process\_cpu\_<b>seconds\_total</b></code>
37+
(for an accumulating count with unit)
38+
* <code>foobar_build<b>\_info</b></code>
39+
(for a pseudo-metric that provides [metadata](https://www.robustperception.io/exposing-the-software-version-to-prometheus) about the running binary)
40+
* <code>data\_pipeline\_last\_record\_processed\_<b>timestamp_seconds</b></code>
41+
(for a timestamp that tracks the time of the latest record processed in a data processing pipeline)
42+
* ...may order its name components in a way that leads to convenient grouping when a list of metric names is sorted lexicographically, as long as all the other rules are followed. The following examples have the common name components first so that all the related metrics are sorted together:
43+
* <code>prometheus\_tsdb\_head\_truncations\_closed\_total</code>
44+
* <code>prometheus\_tsdb\_head\_truncations\_established\_total</code>
45+
* <code>prometheus\_tsdb\_head\_truncations\_failed\_total</code>
46+
* <code>prometheus\_tsdb\_head\_truncations\_total</code>
47+
48+
The following examples are also valid, but follow a different trade-off. They are easier to read individually, but unrelated metrics like <code>prometheus\_tsdb\_head\_series</code> might get sorted in between.
49+
* <code>prometheus\_tsdb\_head\_closed\_truncations\_total</code>
50+
* <code>prometheus\_tsdb\_head\_established\_truncations\_total</code>
51+
* <code>prometheus\_tsdb\_head\_failed\_truncations\_total</code>
52+
* <code>prometheus\_tsdb\_head\_truncations\_total</code>
4253
* ...should represent the same logical thing-being-measured across all label
4354
dimensions.
44-
* request duration
45-
* bytes of data transfer
46-
* instantaneous resource usage as a percentage
55+
* request duration
56+
* bytes of data transfer
57+
* instantaneous resource usage as a percentage
4758

4859
As a rule of thumb, either the `sum()` or the `avg()` over all dimensions of a
4960
given metric should be meaningful (though not necessarily useful). If it is not
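
The grouping trade-off described in the added lines above can be checked with a short sketch. It assumes plain lexicographic sorting as done by Python's `sorted()`. The helper name `is_contiguous` is hypothetical and not part of any Prometheus tooling. The metric names are the ones used as examples in the diff.

```python
# Metric names copied from the examples in the diff above.
common_first = [
    "prometheus_tsdb_head_truncations_closed_total",
    "prometheus_tsdb_head_truncations_established_total",
    "prometheus_tsdb_head_truncations_failed_total",
    "prometheus_tsdb_head_truncations_total",
]
readable_first = [
    "prometheus_tsdb_head_closed_truncations_total",
    "prometheus_tsdb_head_established_truncations_total",
    "prometheus_tsdb_head_failed_truncations_total",
    "prometheus_tsdb_head_truncations_total",
]
unrelated = "prometheus_tsdb_head_series"


def is_contiguous(related, others):
    """Hypothetical helper: True if the related names stay adjacent
    once they are sorted lexicographically together with the others."""
    ordered = sorted(set(related) | set(others))
    positions = [ordered.index(name) for name in related]
    return max(positions) - min(positions) == len(related) - 1


# With the common component first, the unrelated metric sorts outside the group.
print(is_contiguous(common_first, [unrelated]))
# With the qualifier first, the unrelated metric lands inside the group.
print(is_contiguous(readable_first, [unrelated]))
```

In the first scheme the four truncation metrics form one block in a sorted list. In the second, `prometheus_tsdb_head_series` sorts between the `failed` variant and the plain `truncations_total` metric, which is the interleaving the added text warns about.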
