Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix prometheus metric format (#2825) #2899

Merged
merged 1 commit into from
Mar 12, 2025

Conversation

ZhengweiZhu
Copy link
Contributor

@ZhengweiZhu ZhengweiZhu commented Feb 23, 2025

Fix PrometheusMetricsDumper to dump unique comment of a certain metric name to follow the prometheus specification.
Also refine the prometheus ut case to support mvar output.

What problem does this PR solve?

Issue Number:
#2825

Problem Summary:

What is changed and the side effects?

Changed:

Side effects:

  • Performance effects(性能影响): NO

  • Breaking backward compatibility(向后兼容性):


Check List:

  • Please make sure your changes are compilable(请确保你的更改可以通过编译).
  • When providing us with a new feature, it is best to add related tests(如果你向我们增加一个新的功能, 请添加相关测试).
  • Please follow Contributor Covenant Code of Conduct.(请遵循贡献者准则).

<< name << " " << desc << '\n';
// To meet specification of prometheus metric, there should be only
// one TYPE field for the same metric name.
if (metrics_name.as_string() != _last_metric_name) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply ignoring metrics that are duplicates of the previous ones does not completely solve the problem. If the metrics themselves are repetitive, perhaps we should try to identify and address the root cause; if it's just a matter of duplicate names, consider fixing the metrics name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me add some background knowledge. The metrics are stored in butil::FlatMap and the uniqueness of metric name is guaranteed. The problem arises from the introduction of mvar, where one metric name will correspond to multiple metric with different labels. My patch just handles this problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a virtual function for mvabr to Dumper? I think this can completely solve the problem.

bool mutil_dump(
    const std::string& mutil_name,
    const std::vector<std::string>& names,
    const std::vector<std::string>& descriptions) override;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenBright @wasphin The pr is updated according to review comment and ut case is added. Please take a look and merge if nothing wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'll take a look at the changes in the next few days.

@wwbmmm
Copy link
Contributor

wwbmmm commented Feb 24, 2025

This PR is similar to #2870, which one is better?

@ZhengweiZhu
Copy link
Contributor Author

ZhengweiZhu commented Feb 24, 2025

This PR is similar to #2870, which one is better?

The #2870 introduced a std::set container to remove duplicates, which is unnecessary and impacts performance. As the metrics are stored in butil::FlatMap and the uniqueness of metric name is guaranteed.

When we just expose bvar, the current upstream code is fine. The problem arises from the introduction of mvar, where one metric name will correspond to multiple metric with different labels. My patch handles this problem and follow the specifition of Prometheus metric.

@wwbmmm
Copy link
Contributor

wwbmmm commented Feb 24, 2025

LGTM

@wasphin
Copy link
Member

wasphin commented Feb 24, 2025

There's a new dump_comment API when introducing mvar by @serverglen, is this the case to set some HELP/TYPE for it? dump_comment is only used by mvar.

$ git grep dump_comment

src/bvar/multi_dimension_inl.h:242:    if (label_names.empty() || !dumper->dump_comment(name(), METRIC_TYPE_GAUGE)) {
src/bvar/multi_dimension_inl.h:279:        if (!dumper->dump_comment(name() + "_latency", METRIC_TYPE_GAUGE)) {
src/bvar/multi_dimension_inl.h:312:        if (!dumper->dump_comment(name() + "_max_latency", METRIC_TYPE_GAUGE)) {
src/bvar/multi_dimension_inl.h:323:        if (!dumper->dump_comment(name() + "_qps", METRIC_TYPE_GAUGE)) {
src/bvar/multi_dimension_inl.h:334:        if (!dumper->dump_comment(name() + "_count", METRIC_TYPE_COUNTER)) {
src/bvar/variable.h:56:    virtual bool dump_comment(const std::string&, const std::string& /*type*/) {

@ZhengweiZhu
Copy link
Contributor Author

ZhengweiZhu commented Feb 24, 2025

There's a new dump_comment API when introducing mvar by @serverglen, is this the case to set some HELP/TYPE for it? dump_comment is only used by mvar.

$ git grep dump_comment

src/bvar/multi_dimension_inl.h:242:    if (label_names.empty() || !dumper->dump_comment(name(), METRIC_TYPE_GAUGE)) {
src/bvar/multi_dimension_inl.h:279:        if (!dumper->dump_comment(name() + "_latency", METRIC_TYPE_GAUGE)) {
src/bvar/multi_dimension_inl.h:312:        if (!dumper->dump_comment(name() + "_max_latency", METRIC_TYPE_GAUGE)) {
src/bvar/multi_dimension_inl.h:323:        if (!dumper->dump_comment(name() + "_qps", METRIC_TYPE_GAUGE)) {
src/bvar/multi_dimension_inl.h:334:        if (!dumper->dump_comment(name() + "_count", METRIC_TYPE_COUNTER)) {
src/bvar/variable.h:56:    virtual bool dump_comment(const std::string&, const std::string& /*type*/) {

Reasonable. By overriding dump_comment method of PrometheusMetricsDumper to output HELP/TYPE for a metric, then the dump method should be changed to only output name and value (not including HELP/TYPE). I will try it this way.

@ZhengweiZhu
Copy link
Contributor Author

ZhengweiZhu commented Mar 1, 2025

Update: The original patch only handles mvar not in LatencyRecorder type correctly. When the mvar is in LatencyRecorder type, things get more complicated.

For example, we expose a mvar in LatencyRecorder type with two different labels.

    bvar::MultiDimension<bvar::LatencyRecorder > my_lat("dump_lat", {"idc", "method", "status"});
    bvar::LatencyRecorder* lat1 = my_lat.get_stats({"gz", "post", "200"});
    *lat1 << 1;
    *lat1 << 2;

    bvar::LatencyRecorder* lat2 = my_lat.get_stats({"tj", "post", "300"});
    *lat2 << 3;
    *lat2 << 4;

For the upstream code

#curl -s 127.0.0.1:8000/brpc_metrics

# HELP dump_lat_latency
# TYPE dump_lat_latency gauge
dump_lat_latency{idc="gz",method="post",status="200",quantile="0"} 0
# HELP dump_lat_latency
# TYPE dump_lat_latency gauge
dump_lat_latency{idc="gz",method="post",status="200",quantile="80"} 0
...

#curl -s 127.0.0.1:8000/brpc_metrics | promtool check metrics
error while linting: text format parsing error in line 5: second TYPE line for metric name "dump_lat_latency", or TYPE reported after samples

After applying my original patch (judging if last metric name is equal to this one)

#curl -s 127.0.0.1:8000/brpc_metrics

# HELP dump_lat_latency
# TYPE dump_lat_latency gauge
dump_lat_latency{idc="gz",method="post",status="200",quantile="0"} 0
dump_lat_latency{idc="gz",method="post",status="200",quantile="80"} 0
dump_lat_latency{idc="gz",method="post",status="200",quantile="90"} 0
dump_lat_latency{idc="gz",method="post",status="200",quantile="99"} 0
dump_lat_latency{idc="gz",method="post",status="200",quantile="999"} 0
dump_lat_latency{idc="gz",method="post",status="200",quantile="9999"} 0
# HELP dump_lat_max_latency
# TYPE dump_lat_max_latency gauge
dump_lat_max_latency{idc="gz",method="post",status="200"} 0
# HELP dump_lat_qps
# TYPE dump_lat_qps gauge
dump_lat_qps{idc="gz",method="post",status="200"} 0
# HELP dump_lat_count
# TYPE dump_lat_count gauge
dump_lat_count{idc="gz",method="post",status="200"} 2
# HELP dump_lat_latency
# TYPE dump_lat_latency gauge
dump_lat_latency{idc="tj",method="post",status="300",quantile="0"} 0
dump_lat_latency{idc="tj",method="post",status="300",quantile="80"} 0
dump_lat_latency{idc="tj",method="post",status="300",quantile="90"} 0
dump_lat_latency{idc="tj",method="post",status="300",quantile="99"} 0
dump_lat_latency{idc="tj",method="post",status="300",quantile="999"} 0
dump_lat_latency{idc="tj",method="post",status="300",quantile="9999"} 0
# HELP dump_lat_max_latency
# TYPE dump_lat_max_latency gauge
dump_lat_max_latency{idc="tj",method="post",status="300"} 0
# HELP dump_lat_qps
# TYPE dump_lat_qps gauge
dump_lat_qps{idc="tj",method="post",status="300"} 0
# HELP dump_lat_count
# TYPE dump_lat_count gauge
dump_lat_count{idc="tj",method="post",status="300"} 2

#curl -s 127.0.0.1:8000/brpc_metrics | promtool check metrics
error while linting: text format parsing error in line 19: second TYPE line for metric name "dump_lat_latency", or TYPE reported after samples
The problem here is that the second "# TYPE dump_lat_latency gauge" in line 19 is equal to the one in line 2.

So we need to refactor code and rearrange the output of mvar in LatencyRecorder type by putting the same metric name (with different labels) together, such like the following(which pass the promtool check):

# HELP dump_lat_latency
# TYPE dump_lat_latency gauge
dump_lat_latency{idc="gz",method="post",status="200",quantile="0"} 0
dump_lat_latency{idc="gz",method="post",status="200",quantile="80"} 0
dump_lat_latency{idc="gz",method="post",status="200",quantile="90"} 0
dump_lat_latency{idc="gz",method="post",status="200",quantile="99"} 0
dump_lat_latency{idc="gz",method="post",status="200",quantile="999"} 0
dump_lat_latency{idc="gz",method="post",status="200",quantile="9999"} 0
dump_lat_latency{idc="tj",method="post",status="300",quantile="0"} 0
dump_lat_latency{idc="tj",method="post",status="300",quantile="80"} 0
dump_lat_latency{idc="tj",method="post",status="300",quantile="90"} 0
dump_lat_latency{idc="tj",method="post",status="300",quantile="99"} 0
dump_lat_latency{idc="tj",method="post",status="300",quantile="999"} 0
dump_lat_latency{idc="tj",method="post",status="300",quantile="9999"} 0
# HELP dump_lat_max_latency
# TYPE dump_lat_max_latency gauge
dump_lat_max_latency{idc="gz",method="post",status="200"} 0
dump_lat_max_latency{idc="tj",method="post",status="300"} 0
# HELP dump_lat_qps
# TYPE dump_lat_qps gauge
dump_lat_qps{idc="gz",method="post",status="200"} 0
dump_lat_qps{idc="tj",method="post",status="300"} 0
# HELP dump_lat_count
# TYPE dump_lat_count gauge
dump_lat_count{idc="gz",method="post",status="200"} 2
dump_lat_count{idc="tj",method="post",status="300"} 2

@ZhengweiZhu ZhengweiZhu marked this pull request as draft March 3, 2025 02:38
@ZhengweiZhu ZhengweiZhu force-pushed the fixPrometheusFmt branch 2 times, most recently from cb80cf6 to 14e911a Compare March 5, 2025 10:38
@ZhengweiZhu ZhengweiZhu marked this pull request as ready for review March 5, 2025 10:44
Fix PrometheusMetricsDumper to dump unique comment of a certain
metric name to follow the prometheus specification.
Also refine the prometheus ut case to support mvar output
@ZhengweiZhu
Copy link
Contributor Author

@chenBright There's an ut failure BthreadTest.trace which seems to be brought in by you recent commit?

Copy link
Contributor

@chenBright chenBright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

cc @wasphin

@ZhengweiZhu ZhengweiZhu requested a review from wasphin March 11, 2025 12:06
Copy link
Member

@wasphin wasphin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chenBright chenBright merged commit b4ce61f into apache:master Mar 12, 2025
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants