@sven-rosenzweig
Contributor

Adjust fingerprinting to reduce event noise for NSX-T API calls

Sentry groups events by fingerprint. This change customizes the fingerprinting
for NSX-T-related errors to avoid generating separate events for each API call
with a unique ID. The replace operation does not work in-place, so the adjustment
ensures consistent grouping.
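
To illustrate the grouping problem, here is a minimal sketch, assuming the unique IDs are canonical UUIDs embedded in the request path. The function name `create_fingerprint`, the regular expression, and the example path are hypothetical and not taken from this PR. Since Python strings are immutable, a replace always returns a new string, and that result has to be used for the fingerprint.

```python
import re

# Assumption: IDs in the request paths look like canonical UUIDs.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def create_fingerprint(path, placeholder="<uuid>"):
    """Return a copy of path with every UUID replaced by placeholder."""
    # Strings are immutable: sub() returns a new string and never
    # modifies `path` in place, so the result must be returned and used.
    return UUID_RE.sub(placeholder, path)


# Hypothetical example path; the real paths may differ.
path = "/policy/api/v1/infra/segments/0f8fad5b-d9cb-469f-a165-70867728950e/ports"

# Sentry fingerprints are lists of strings; events with equal lists are grouped.
fingerprint = ["nsx-t", "GET", create_fingerprint(path)]
print(fingerprint)
```

With the ID normalized, calls against different segments would share one fingerprint instead of producing one group per ID.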


Add metrics for API call durations and errors

This commit introduces Prometheus metrics to capture details about outgoing
API calls, including their duration and status codes. Additionally, a separate
metric tracks request-level errors such as connection timeouts and network failures.
These metrics provide better observability into API behavior and help identify
performance or reliability issues.
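
The split between the two kinds of metrics can be sketched without any dependencies. This is only a stand-in, not the code from this PR: a plain dictionary and a `collections.Counter` take the place of what would be a `prometheus_client` `Histogram` and `Counter`, and the names `timed_call`, `api_call_durations`, and `request_errors` are hypothetical.

```python
import time
from collections import Counter, defaultdict

# Stand-ins for the metrics: durations keyed by (method, status code) and
# error counts keyed by exception name. All names here are hypothetical.
api_call_durations = defaultdict(list)
request_errors = Counter()


def timed_call(method, send):
    """Run send(), recording its duration or the request-level error."""
    start = time.perf_counter()
    try:
        status = send()
    except (ConnectionError, TimeoutError) as exc:
        # No HTTP response arrived, so there is no status code to record.
        # Count the failure separately and let the caller handle it.
        request_errors[type(exc).__name__] += 1
        raise
    api_call_durations[(method, status)].append(time.perf_counter() - start)
    return status


def failing_send():
    raise TimeoutError("connection timed out")


timed_call("GET", lambda: 200)
try:
    timed_call("GET", failing_send)
except TimeoutError:
    pass

print(dict(request_errors))
print({key: len(values) for key, values in api_call_durations.items()})
```

Keeping request-level errors out of the duration metric means a timeout is not confused with a slow but successful call.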

@sven-rosenzweig sven-rosenzweig requested a review from mutax July 21, 2025 15:16
@sven-rosenzweig sven-rosenzweig marked this pull request as ready for review July 21, 2025 15:21
@sven-rosenzweig sven-rosenzweig requested review from a team as code owners July 21, 2025 15:21
@github-actions

github-actions bot commented Jul 21, 2025

Name                                                                      Stmts   Miss  Cover
---------------------------------------------------------------------------------------------
networking_nsxv3/api/rpc.py                                                 250    121    52%
networking_nsxv3/common/config.py                                            16      0   100%
networking_nsxv3/common/constants.py                                         23      0   100%
networking_nsxv3/common/locking.py                                           35     11    69%
networking_nsxv3/common/synchronization.py                                  378     72    81%
networking_nsxv3/db/db.py                                                   119     19    84%
networking_nsxv3/extensions/nsxtoperations.py                               110     29    74%
networking_nsxv3/extensions/nsxtpolicy.py                                    84      7    92%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/agent.py                   174     60    66%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/cli.py                     302    199    34%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/client_nsx.py              216     52    76%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/constants_nsx.py             6      0   100%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/extensions/firewall.py      27      0   100%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/provider.py                171     10    94%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/provider_nsx_policy.py     778    113    85%
networking_nsxv3/plugins/ml2/drivers/nsxv3/agent/realization.py             238     59    75%
networking_nsxv3/plugins/ml2/drivers/nsxv3/driver.py                        143     74    48%
networking_nsxv3/prometheus/exporter.py                                      21      5    76%
networking_nsxv3/services/logapi/drivers/nsxv3/driver.py                     41      1    98%
networking_nsxv3/services/qos/drivers/nsxv3/qos.py                           34      4    88%
networking_nsxv3/services/trunk/drivers/nsxv3/trunk.py                       71      3    96%
---------------------------------------------------------------------------------------------
TOTAL                                                                      3237    839    74%


for resource_type, path in paths:
    classified = RetryPolicy._get_resource_type(path)
    print(resource_type, classified)
Collaborator

print() could be removed

new_path = RetryPolicy._create_sentry_fingerprint(path=path, placeholder="<uuid>")
resource = RetryPolicy._get_resource_type(path)

if "histogram:nsxv3_agent_api_calls" == str(metric):
Collaborator

You could use `if str(RetryPolicy.API_CALLS) == str(metric):`, or probably even drop the `str()` calls.


@staticmethod
def _update_metric(metric, **kwargs):
    path = kwargs.get("path", '')
Collaborator

You have a lot of kwargs.get() calls with default arguments. How about removing the **kwargs and adding the parameters explicitly, each with a default? It doesn't look like you need the flexibility of unspecified arguments.
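
A rough sketch of what that could look like. Only `path` appears in the snippet above; the other parameter names (`method`, `status_code`) and the returned dictionary are hypothetical and only there to make the example runnable.

```python
def update_metric(metric, path="", method="", status_code=None):
    """Hypothetical signature: every option is named and has a default."""
    return {
        "metric": metric,
        "path": path,
        "method": method,
        "status_code": status_code,
    }


print(update_metric("api_calls", path="/infra/segments"))

# A misspelled keyword now fails immediately instead of being silently
# ignored, which is what kwargs.get("path", '') would do.
try:
    update_metric("api_calls", pth="/infra/segments")
except TypeError as exc:
    print("rejected:", exc)
```

The explicit signature also documents the accepted arguments in one place, so callers and linters can check them.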

4 participants