feat(upsampling) - issues-stats API error upsampling support #94835
Conversation
issues-stats will now return upsampled error counts for projects in the allowlist. This required opening a special case for the non-standard query in both SnubaTSDB and the Snuba SDK `aliased_query`. The chosen implementation avoids opening them up to be too flexible, which might introduce vulnerabilities, while staying as clean as possible.
Codecov Report: All modified and coverable lines are covered by tests ✅. All tests successful. No failed tests found.

```
@@ Coverage Diff @@
## master #94835 +/- ##
=========================================
  Coverage   87.93%   87.94%
=========================================
  Files       10381    10388      +7
  Lines      601411   603227   +1816
  Branches    23393    23393
=========================================
+ Hits       528860   530505   +1645
- Misses      72084    72255    +171
  Partials      467      467
```
@sentry review

On it! We are reviewing the PR and will provide feedback shortly.

@sentry generate-test

On it! Sentry is generating unit tests for this PR.
PR Description

This pull request introduces error upsampling, a mechanism to correct for client-side error sampling by adjusting the count of errors based on the client's sample rate. This ensures more accurate error reporting and analysis, especially when clients are configured to sample errors to reduce data volume.

Key Technical Changes: The key technical changes include: 1) A new function

Architecture Decisions: The architectural decisions involve modifying the query generation process in Snuba to conditionally use the

Dependencies and Interactions: This change depends on the

Risk Considerations: Potential risks include: 1) Performance impact of the

Notable Implementation Details: A notable implementation detail is the special handling of the
```python
def are_all_projects_error_upsampled(project_ids: Sequence[int]) -> bool:
    """
```
The function signature change removes the `organization` parameter, but this breaks the contract. Consider providing a deprecation period or keeping backward compatibility by making the parameter optional with a default value of `None`. This ensures existing code won't break immediately.
Suggested change:

```python
def are_all_projects_error_upsampled(
    project_ids: Sequence[int], organization: Organization | None = None
) -> bool:
```
This is an internal helper method; we can freely break its contract since we control all of its usages.
```python
from snuba_sdk import Column, DeleteQuery, Function, MetricsQuery, Request
from snuba_sdk.legacy import json_to_snql
```
Missing import for the `SelectableExpression` type. This should be added to the imports at the top of the file to avoid potential import errors.
Suggested change:

```python
from snuba_sdk import Column, DeleteQuery, Function, MetricsQuery, Request
from snuba_sdk.legacy import json_to_snql
from snuba_sdk.query import SelectableExpression
```
Missed the fact that the import does exist.
```python
    user=None,
    aggregation_override: str | None = None,
    **kwargs,
```
The `aggregation_override` parameter is added to the abstract method signature, but there's no validation to ensure the override is a valid aggregation type. Consider adding validation to prevent runtime errors.
Suggested change:

```python
def query_tsdb(
    self,
    item_list: Sequence[Group],
    query_params,
    conditions=None,
    environment_ids=None,
    user=None,
    aggregation_override: str | None = None,
    **kwargs,
):
    if aggregation_override and aggregation_override not in ["count", "upsampled_count"]:
        raise ValueError(f"Invalid aggregation_override: {aggregation_override}")
    pass
```
```python
# Check if all projects are allowlisted for error upsampling
is_upsampled = are_all_projects_error_upsampled(project_ids)
if is_upsampled:
    aggregations[0] = ["upsampled_count", "", "times_seen"]
```
The check for error upsampling is performed, but there's no error handling if the function call fails. Consider wrapping this in a try/except block to ensure robustness.
Suggested change:

```python
# Check if all projects are allowlisted for error upsampling
try:
    is_upsampled = are_all_projects_error_upsampled(project_ids)
    if is_upsampled:
        aggregations[0] = ["upsampled_count", "", "times_seen"]
except Exception as e:
    # Log the error but don't fail the entire query
    logger.warning("Error checking upsampling allowlist", exc_info=e)
```
```diff
 aggregated_as = "aggregate"
-aggregations: list[SelectableExpression] = [
-    Function(
-        aggregation,
-        [Column(model_aggregate)] if model_aggregate else [],
-        aggregated_as,
-    )
-]
+if aggregation == "upsampled_count":
+    aggregations: list[SelectableExpression] = [
+        get_upsampled_count_snql_with_alias(aggregated_as)
+    ]
+else:
+    aggregations = [
+        Function(
+            function=aggregation,
+            parameters=[Column(model_aggregate)] if model_aggregate else [],
+            alias=aggregated_as,
+        )
+    ]
```
The aggregation logic creates different list structures based on the aggregation type, but this inconsistency could lead to bugs. Consider using a consistent pattern for all aggregation types.
Suggested change:

```python
if aggregation == "upsampled_count":
    aggregations: list[SelectableExpression] = [
        get_upsampled_count_snql_with_alias(aggregated_as)
    ]
else:
    aggregations = [
        Function(
            function=aggregation,
            parameters=[Column(model_aggregate)] if model_aggregate else [],
            alias=aggregated_as,
        )
    ]
```
```python
for aggregation in aggregations:
    derived_columns.append(aggregation[2])

    if aggregation[0] == UPSAMPLED_ERROR_AGGREGATION:
        # Special-case: upsampled_count aggregation - this aggregation type
        # requires special handling to convert it into a selected column
        # with the appropriate SNQL function structure
        if selected_columns is None:
            selected_columns = []
        selected_columns.append(
            get_upsampled_count_snql_with_alias(
                aggregation[2]
                if len(aggregation) > 2 and aggregation[2] is not None
                else "upsampled_count"
            )
        )
```
The aggregation processing logic uses the hardcoded index `aggregation[2]` without checking if the aggregation tuple has enough elements. This could cause an IndexError. Add bounds checking.
Suggested change:

```python
if aggregation[0] == UPSAMPLED_ERROR_AGGREGATION:
    # Special-case: upsampled_count aggregation
    if selected_columns is None:
        selected_columns = []
    alias = (
        aggregation[2]
        if len(aggregation) > 2 and aggregation[2] is not None
        else "upsampled_count"
    )
    selected_columns.append(
        get_upsampled_count_snql_with_alias(alias)
    )
```
It does check the tuple.
```diff
 def test_are_all_projects_error_upsampled(self, mock_options: Mock) -> None:
     # Test when all projects are allowlisted
     mock_options.get.return_value = self.project_ids
-    assert _are_all_projects_error_upsampled(self.project_ids, self.organization) is True
+    assert are_all_projects_error_upsampled(self.project_ids) is True

     # Test when some projects are not allowlisted
     mock_options.get.return_value = self.project_ids[:-1]
-    assert _are_all_projects_error_upsampled(self.project_ids, self.organization) is False
+    assert are_all_projects_error_upsampled(self.project_ids) is False

     # Test when no projects are allowlisted
     mock_options.get.return_value = []
-    assert _are_all_projects_error_upsampled(self.project_ids, self.organization) is False
+    assert are_all_projects_error_upsampled(self.project_ids) is False

     # Test when no project IDs provided
-    assert _are_all_projects_error_upsampled([], self.organization) is False
+    assert are_all_projects_error_upsampled([]) is False
```
The test is calling the old private function `_are_all_projects_error_upsampled`, but the diff shows this function was renamed to the public `are_all_projects_error_upsampled`. Update the test to use the new public function name.
Suggested change:

```python
@patch("sentry.api.helpers.error_upsampling.options")
def test_are_all_projects_error_upsampled(self, mock_options: Mock) -> None:
    # Test when all projects are allowlisted
    mock_options.get.return_value = self.project_ids
    assert are_all_projects_error_upsampled(self.project_ids) is True
    # Test when some projects are not allowlisted
    mock_options.get.return_value = self.project_ids[:-1]
    assert are_all_projects_error_upsampled(self.project_ids) is False
    # Test when no projects are allowlisted
    mock_options.get.return_value = []
    assert are_all_projects_error_upsampled(self.project_ids) is False
    # Test when no project IDs provided
    assert are_all_projects_error_upsampled([]) is False
```
There's a bug here: it suggests the same code that is already in the PR version, and mistakes the old version for something it is not.
"""Test that count is upsampled for allowlisted projects in group index stats.""" | ||
with self.options({"issues.client_error_sampling.project_allowlist": [self.project.id]}): | ||
project = self.project | ||
event_data = { | ||
"timestamp": before_now(seconds=30).isoformat(), | ||
"message": "Error event for upsampling", | ||
"contexts": {"error_sampling": {"client_sample_rate": 0.1}}, | ||
} | ||
event = self.store_event( | ||
data=event_data, | ||
project_id=project.id, | ||
) | ||
|
||
group = event.group | ||
self.login_as(user=self.user) | ||
|
||
with self.feature("organizations:error-upsampling"): | ||
response = self.get_response(query="is:unresolved", groups=[group.id]) | ||
assert response.status_code == 200 | ||
assert len(response.data) == 1 | ||
# Expect the count to be upsampled (1 / 0.1 = 10) - count is a string | ||
assert response.data[0]["count"] == "10" | ||
# Also check that lifetime stats are upsampled | ||
assert response.data[0]["lifetime"]["count"] == "10" | ||
# Also check that stats are upsampled, latest time bucket should contain upsampled event | ||
assert response.data[0]["stats"]["24h"][-1][1] == 10 |
The test stores an event with a client_sample_rate of 0.1 and expects the count to be upsampled to 10, but there's no verification that the upsampling logic actually uses the sample_weight field. Consider adding assertions to verify the sample_weight is correctly applied.
"""Test that count is upsampled for allowlisted projects in group index stats.""" | |
with self.options({"issues.client_error_sampling.project_allowlist": [self.project.id]}): | |
project = self.project | |
event_data = { | |
"timestamp": before_now(seconds=30).isoformat(), | |
"message": "Error event for upsampling", | |
"contexts": {"error_sampling": {"client_sample_rate": 0.1}}, | |
} | |
event = self.store_event( | |
data=event_data, | |
project_id=project.id, | |
) | |
group = event.group | |
self.login_as(user=self.user) | |
with self.feature("organizations:error-upsampling"): | |
response = self.get_response(query="is:unresolved", groups=[group.id]) | |
assert response.status_code == 200 | |
assert len(response.data) == 1 | |
# Expect the count to be upsampled (1 / 0.1 = 10) - count is a string | |
assert response.data[0]["count"] == "10" | |
# Also check that lifetime stats are upsampled | |
assert response.data[0]["lifetime"]["count"] == "10" | |
# Also check that stats are upsampled, latest time bucket should contain upsampled event | |
assert response.data[0]["stats"]["24h"][-1][1] == 10 | |
event_data = { | |
"timestamp": before_now(seconds=30).isoformat(), | |
"message": "Error event for upsampling", | |
"contexts": {"error_sampling": {"client_sample_rate": 0.1}}, | |
# Explicitly set sample_weight to verify upsampling logic | |
"extra": {"sample_weight": 10.0} | |
} |
Even if it's a reasonable suggestion to add an assertion for it (though not an accurate one), it suggests deleting the entire test and adding only that, which makes no sense.
```python
if self.stats_period and not self._collapse("stats"):
    aggregation_override = None
    if self.project_ids:
        is_upsampled = are_all_projects_error_upsampled(self.project_ids)
        if is_upsampled:
            aggregation_override = UPSAMPLED_ERROR_AGGREGATION
```
The project IDs are stored in `self.project_ids` in the constructor, but this could be None. Add null checking before using it in the `get_attrs` method.
Suggested change:

```python
aggregation_override = None
if self.project_ids:
    try:
        is_upsampled = are_all_projects_error_upsampled(self.project_ids)
        if is_upsampled:
            aggregation_override = UPSAMPLED_ERROR_AGGREGATION
    except Exception as e:
        # Log warning but don't fail the query
        logger.warning("Failed to check upsampling status", exc_info=e)
```
```python
def get_upsampled_count_snql_with_alias(alias: str) -> list[SelectableExpression]:
    return Function(
        function="toInt64",
        parameters=[
            Function(
                function="sum",
                parameters=[
                    Function(
                        function="ifNull",
                        parameters=[Column(name="sample_weight"), 1],
                        alias=None,
                    )
                ],
                alias=None,
            )
        ],
        alias=alias,
    )
```
Consider adding documentation for the complex nested function structure that builds the upsampled count SNQL. This function creates a deeply nested structure that would benefit from clearer documentation.
Suggested change:

```python
def get_upsampled_count_snql_with_alias(alias: str) -> SelectableExpression:
    """
    Creates a SnQL expression for upsampled count aggregation.

    This builds: toInt64(sum(ifNull(sample_weight, 1)))
    which sums the sample_weight column, defaulting to 1 if null,
    then converts the result to a 64-bit integer.

    Args:
        alias: The alias to assign to the resulting expression

    Returns:
        A SnQL Function expression for upsampled counting
    """
    return Function(
        function="toInt64",
        parameters=[
            Function(
                function="sum",
                parameters=[
                    Function(
                        function="ifNull",
                        parameters=[Column(name="sample_weight"), 1],
                        alias=None,
                    )
                ],
                alias=None,
            )
        ],
        alias=alias,
    )
```
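As a mental model for what the SnQL above computes, the ClickHouse expression `toInt64(sum(ifNull(sample_weight, 1)))` behaves like this Python reduction over event rows (the rows here are invented for illustration):

```python
def upsampled_count(rows: list[dict]) -> int:
    # ifNull(sample_weight, 1): a missing or NULL weight counts as 1, since
    # an unsampled event represents only itself; then sum and truncate to an
    # integer, mirroring toInt64(sum(...)).
    return int(sum(
        r["sample_weight"] if r.get("sample_weight") is not None else 1
        for r in rows
    ))

rows = [
    {"sample_weight": 10.0},  # event kept at client_sample_rate = 0.1
    {"sample_weight": None},  # weight column is NULL
    {},                       # weight column absent entirely
]
print(upsampled_count(rows))  # 12
```

Three stored rows thus report 12 upsampled errors: the sampled event carries its 10x weight while the two unweighted events count once each.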
Sentry has determined that unit tests are not necessary for this PR.
makes sense to me.
Suspect Issues: This pull request was deployed and Sentry observed the following issues:
Review note: notice that issues-stats returns both a total count and a graph of counts over a period of time, hence the two queries needed.
I had to improvise and get creative to open up paths to do what I needed here, and I am not 100% sure about this solution. I think it strikes a good balance between safety and cleanliness, but feel free to suggest alternative approaches I may have missed.
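To make the two-query shape concrete, here is a rough sketch (invented data and helper names, not the actual SnubaTSDB code) of computing both results from weighted events: a lifetime total plus a bucketed time series of upsampled counts.

```python
from collections import defaultdict

def upsampled_total_and_series(events, bucket_seconds=3600):
    """events is a list of (unix_timestamp, sample_weight) pairs."""
    total = 0.0
    series: dict[int, float] = defaultdict(float)
    for ts, weight in events:
        total += weight
        series[ts - ts % bucket_seconds] += weight  # floor to bucket start
    return int(total), {bucket: int(w) for bucket, w in sorted(series.items())}

events = [(100, 10.0), (3700, 1.0), (3900, 10.0)]
total, series = upsampled_total_and_series(events)
print(total, series)  # 21 {0: 10, 3600: 11}
```

In the real endpoint both aggregations run as separate Snuba queries over the same weight expression, which is why the special case had to be opened in two places.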