fix(deletions): Prevent timeouts when deleting GroupHash records #101545
Conversation
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #101545      +/-  ##
===========================================
- Coverage   80.98%    80.98%   -0.01%
===========================================
  Files        8706      8706
  Lines      387005    387142     +137
  Branches    24548     24548
===========================================
+ Hits       313413    313522     +109
- Misses      73245     73273      +28
  Partials      347       347
Force-pushed from efa76a4 to e7e5821.
EVENT_CHUNK_SIZE = 10000
GROUP_HASH_ITERATIONS = 10000
# Batch size for nullifying group_hash_metadata.seer_matched_grouphash_id references to avoid database timeouts
GROUP_HASH_METADATA_BATCH_SIZE = 10
We're seeing updates of 100 hashes take over 30 seconds, so this should be good enough.
iterations += 1

if iterations == GROUP_HASH_ITERATIONS:
This is a drive-by change.
Nit: I'd move this out of the loop so we're only checking it once (after the loop has finished). Or it might be clearer to have a did_break: bool flag that's set on break, with this check outside the loop guarded by if not did_break:.
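A minimal sketch of the did_break variant, assuming illustrative fetch/delete callables rather than the actual deletion task code:

import logging

logger = logging.getLogger(__name__)

GROUP_HASH_ITERATIONS = 10000

def delete_in_batches(fetch_next_chunk, delete_chunk):
    # did_break records whether we ran out of work before hitting the cap.
    did_break = False
    for _ in range(GROUP_HASH_ITERATIONS):
        chunk = fetch_next_chunk()
        if not chunk:
            did_break = True
            break
        delete_chunk(chunk)
    # One check after the loop instead of re-checking on every iteration.
    if not did_break:
        logger.warning(
            "Group hashes batch deletion reached the maximum number of iterations."
        )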
for i in range(0, len(hash_ids), GROUP_HASH_METADATA_BATCH_SIZE):
    batch = hash_ids[i : i + GROUP_HASH_METADATA_BATCH_SIZE]
    GroupHashMetadata.objects.filter(
        seer_matched_grouphash_id__in=batch, seer_matched_grouphash_id__isnull=False
Using seer_matched_grouphash_id__isnull=False reduces the number of rows that need updating.
How does this reduce further beyond what the seer_matched_grouphash_id__in=batch filter is already doing? Does batch contain None / null values?
register(
    "deletions.group-hashes-batch-size",
-   default=10000,
+   default=100,
Options Automator uses this value.
tests/sentry/deletions/test_group.py (outdated)
| "args": [self.project.id, error_group_hashes, 0] | ||
| } | ||
|
|
||
| def test_batch_nullify_seer_matched_grouphash_references(self) -> None: |
If you read the test, it looks like what I'm doing makes sense; however, I'm not entirely sure the way I'm associating the hashes and metadata is correct. I have some other changes locally which also don't convince me.
I would still like to get this in, as the code changes are obvious.
I will spend time talking with @lobsterkatie next week to see if what I'm doing makes sense.
The closest explanation I have is this:
#83081 (comment)
tests/sentry/deletions/test_group.py (outdated)
# Pretend that Seer tells us that grouphash B is similar to grouphash A
grouphash_b.metadata.seer_matched_grouphash = grouphash_a
grouphash_b.metadata.save()
This is the hack, instead of doing something like the following to accomplish the same thing:

with mock.patch(
    "sentry.grouping.ingest.seer.get_seer_similar_issues"
) as mock_get_seer_similar_issues:
    # Let seer similarity return that grouphash_b is similar to grouphash_a
    mock_get_seer_similar_issues.return_value = (0.01, grouphash_a)
tests/sentry/deletions/test_group.py (outdated)
# Grouphash B's metadata should still exist, but the reference to A should be nullified
metadata_b = GroupHashMetadata.objects.get(id=metadata_b_id)
assert metadata_b.seer_matched_grouphash is None
It is now None, whereas before the assertion was assert grouphash_b.metadata.seer_matched_grouphash == grouphash_a.
if iterations == GROUP_HASH_ITERATIONS:
    metrics.incr("deletions.group_hashes.max_iterations_reached", sample_rate=1.0)
    logger.warning(
        "Group hashes batch deletion reached the maximum number of iterations. "
        "Investigate if we need to change the GROUP_HASH_ITERATIONS value."
    )
Potential bug: The reduced batch size with an unchanged iteration limit can cause delete_group_hashes to silently fail, leaving orphaned data for projects with over 1M GroupHash records.
- Description: The batch size for GroupHash deletion was reduced from 10,000 to 100, but the iteration limit GROUP_HASH_ITERATIONS remains at 10,000. This lowers the maximum number of deletable hashes in a single run from 100 million to 1 million (see the quick arithmetic check after this comment). When this new, lower limit is reached, such as during the deletion of a large project, the function logs a warning and exits without raising an error. This silent failure leaves orphaned GroupHash records in the database, as the calling function is unaware the deletion was incomplete.
- Suggested fix: Increase the GROUP_HASH_ITERATIONS constant to compensate for the smaller batch size, for example to 1,000,000, to maintain the previous capacity. Alternatively, raise an exception when the iteration limit is reached to prevent silent failures and allow the caller to handle the incomplete deletion.
severity: 0.7, confidence: 0.95
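The capacity numbers in the report follow directly from the two constants; a quick back-of-the-envelope check (plain Python, not Sentry code):

# Per-run deletion capacity = batch size * iteration cap.
old_capacity = 10_000 * 10_000  # 100,000,000 hashes with the previous default
new_capacity = 100 * 10_000     # 1,000,000 hashes once the default drops to 100
assert old_capacity == 100_000_000
assert new_capacity == 1_000_000
assert old_capacity // new_capacity == 100  # a 100x drop in per-run capacity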
# and we need to nullify the seer_matched_grouphash_id field in the GroupHashMetadata model before deleting the GroupHash model
# to prevent the implicit ON DELETE SET NULL cascade from timing out.
# Process in small batches to avoid statement timeouts on high fan-out relationships
for i in range(0, len(hash_ids), GROUP_HASH_METADATA_BATCH_SIZE):
Wonder if it would be more performant (and worth changing) to switch the loop in delete_group_hashes from:

qs = GroupHash.objects.filter(project_id=project_id, group_id__in=group_ids).values_list(
    "id", "hash"
)[:hashes_batch_size]
hashes_chunk = list(qs)

to something more like the sketch below, where we could do one big query and then divvy it up over the iterative loop. (Not blocking, just wondering.)
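A rough sketch of that idea, assuming the same project_id, group_ids, and hashes_batch_size names as the snippet above; iter_hash_chunks is an illustrative helper, not the shipped implementation, and the unsliced query holds every matching row in memory at once:

from sentry.models.grouphash import GroupHash  # assumed import path

def iter_hash_chunks(project_id, group_ids, hashes_batch_size):
    # One unsliced query up front...
    all_hashes = list(
        GroupHash.objects.filter(project_id=project_id, group_id__in=group_ids).values_list(
            "id", "hash"
        )
    )
    # ...then slice the in-memory result into chunks for the existing loop.
    for i in range(0, len(all_hashes), hashes_batch_size):
        yield all_hashes[i : i + hashes_batch_size]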
Force-pushed from 18f02af to 06b7efa.
-    __repr__ = sane_repr("group_id", "hash")
+    __repr__ = sane_repr("group_id", "hash", "metadata")
     __str__ = __repr__
Bug: Circular Reference in GroupHash Representation
Adding metadata to GroupHash.__repr__ and seer_matched_grouphash to GroupHashMetadata.__repr__ creates a circular reference. This causes infinite recursion when these objects are string-represented, leading to stack overflow errors and unexpected database queries.
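A stripped-down illustration of the concern, using plain toy classes rather than Sentry's sane_repr helper (names are illustrative only):

class GroupHashToy:
    def __init__(self):
        self.metadata = None

    def __repr__(self):
        return f"<GroupHashToy metadata={self.metadata!r}>"

class GroupHashMetadataToy:
    def __init__(self, seer_matched_grouphash):
        self.seer_matched_grouphash = seer_matched_grouphash

    def __repr__(self):
        return f"<GroupHashMetadataToy seer_matched_grouphash={self.seer_matched_grouphash!r}>"

gh = GroupHashToy()
gh.metadata = GroupHashMetadataToy(seer_matched_grouphash=gh)  # the cycle points back at gh
# repr(gh) now raises RecursionError because each __repr__ formats the other.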
Fixes SENTRY-5ABJ.
The issue comes from this block:
sentry/src/sentry/deletions/defaults/group.py, lines 248 to 259 in a3a7717
The update is triggered by this on_delete:
sentry/src/sentry/models/grouphashmetadata.py, lines 116 to 118 in b1f684a
Currently, when we try to delete all the group hashes, we update the related group hash metadata first. This query ends up failing because it takes longer than 30 seconds:

SQL: UPDATE "sentry_grouphashmetadata" SET "seer_matched_grouphash_id" = NULL WHERE "sentry_grouphashmetadata"."seer_matched_grouphash_id" IN (%s, ..., %s)

This can be resolved by deleting the group hash metadata rows before trying to delete the group hash rows, which avoids the update statements altogether.
This fix was initially started in #101545; however, the solution has completely changed, so a new PR was opened.
The initial fix was generated by Seer, but the final fix takes a completely different approach.
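A hedged sketch of the metadata-first approach described above (illustrative function name, assumed import paths, and it presumes GroupHashMetadata's one-to-one field to GroupHash is named grouphash and that any referencing metadata rows belong to the same set of hashes being deleted):

from sentry.models.grouphash import GroupHash  # assumed import path
from sentry.models.grouphashmetadata import GroupHashMetadata  # assumed import path

def delete_group_hashes_metadata_first(project_id, group_ids, batch_size=100):
    hash_ids = list(
        GroupHash.objects.filter(project_id=project_id, group_id__in=group_ids).values_list(
            "id", flat=True
        )
    )
    for i in range(0, len(hash_ids), batch_size):
        chunk = hash_ids[i : i + batch_size]
        # Deleting the metadata rows first removes the foreign keys that would
        # otherwise be rewritten by the ON DELETE SET NULL cascade...
        GroupHashMetadata.objects.filter(grouphash_id__in=chunk).delete()
        # ...so deleting the hashes no longer triggers the slow UPDATE statement.
        GroupHash.objects.filter(id__in=chunk).delete()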