fix(taskworker) Add metric to see how long we wait #94864

Open
wants to merge 2 commits into
base: master
markstory
Member

I'm interested in understanding how much time we're losing to waiting on empty multiprocessing queues. This metric will help us measure that, so we can tune workers further.

@markstory markstory requested a review from a team as a code owner July 3, 2025 14:44
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jul 3, 2025
codecov bot commented Jul 3, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/sentry/taskworker/workerchild.py | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #94864    +/-   ##
========================================
  Coverage   87.89%   87.89%            
========================================
  Files       10440    10442     +2     
  Lines      603678   604023   +345     
  Branches    23505    23505            
========================================
+ Hits       530577   530902   +325     
- Misses      72734    72754    +20     
  Partials      367      367            

try:
    # If the queue is empty, this could block for a second.
    # We could be losing a bunch of throughput here.
Member

I don't think this is accurate. If this blocks, it means there's no work to be done anyway, i.e. the throughput can't be "lost" here. Pausing here isn't causing the worker to not do work.

Member Author

Fair point. I'll trim that out. I'm interested to see whether making this timeout shorter and adjusting worker buffer sizes could help us get more throughput from workers.
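For readers following the thread, here is a rough sketch of the kind of measurement being discussed: timing a blocking `get()` on a queue and reporting how long it waited. This is not the code from the PR. The helper name `get_with_wait_metric`, the `record` callback, and the metric name are all hypothetical, and the sketch assumes only that the queue raises `queue.Empty` on timeout, which both `queue.Queue` and `multiprocessing.Queue` do.

```python
import queue
import time
from typing import Any, Callable, Optional


def get_with_wait_metric(
    q: Any,
    record: Callable[[str, float], None],
    timeout: float = 1.0,
) -> Optional[Any]:
    """Fetch one item from q and record how long the call blocked.

    Returns None if the queue stayed empty for the whole timeout.
    `record` is a hypothetical stand-in for a metrics client.
    """
    start = time.monotonic()
    try:
        item = q.get(timeout=timeout)
    except queue.Empty:
        # Nothing arrived within the timeout; the full wait is still recorded.
        item = None
    # Hypothetical metric name, chosen for illustration only.
    record("taskworker.child.queue_wait", time.monotonic() - start)
    return item


if __name__ == "__main__":
    samples = []
    q = queue.Queue()
    q.put("task-1")
    # First call returns immediately, second waits out the short timeout.
    for _ in range(2):
        get_with_wait_metric(q, lambda name, value: samples.append(value), timeout=0.1)
    print(samples)
```

As noted in the review, a long wait here means the queue was empty, not that work was dropped. The recorded durations are mainly useful for judging whether a shorter timeout or different buffer sizes would change how quickly a child picks up new work.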
