Sample reporting as observer is sampling distribution #33780

Open
wants to merge 2 commits into
base: master

stankiewicz
Contributor

On Dataflow V2, readers iterate over a StateBackedIterable after a GBK shuffle, for example in a ParDo that follows a GBK or in merging combiners after a GBK.

metrics.proto specifies that sampling is used because calculating the byte count involves serializing the elements, which is CPU intensive.

For StateBackedIterable, sampling does not occur, which hurts the performance of pipelines that use expensive coders.

This change introduces sampling for StateBackedIterable.
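
For illustration only, here is a minimal sketch of the general idea behind sampled byte counting. It is not the code in this PR and not Beam's actual sampling scheme. It assumes a simple fixed sampling period; `SampledByteCounter`, `observe`, `sampledCount`, and `estimatedTotalBytes` are hypothetical names, and the `ToLongFunction` stands in for an expensive coder.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.ToLongFunction;

/** Hypothetical sketch: estimates total encoded bytes by measuring one element per period. */
class SampledByteCounter<T> {
  // The expensive step; a stand-in for serializing an element with its coder.
  private final ToLongFunction<T> encodedSize;
  // Measure one element out of every `period` elements (an assumed, fixed rate).
  private final int period;
  private long seen = 0;
  private long sampled = 0;
  private long sampledBytes = 0;

  SampledByteCounter(ToLongFunction<T> encodedSize, int period) {
    if (period < 1) {
      throw new IllegalArgumentException("period must be >= 1");
    }
    this.encodedSize = encodedSize;
    this.period = period;
  }

  /** Called once per iterated element; only sampled elements pay the serialization cost. */
  void observe(T element) {
    if (seen % period == 0) {
      sampledBytes += encodedSize.applyAsLong(element);
      sampled++;
    }
    seen++;
  }

  /** Number of elements whose size was actually measured. */
  long sampledCount() {
    return sampled;
  }

  /** Extrapolates the mean sampled size to all elements seen so far. */
  long estimatedTotalBytes() {
    if (sampled == 0) {
      return 0;
    }
    return Math.round((double) sampledBytes / sampled * seen);
  }

  public static void main(String[] args) {
    SampledByteCounter<String> counter =
        new SampledByteCounter<>(s -> s.getBytes(StandardCharsets.UTF_8).length, 10);
    for (int i = 0; i < 1000; i++) {
      counter.observe("element");
    }
    System.out.println("measured: " + counter.sampledCount());
    System.out.println("estimated bytes: " + counter.estimatedTotalBytes());
  }
}
```

With a period of 10, only a tenth of the elements are serialized for size reporting. The trade-off is that the reported byte count becomes an estimate rather than an exact total.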

Fully fixes #33620; the previous fix was only a partial improvement.

github-actions bot added the java label Jan 28, 2025
stankiewicz marked this pull request as ready for review January 29, 2025 08:23
Contributor

Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:

R: @damondouglas for label java.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Development

Successfully merging this pull request may close these issues.

[Bug]: StateBackedIterable serializes elements size for every element when ComposedCombine is used
1 participant