Skip to content

Follow-on work for median aggregate function #3040

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
andygrove opened this issue Aug 5, 2022 · 2 comments
Closed

Follow-on work for median aggregate function #3040

andygrove opened this issue Aug 5, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@andygrove
Copy link
Member

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Follow-on issue to address feedback in #3009

In particular:

  • Run some performance test and see if it would be better to use builders in the aggregate state rather than arrays. Should we just use ScalarValue::List instead?
  • Should we have a configurable limit on the number of values that can be collected to prevent OOM?
  • Can we leverage the concat and take kernels?
  • Should we add an average kernel to arrow-rs?
  • Can we reduce memory overhead with some form of compression (such as run-length encoding or maintaining a map of unique values with counts)

Describe the solution you'd like
See above

Describe alternatives you've considered
None

Additional context
None

@jhorstmann
Copy link
Contributor

Another optimization is that you do not need to to sort all values, only the value at the middle index needs to be at the right place. Vec::select_nth_unstable does exactly that in O(n) using a quickselect algorithm.

@Dandandan
Copy link
Contributor

#7376 solves most of these

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants