✨ re-implement dplyr::{filter, mutate, arrange} using data parallelism #22

Open
4 tasks
jdhoffa opened this issue Dec 13, 2024 · 5 comments
Labels
feature a feature request or enhancement

Comments

@jdhoffa
Member

jdhoffa commented Dec 13, 2024

See this discussion:
#8

Acceptance criteria:

@jonocarroll

One complication I'm interested in is how to support data.frame in general. There are some really funky edge cases that should probably be avoided: you can store a matrix in a data.frame column, and you can store a list in a tibble column.

The {savvy} docs do mention working with Rust lists then converting back to data.frame once safely back in the world of R. That might have consequences for filtering a data.frame based on a calculation involving just one column. Perhaps a suitable approach would be calculating a boolean vector in Rust and still doing the actual filtering on the R side.
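
A rough sketch of what the Rust half of that split could look like, using only the standard library. Everything here is an assumption made for illustration: the function name `parallel_mask`, the restriction to a single `f64` column, and the use of `std::thread::scope` in place of whatever parallelism crate the project ends up choosing. It does not touch the {savvy} API at all, and it ignores NA handling.

```rust
use std::thread;

/// Hypothetical helper: apply `pred` to every element of `column` and
/// return a boolean mask of the same length. The column is split into
/// contiguous chunks and each chunk is processed on its own scoped thread.
fn parallel_mask<F>(column: &[f64], n_threads: usize, pred: F) -> Vec<bool>
where
    F: Fn(f64) -> bool + Sync,
{
    let mut mask = vec![false; column.len()];
    if column.is_empty() {
        return mask;
    }

    // Ceiling division, so every element lands in exactly one chunk.
    let n = n_threads.max(1);
    let chunk_len = (column.len() + n - 1) / n;

    // Share the predicate by reference so each thread can call it.
    let pred = &pred;

    thread::scope(|s| {
        // Input and output are chunked identically, so each thread writes
        // only to the slice of `mask` that matches its slice of `column`.
        for (input, output) in column.chunks(chunk_len).zip(mask.chunks_mut(chunk_len)) {
            s.spawn(move || {
                for (x, keep) in input.iter().zip(output.iter_mut()) {
                    *keep = pred(*x);
                }
            });
        }
    });

    mask
}
```

With something like this, only the one column needed for the condition crosses into Rust, and the mask that comes back is used to subset the original data.frame in R, so matrix and list columns never have to be understood on the Rust side.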

@jdhoffa
Member Author

jdhoffa commented Dec 16, 2024

That is a super interesting point. One easy possibility (to begin with anyway) would be to only support more "standard" data-frame col-types, and error gracefully if we are faced with the funky edge cases?
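
To make "error gracefully" a bit more concrete, here is a minimal sketch of an allow-list check. All of the names (`ColumnKind`, `UnsupportedColumn`, `check_columns`) are hypothetical and the set of "standard" kinds is only a guess at what might be supported first. The same check could just as well live on the R side; this only illustrates the shape of it.

```rust
/// Hypothetical description of what a data-frame column holds.
#[derive(Debug, Clone, PartialEq)]
enum ColumnKind {
    Logical,
    Integer,
    Double,
    Character,
    // The "funky" cases: a matrix or a list stored in a single column.
    Matrix,
    List,
}

/// Hypothetical error naming the first column that cannot be handled.
#[derive(Debug, PartialEq)]
struct UnsupportedColumn {
    name: String,
    kind: ColumnKind,
}

/// Accept a frame only if every column is one of the plain atomic kinds.
/// Stops at the first unsupported column and reports its name and kind.
fn check_columns(columns: &[(&str, ColumnKind)]) -> Result<(), UnsupportedColumn> {
    for (name, kind) in columns {
        match kind {
            ColumnKind::Logical
            | ColumnKind::Integer
            | ColumnKind::Double
            | ColumnKind::Character => {}
            ColumnKind::Matrix | ColumnKind::List => {
                return Err(UnsupportedColumn {
                    name: name.to_string(),
                    kind: kind.clone(),
                });
            }
        }
    }
    Ok(())
}
```

Returning the offending column name in the error means the user-facing message can say exactly which column to drop or convert.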

@jdhoffa
Member Author

jdhoffa commented Dec 16, 2024

Re: filter, the idea of only calculating the boolean vector in Rust makes a lot of sense.

@jdhoffa
Member Author

jdhoffa commented Dec 16, 2024

Somewhat tangentially related, but I wonder if we should try to support larger-than-memory data structures too:
#26

@asbates

asbates commented Dec 18, 2024

> That is a super interesting point. One easy possibility (to begin with anyway) would be to only support more "standard" data-frame col-types, and error gracefully if we are faced with the funky edge cases?

I think this is probably the best approach. We can do the column type check in R and just never let weird columns enter Rust.
