-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only force ALTREP compact row names (i.e. from duckplyr) if requested #745
Conversation
I didn't look at the code still. I wonder if it would make sense to implement duckdb specifics using the variables pane extension mechanism. For instance by implementing custom |
crates/harp/src/data_frame.rs
Outdated
/// Compute the number of columns in a data frame | ||
/// | ||
/// This is easy, it's the length of the VECSXP | ||
pub fn df_n_col(x: SEXP) -> crate::Result<i32> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have all of the type checking in the DataFrame
class instead? I feel like we're creating lots of different ways of accessing information. If possible, it would be nice if there was only one way of validating a dataframe and accessing its fields by instantiating DataFrame
.
The rows could be an option and there could be a method to materialise the rows.
Thanks for looking into it. Calling Why is it hard to handle ALTREP failures? Does this change mean that the kernel won't crash for a stingy duck frame unless the user clicks a button? Could this be worked around with calling |
@krlmlr why does an ALTREP access generate an error? That seems weird to be since you'd normally expect (e.g.) |
This may happen for any number of reasons, including out-of-memory conditions. The "prudence" feature tries to avoid these OOM conditions and makes the errors more visible. Can you think of an alternative? |
05e2b1d
to
f2ec84f
Compare
So they are at least tied to the `DataFrame` struct
Remove footgun of `r_names()` which didn't protect
5044862
to
ec35847
Compare
Addresses posit-dev/positron#4158 at the request of @hadley (but not the part about errors during materialization crashing the kernel, that is a much deeper and harder to fix problem that requires us to adjust our assumptions about what a simple
INTEGER_ELT()
call can do - i.e. in an ALTREP world that can run arbitrary code that can error 😢)Joint PR with tidyverse/duckplyr#661
The problem is that duckplyr uses ALTREP compact row names as the
R_RowNamesSymbol
attribute. CallingINTEGER_ELT()
orINTEGER()
on this to get the number of rows will trigger the entire duckdb query to run, something we want to avoid. This is hard though, because that's how we determine the number of rows in the data frame. We've determined that rather than trying to carefully avoid ALTREP compact row names every time we look at the number of rows in a data frame, a better solution is to use @dfalbel's hooks for allowing S3 classes to provide custom Variables Pane implementations - avoiding these troublesome code paths entirely.Most of the work that was required for this is actually in tidyverse/duckplyr#661. However, in
variable_length()
here on the Ark side we were callingtable_info()
which queried the number of rows (bad) and number of columns even though we only needed the number of columns. I've reworked some things to expose ways to compute just the number of columns or just the number of rows for a data frame. With that in place, tidyverse/duckplyr#661 does the rest of the work.This won't trigger the duckdb query:
This will trigger the duckdb query, and we are ok with this, and this is similar to RStudio:
Here are some videos of me toying around with this:
Screen.Recording.2025-03-13.at.5.58.43.PM.mov
Screen.Recording.2025-03-13.at.5.59.48.PM.mov
See also, RStudio's
rs_dim()
here:https://github.com/rstudio/rstudio/blob/7a9ab7afe9ae006e60897596a189a904a716ec4f/src/cpp/session/modules/environment/SessionEnvironment.cpp#L536
https://github.com/rstudio/rstudio/blob/7a9ab7afe9ae006e60897596a189a904a716ec4f/src/cpp/session/modules/SessionEnvironment.R#L560