@jennybc and I have both extended tibble when creating custom subclasses (see googledrive, tune, rsample, and workflowsets). Our focus was on extending tibble to work with dplyr and vctrs, which generally follows the advice outlined in this rough help document.
Jenny mentioned that tibble could provide a hook to make it easier for package authors to extend tibble. This would work in a way that is very similar to dplyr_reconstruct(), and tibble would automatically call this S3 generic hook at the end of functions that might invalidate the invariants of a tibble subclass. It would look like:
tibble_reconstruct <- function(x, to) {
  UseMethod("tibble_reconstruct", to)
}
For example, if my subclass requires a special index column, and [.tbl_df drops that index column, then ideally the result would be a bare tibble, and would no longer inherit from my subclass. If tibble called tibble_reconstruct(x = out, to = x) at the end of [.tbl_df, then tibble_reconstruct.mysubclass(x, to) could contain all of the logic required to decide if x could be reconstructed to the class of to, or if it should fall back to returning a bare tibble.
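A minimal sketch of that flow, under stated assumptions: tibble_reconstruct() is the generic proposed here and is not exported by tibble today, while new_mysubclass() and subset_cols() are hypothetical names invented for illustration. subset_cols() only stands in for the tail of [.tbl_df; it is not tibble's actual implementation.

```r
library(tibble)

# Proposed generic from this issue; tibble does not export it today.
tibble_reconstruct <- function(x, to) {
  UseMethod("tibble_reconstruct", to)
}

# Hypothetical constructor for a subclass whose invariant is an `index` column.
new_mysubclass <- function(x) {
  new_tibble(x, nrow = nrow(x), class = "mysubclass")
}

# The method holds all of the logic for deciding whether `x` can keep the class.
tibble_reconstruct.mysubclass <- function(x, to) {
  if ("index" %in% names(x)) {
    new_mysubclass(x)
  } else {
    # Invariant broken: fall back to a bare tibble.
    new_tibble(x, nrow = nrow(x))
  }
}

# Hypothetical stand-in for the end of `[.tbl_df` under the proposal:
# compute the result as a bare tibble, then pass it through the hook.
subset_cols <- function(x, j) {
  out <- new_tibble(unclass(x)[j], nrow = nrow(x))
  tibble_reconstruct(x = out, to = x)
}

x <- new_mysubclass(tibble(index = 1:3, value = c("a", "b", "c")))

kept <- subset_cols(x, c("index", "value"))  # `index` survives
dropped <- subset_cols(x, "value")           # `index` is dropped

class(kept)
class(dropped)
```

With this arrangement the subclass author writes one method, and the decision to keep or drop the class lives in a single place rather than in a separate [ method.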
This is related to and probably supersedes @hadley's issue in #275.
This would simplify the advice given by dplyr, which currently suggests that [ and names<- methods are required to be compatible with dplyr, along with a dplyr_reconstruct() method. Instead, it could advise that if you are extending tibble, you only need a tibble_reconstruct() method (we would still keep the current advice for the case where you are only extending data.frame).
This also aligns perfectly with the conventions that are already arising for adding support for vctrs and dplyr. We always start by creating mysubclass_maybe_reconstruct() and mysubclass_is_reconstructable() helpers which have all the logic for either reconstructing to a mysubclass or returning a bare tibble:
https://github.com/tidyverse/googledrive/blob/f2f090156236187b803c5ae26afb159bd4f78580/R/compat-dplyr.R#L1-L10
This gets used:
- As the implementation of the vec_restore() method
- In vec_ptype2() and vec_cast() methods
- As the dplyr_reconstruct() method
- In the methods required by dplyr, i.e. for [ and names<-
It would make sense for this to be the guts of the tibble_reconstruct() method, then we could remove the [ and names<- methods.
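As a rough illustration of that convention, the helpers and the methods that delegate to them might look like the sketch below. The index column invariant and new_mysubclass() are assumptions made for this example, and tibble_reconstruct() is the hook proposed in this issue rather than an existing tibble function. The methods are defined as plain functions here; in a package they would also need to be registered as S3 methods for the vctrs and dplyr generics.

```r
library(tibble)

# Hypothetical constructor; the `index` column invariant is an assumption.
new_mysubclass <- function(x) {
  new_tibble(x, nrow = nrow(x), class = "mysubclass")
}

# Does `x` still satisfy the invariants of the subclass?
mysubclass_is_reconstructable <- function(x) {
  "index" %in% names(x)
}

# Reconstruct to the subclass if possible, otherwise return a bare tibble.
mysubclass_maybe_reconstruct <- function(x) {
  if (mysubclass_is_reconstructable(x)) {
    new_mysubclass(x)
  } else {
    new_tibble(x, nrow = nrow(x))
  }
}

# Method for the hook proposed in this issue.
tibble_reconstruct.mysubclass <- function(x, to) {
  mysubclass_maybe_reconstruct(x)
}

# Methods for the existing vctrs and dplyr generics reuse the same helper.
vec_restore.mysubclass <- function(x, to, ...) {
  mysubclass_maybe_reconstruct(x)
}

dplyr_reconstruct.mysubclass <- function(data, template) {
  mysubclass_maybe_reconstruct(data)
}

with_index <- tibble(index = 1:2, value = c("a", "b"))
without_index <- tibble(value = c("a", "b"))

class(mysubclass_maybe_reconstruct(with_index))
class(mysubclass_maybe_reconstruct(without_index))
```

Keeping the decision in one helper means each method stays a one-liner, so adding the proposed hook would not require touching the reconstruction logic itself.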
The following methods would need to call tibble_reconstruct():
- [.tbl_df
- [<-.tbl_df
- $<-.tbl_df
- [[<-.tbl_df
- names<-.tbl_df
There may be others, but I think this would get us pretty far.
If all of this gets added, I could see two tibble vignettes coming out of this:
- Extending tibble
  - Describing tibble_reconstruct(), what it is used for, and how to add a method
  - Encourage adding a standalone mysubclass_reconstruct() that gets used in the tibble_reconstruct() method
- Adding tibble subclass compat for dplyr and vctrs
  - Assume that users read the above vignette and have mysubclass_reconstruct() ready to go
  - Talk about implementing vec_restore(), vec_ptype2(), and vec_cast() methods
  - Talk about implementing a dplyr_reconstruct() method