
Extending tibble: A case for tibble_reconstruct() #890

@DavisVaughan


@jennybc and I have both extended tibble when creating custom subclasses (see googledrive, tune, rsample, and workflowsets). Our focus has been on making these subclasses work with dplyr and vctrs, generally following the advice outlined in this rough help document.

Jenny mentioned that tibble could provide a hook to make it easier for package authors to extend it. This would work much like dplyr_reconstruct(): tibble would automatically call this S3 generic at the end of any function that might invalidate the invariants of a tibble subclass. It would look like this:

```r
tibble_reconstruct <- function(x, to) {
  UseMethod("tibble_reconstruct", to)
}
```
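
Note that the generic dispatches on `to` (the original input), not on `x` (the freshly computed result). The minimal sketch below makes that concrete. The `default` method and the trivial `mysubclass` method are hypothetical, and the subclass is faked with a plain class vector so the example runs in base R without loading tibble.

```r
# The generic from the proposal: dispatch on `to`, not `x`.
tibble_reconstruct <- function(x, to) {
  UseMethod("tibble_reconstruct", to)
}

# Hypothetical fallback: classes without a method get `x` back unchanged.
tibble_reconstruct.default <- function(x, to) {
  x
}

# Hypothetical subclass method: copy the subclass from `to` onto `x`.
tibble_reconstruct.mysubclass <- function(x, to) {
  class(x) <- class(to)
  x
}

out <- data.frame(a = 1:2)  # a freshly computed result, class "data.frame"
sub <- structure(
  data.frame(a = 1:3),
  class = c("mysubclass", "tbl_df", "tbl", "data.frame")
)

# The same `out` is treated differently depending only on `to`:
class(tibble_reconstruct(out, to = sub))                # c("mysubclass", "tbl_df", "tbl", "data.frame")
class(tibble_reconstruct(out, to = data.frame(a = 1)))  # "data.frame" (default method)
```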

For example, suppose my subclass requires a special index column. If [.tbl_df drops that column, the result should ideally be a bare tibble that no longer inherits from my subclass. If tibble called tibble_reconstruct(x = out, to = x) at the end of [.tbl_df (where out is the subsetted result and x is the original input), then tibble_reconstruct.mysubclass(x, to) could hold all of the logic needed to decide whether x can be reconstructed to the class of to, or whether it should fall back to a bare tibble.
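
A minimal sketch of that decision, assuming the index column is named `.index` (a hypothetical name) and treating a bare tibble as an object with class `c("tbl_df", "tbl", "data.frame")`. Plain data frames stand in for the output of [.tbl_df so the example runs without tibble.

```r
# The generic proposed above: dispatch on `to`, the original input.
tibble_reconstruct <- function(x, to) {
  UseMethod("tibble_reconstruct", to)
}

# Hypothetical method for a subclass whose invariant is a `.index` column.
tibble_reconstruct.mysubclass <- function(x, to) {
  if (".index" %in% names(x)) {
    # Invariant still holds: restore the full subclass from `to`.
    class(x) <- class(to)
  } else {
    # Invariant broken: fall back to a bare tibble.
    class(x) <- c("tbl_df", "tbl", "data.frame")
  }
  x
}

to <- structure(
  data.frame(.index = 1:3, value = c(10, 20, 30)),
  class = c("mysubclass", "tbl_df", "tbl", "data.frame")
)

# Stand-ins for the result of `[.tbl_df`, with and without the index column.
kept    <- data.frame(.index = 1:3, value = c(10, 20, 30))
dropped <- data.frame(value = c(10, 20, 30))

class(tibble_reconstruct(kept, to))     # c("mysubclass", "tbl_df", "tbl", "data.frame")
class(tibble_reconstruct(dropped, to))  # c("tbl_df", "tbl", "data.frame")
```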

This is related to, and probably supersedes, @hadley's issue #275.


This would simplify dplyr's advice, which currently says that compatibility with dplyr requires [ and names<- methods along with a dplyr_reconstruct() method. Instead, it could advise that if you are extending tibble, you only need a tibble_reconstruct() method (we would keep the current advice for the case where you are extending data.frame directly).
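
For comparison, here is a rough sketch of the per-operation methods that the current advice calls for, which tibble_reconstruct() would make unnecessary. The `.index` invariant and the body of mysubclass_maybe_reconstruct() are hypothetical; without tibble loaded, NextMethod() falls through to base R's [.data.frame, which keeps the input's class attribute before the helper re-decides it.

```r
# Hypothetical helper: keep the subclass only if `.index` survived.
mysubclass_maybe_reconstruct <- function(x) {
  if (".index" %in% names(x)) {
    class(x) <- c("mysubclass", "tbl_df", "tbl", "data.frame")
  } else {
    class(x) <- c("tbl_df", "tbl", "data.frame")
  }
  x
}

# Current approach: one method per operation that can break the invariant.
`[.mysubclass` <- function(x, i, ...) {
  out <- NextMethod()
  mysubclass_maybe_reconstruct(out)
}

`names<-.mysubclass` <- function(x, value) {
  out <- NextMethod()
  mysubclass_maybe_reconstruct(out)
}

x <- structure(
  data.frame(.index = 1:3, value = c(10, 20, 30)),
  class = c("mysubclass", "tbl_df", "tbl", "data.frame")
)

class(x["value"])                # .index dropped: c("tbl_df", "tbl", "data.frame")
class(x[c(".index", "value")])   # invariant kept: still starts with "mysubclass"

renamed <- x
names(renamed) <- c("id", "value")
class(renamed)                   # .index renamed away: c("tbl_df", "tbl", "data.frame")
```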


This also aligns perfectly with the conventions that are already emerging for adding vctrs and dplyr support. We always start by creating mysubclass_maybe_reconstruct() and mysubclass_is_reconstructable() helpers, which contain all of the logic for either reconstructing a mysubclass or falling back to a bare tibble:
https://github.com/tidyverse/googledrive/blob/f2f090156236187b803c5ae26afb159bd4f78580/R/compat-dplyr.R#L1-L10

This gets used:

It would make sense for this logic to become the guts of the tibble_reconstruct() method; then we could remove the [ and names<- methods.
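
A minimal sketch of that factoring, assuming a hypothetical invariant (an integer `.index` column). The helper bodies are illustrative and are not googledrive's actual code. The tibble_reconstruct() method becomes a thin wrapper, and a dplyr_reconstruct() method, whose dplyr generic takes `(data, template)`, can share the same helper. The dplyr method is only defined here, not registered, so the sketch runs without dplyr.

```r
tibble_reconstruct <- function(x, to) {
  UseMethod("tibble_reconstruct", to)
}

# Hypothetical check: does `x` still satisfy the subclass invariants?
mysubclass_is_reconstructable <- function(x) {
  is.data.frame(x) && ".index" %in% names(x) && is.integer(x[[".index"]])
}

# Hypothetical helper: reconstruct if possible, else fall back to a bare tibble.
mysubclass_maybe_reconstruct <- function(x) {
  if (mysubclass_is_reconstructable(x)) {
    class(x) <- c("mysubclass", "tbl_df", "tbl", "data.frame")
  } else {
    class(x) <- c("tbl_df", "tbl", "data.frame")
  }
  x
}

# With the hook, the tibble method is a thin wrapper around the helper...
tibble_reconstruct.mysubclass <- function(x, to) {
  mysubclass_maybe_reconstruct(x)
}

# ...and a dplyr_reconstruct() method can reuse it unchanged.
dplyr_reconstruct.mysubclass <- function(data, template) {
  mysubclass_maybe_reconstruct(data)
}

to <- mysubclass_maybe_reconstruct(data.frame(.index = 1:2, v = c("a", "b")))

# `.index` is present but no longer integer, so the result is a bare tibble.
class(tibble_reconstruct(data.frame(.index = c(1.5, 2)), to))
```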


The following methods would need to call tibble_reconstruct():

  • [.tbl_df
  • [<-.tbl_df
  • $<-.tbl_df
  • [[<-.tbl_df
  • names<-.tbl_df

There may be others, but I think this would get us pretty far.
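
To illustrate the payoff, the sketch below uses two hypothetical stand-ins, tbl_subset_cols() and tbl_set_names(), in place of [.tbl_df and names<-.tbl_df; they are not tibble's real internals. Each computes a bare tibble and calls the hook as its very last step, so the subclass author writes only tibble_reconstruct.mysubclass(). The `.index` invariant is again an assumption.

```r
tibble_reconstruct <- function(x, to) {
  UseMethod("tibble_reconstruct", to)
}
tibble_reconstruct.default <- function(x, to) x

as_bare_tibble <- function(x) {
  class(x) <- c("tbl_df", "tbl", "data.frame")
  x
}

# Hypothetical stand-in for `[.tbl_df` (column subsetting only).
tbl_subset_cols <- function(x, j) {
  df <- x
  class(df) <- "data.frame"
  out <- as_bare_tibble(df[j])
  tibble_reconstruct(out, to = x)  # the hook runs last
}

# Hypothetical stand-in for `names<-.tbl_df`.
tbl_set_names <- function(x, value) {
  out <- x
  class(out) <- "data.frame"
  names(out) <- value
  tibble_reconstruct(as_bare_tibble(out), to = x)
}

# The only method the subclass author writes: restore the subclass when the
# invariant holds; otherwise keep the bare tibble the operation produced.
tibble_reconstruct.mysubclass <- function(x, to) {
  if (".index" %in% names(x)) class(x) <- class(to)
  x
}

x <- structure(
  data.frame(.index = 1:3, value = c(10, 20, 30)),
  class = c("mysubclass", "tbl_df", "tbl", "data.frame")
)

class(tbl_subset_cols(x, c(".index", "value")))  # still a mysubclass
class(tbl_subset_cols(x, "value"))               # bare tibble
class(tbl_set_names(x, c("id", "value")))        # bare tibble
```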

If all of this is added, I could see two tibble vignettes coming out of it:

  • Extending tibble
    • Describe tibble_reconstruct(), what it is used for, and how to add a method
    • Encourage adding a standalone mysubclass_reconstruct() that gets used in the tibble_reconstruct() method
  • Adding tibble subclass compatibility for dplyr and vctrs
    • Assume that users have read the above vignette and have mysubclass_reconstruct() ready to go
    • Talk about implementing vec_restore(), vec_ptype2(), and vec_cast() methods
    • Talk about implementing a dplyr_reconstruct() method
