Releases: fastverse/collapse

collapse version 2.1.6

22 Dec 17:16
150a82a

  • The repo has moved to fastverse/collapse and the website to fastverse.org/collapse, for better visibility and maintenance. Appropriate redirects from the old repo/site have been implemented.
    Selected people now have access to the repo through the organization account and may respond to issues or submit fixes.

  • Added new AI-generated interactive/chattable DeepWiki documentation.

  • collapse now treats -0 and 0 as the same value in hash functions (funique(), group(), fmatch(), fndistinct(), fmode(), and all higher-level derivatives). This is implemented by adding a value of 0.0 to double values before hashing them, and has a small (~3%) performance penalty when hashing doubles. It is implemented in sync with an equivalent change in Rcpp. Thanks @mayer79 for reporting and helping with benchmarking the performance implications (#648).

  • Fixed a bug in pivot(..., how = "wider", FUN = "sum") (using internal sum function) when columns to aggregate were integer typed. Thanks @ummel (#803).

  • Fixed a bug in roworderv(..., neworder = indices), which segfaulted if indices were out of range. Thanks @JanMarvin (#807).

  • Faster installation from source thanks to the #include <Rcpp/Lighter> option in Rcpp which loads only part of the header files. Thanks @eddelbuettel for the hint.

  • Consistency with internal updates to data.table. Thanks @aitap (#809, Rdatatable/data.table#7497).
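
The -0/0 normalization described in the hashing item above relies on IEEE 754 arithmetic: adding positive zero to negative zero yields positive zero, while other values are unchanged. A minimal base R sketch of that idea follows. It is not collapse's C code, and the helper name normalize_zero is hypothetical.

```r
# Hypothetical helper mimicking the "+ 0.0" normalization in plain R.
normalize_zero <- function(x) x + 0.0

x <- c(0, -0, 1.5, -2)

# 0 and -0 compare equal, but their sign bits differ; 1/x makes that visible.
signs_before <- 1 / x[1:2]
signs_after  <- 1 / normalize_zero(x)[1:2]

print(signs_before)  # Inf -Inf
print(signs_after)   # Inf  Inf
```

Once both zeros share the same bit pattern, a hash of the raw double bits can no longer place them in different buckets, which is presumably why the change fixes all listed functions at once.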

collapse version 2.1.5

19 Nov 02:55

  • Fixed small bugs/strange behavior in collap() when g was passed externally (as columns or GRP object). E.g., in collap(x, g, w = ~ col), where g is a GRP object, the weights were aggregated twice: once using FUN (incorrect) and once using wFUN.

collapse version 2.1.4

23 Oct 04:40
4bdf9d0

  • collapse now has a custom internal version of unlist() with better attribute preservation capabilities and a slight speed improvement. Thanks @aidanhorn (#785).

  • Fixes (#794) -- thanks @kendonB for reporting and making an effort to create a reprex.

collapse version 2.1.3

18 Aug 17:54

collapse version 2.1.2

24 May 16:40

  • na_insert() has a new argument set to insert the missing values by reference.

  • Some moderate performance improvements to gsplit()/BY() and pivot().

collapse version 2.1.1

14 Apr 20:51
79803d7

  • alloc(list(1), 2) now gives list(1, 1) instead of list(list(1), list(1)), which can still be generated with alloc(list(1), 2, simplify = FALSE). This change also affects ftransform()/fmutate(), making, e.g., fmutate(data, y = list(1)) consistent with dplyr::mutate(data, y = list(1)). Thanks @MattAFiedler (#753).

  • fslice() now works with sf data frames.
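
The alloc() change listed above can be checked with a short sketch. It assumes collapse >= 2.1.1 is installed and is skipped otherwise; the variable names are illustrative only.

```r
# Run the comparison only if a sufficiently recent collapse is available.
has_collapse <- requireNamespace("collapse", quietly = TRUE) &&
  utils::packageVersion("collapse") >= "2.1.1"

if (has_collapse) {
  # New default: a list value is replicated as list elements.
  simplified <- collapse::alloc(list(1), 2)
  # Former behaviour, still available on request.
  nested <- collapse::alloc(list(1), 2, simplify = FALSE)
  str(simplified)
  str(nested)
}
```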

collapse version 2.1.0

10 Mar 04:05
3e84d8e

collapse 2.1.0, released in March 2025, introduces a fast slicing function, an improved weighted quantile algorithm, a few convenience features, and removes some legacy functions from the package.

Potentially breaking changes

  • Functions pwNobs, as.factor_GRP, as.factor_qG, is.GRP, is.qG, is.unlistable, is.categorical, is.Date, as.numeric_factor, as.character_factor, and Date_vars, which were renamed in v1.6.0 by either replacing '.' with '_' or using all lower-case letters, and deprecated since then, are now finally removed from the package.

  • num_vars() (and thus also cat_vars() and collap()) were changed to a simpler C-definition of numeric data types which is more in-line with is.numeric(): is_numeric_C <- function(x) typeof(x) %in% c("integer", "double") && !inherits(x, c("factor", "Date", "POSIXct", "yearmon", "yearqtr")). The previous definition was: is_numeric_C_old <- function(x) typeof(x) %in% c("integer", "double") && (!is.object(x) || inherits(x, c("ts", "units", "integer64"))). Thus, the definition changed from including only certain classes to excluding the most important classes. Thanks @maouw for flagging this (#727).
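
To see where the two definitions of numeric data disagree, the sketch below copies both functions from the note above and applies them to a few base R objects. The inputs are chosen for illustration and do not come from the release notes.

```r
# New definition (v2.1.0): exclude the most important non-numeric classes.
is_numeric_C <- function(x) {
  typeof(x) %in% c("integer", "double") &&
    !inherits(x, c("factor", "Date", "POSIXct", "yearmon", "yearqtr"))
}

# Previous definition: accept only unclassed data or a few known classes.
is_numeric_C_old <- function(x) {
  typeof(x) %in% c("integer", "double") &&
    (!is.object(x) || inherits(x, c("ts", "units", "integer64")))
}

# Illustrative inputs (assumptions of this sketch).
inputs <- list(
  plain    = c(1.5, 2),
  factor   = factor(c("a", "b")),
  date     = as.Date("2025-03-10"),
  difftime = as.difftime(c(1, 2), units = "hours"),
  ts       = ts(1:3)
)

comparison <- data.frame(
  old = vapply(inputs, is_numeric_C_old, logical(1)),
  new = vapply(inputs, is_numeric_C, logical(1))
)
print(comparison)
```

Among these inputs only the difftime vector changes status: it has a class outside both lists, so the old definition rejected it and the new one accepts it.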

Bug Fixes

  • Fixed some issues using collapse and the tidyverse together, particularly regarding tidyverse methods for 'grouped_df' - thanks @NicChr (#645).

  • More consistent handling of zero-length inputs: fmean() and fmedian()/fnth() now also return zero-length outputs instead of NA (#628).

Additions

  • Added function fslice(): a fast alternative to dplyr::slice_[head|tail|min|max] that also works with matrices. Thanks @alinacherkas for the proposal and initial implementation (#725).

  • Added function groupv() as a programmer's version of group(), or rather, groupv() is now identical to the former group(), and group() now supports multiple vectors as input, e.g., group(v1, v2). This is done for convenience and consistency with radixorder[v](). For backwards compatibility, group() also supports a single list as input.

  • join() has a new argument require allowing the user to generate messages or errors if the join operation is not successful enough:

join(df1, df2, require = list(x = 0.8, fail = "warning"))
#> Warning: Matched 75.0% of records in table df1 (x), but 80.0% is required
#> left join: df1[id1, id2] 3/4 (75%) <1:1st> df2[id1, id2] 3/4 (75%)
#>   id1 id2 name age salary      dept
#> 1   1   a John  35  60000        IT
#> 2   1   b Jane  28     NA      <NA>
#> 3   2   b  Bob  42  55000 Marketing
#> 4   3   c Carl  50  70000     Sales

  • psmat() now has a fill argument to fill empty slots in matrix/array with other elements (default NULL/NA).
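
The group()/groupv() item above describes three call forms that should yield the same grouping. A minimal sketch, assuming collapse >= 2.1.0 is installed (it is skipped otherwise) and using made-up vectors:

```r
# Run the comparison only if a sufficiently recent collapse is available.
has_collapse <- requireNamespace("collapse", quietly = TRUE) &&
  utils::packageVersion("collapse") >= "2.1.0"

# Made-up grouping vectors for illustration.
v1 <- c("a", "b", "a", "b")
v2 <- c(1L, 1L, 2L, 1L)

if (has_collapse) {
  g_vectors <- collapse::group(v1, v2)         # new: multiple vectors
  g_list    <- collapse::groupv(list(v1, v2))  # programmer's version
  g_compat  <- collapse::group(list(v1, v2))   # backwards-compatible list input
  print(identical(as.integer(g_vectors), as.integer(g_list)))
  print(identical(as.integer(g_list), as.integer(g_compat)))
}
```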

Improvements

  • The weighted quantile algorithm in fquantile()/fnth() was improved to a more theoretically sound method following excellent notes by Matthew Kay. It now also supports quantile type 4, but it does not skip zero weights anymore, as the new algorithm makes it difficult to skip them 'on the fly'. Note that the existing collapse algorithm already had very good properties after a bug fix in v2.0.17, but the new algorithm is more exact and also faster.

  • The collapse arXiv article has been updated and significantly enhanced. It is an excellent resource to get an overview of the package.

Notes

  • On CRAN, collapse R dependency was changed to >= 4.1.0 to be able to use the base pipe in examples without generating a NOTE on R CMD check (another absolutely unnecessary restriction). The package depends on R >= 3.5.0 and the DESCRIPTION file on GitHub/R-universe will continue to reflect this.

collapse version 2.0.19

09 Jan 16:53
ee6f69f

collapse version 2.0.18

23 Nov 12:03
4c0501f

  • Cases in pivot(..., how = "longer") with no values columns no longer give an error. Thanks @alvarocombo for flagging this (#663).

  • Fixed bug in qF(c(4L, 1L, NA), sort = FALSE): hash function failure due to a coding bug. Thanks @mayer79 for flagging this (#666).

  • If x is already a qG object with the right properties, calling qG(x) no longer copies x. Thanks @mayer79 (mayer79/effectplots#11).

collapse version 2.0.17

02 Nov 21:24

  • In GRP.default(), the "group.starts" attribute is always returned, even if there is only one group or every observation is its own group. Thanks @JamesThompsonC (#631).

  • Fixed a bug in pivot() if na.rm = TRUE and how = "wider"|"recast" and there are multiple value columns with different missingness patterns. In this case na_omit(values) was applied with default settings to the original (long) value columns, implying potential loss of information. The fix applies na_omit(values, prop = 1), i.e., only removes completely missing rows.

  • qDF()/qDT()/qTBL() now allow a length-2 vector of names to row.names.col if X is a named atomic vector, e.g., qDF(fmean(mtcars), c("cars", "mean")) gives the same as pivot(fmean(mtcars, drop = FALSE), names = list("car", "mean")).

  • Added a subsection on using internal (ad-hoc) grouping to the collapse for tidyverse users vignette.

  • qsu() now adds a WeightSum column giving the sum of (non-zero or missing) weights if the w argument is used. Thanks @mayer79 for suggesting (#650). For panel data (pid) the 'Between' sum of weights is also simply the number of groups, and the 'Within' sum of weights is the 'Overall' sum of weights divided by the number of groups.

  • Fixed an inaccuracy in fquantile()/fnth() with weights: as per the documentation, the target sum is sumwp = (sum(w) - min(w)) * p; in practice, however, the weight of the minimum element of x was used instead of the minimum weight. Since the smallest element in the sample usually has a small weight, this went unnoticed for a long while, but it has now been reported by @Jahnic-kb and fixed (#659).

  • Fixed a bug in recode_char() when regex = TRUE and the default argument was used. Thanks @alinacherkas for both reporting and fixing (#654).
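
The weighted-quantile inaccuracy described above only shows up when the smallest element of x does not carry the smallest weight. The base R arithmetic below contrasts the documented target sum with the pre-fix quantity as the note describes it. The data are made up, and this is not the fquantile() implementation.

```r
# Made-up sample: the smallest x (1) carries the largest weight (2).
x <- c(5, 1, 3)
w <- c(0.5, 2, 1)
p <- 0.5

# Documented target sum: subtract the minimum weight.
sumwp_documented <- (sum(w) - min(w)) * p

# Pre-fix behaviour as described: subtract the weight of the minimum element of x.
sumwp_prefix <- (sum(w) - w[which.min(x)]) * p

print(c(documented = sumwp_documented, pre_fix = sumwp_prefix))
```

Here the two targets are 1.5 and 0.75. They coincide whenever which.min(x) and which.min(w) point at the same observation, which is consistent with the bug going unnoticed for so long.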