From 138f2f62591a1975226e63387bad276d714ccdf8 Mon Sep 17 00:00:00 2001 From: SebKrantz Date: Tue, 2 Dec 2025 20:31:51 -0500 Subject: [PATCH 01/10] Add NEWS.md for version history and remove NEWS.Rd --- NEWS.md | 254 +++++++++++++++++++++++++++++++++ inst/NEWS.Rd | 395 --------------------------------------------------- 2 files changed, 254 insertions(+), 395 deletions(-) create mode 100644 NEWS.md delete mode 100644 inst/NEWS.Rd diff --git a/NEWS.md b/NEWS.md new file mode 100644 index 0000000..0e3142a --- /dev/null +++ b/NEWS.md @@ -0,0 +1,254 @@ +# kit 0.0.20 (2025-04-17) + +### Notes + +- Update copyright date in c files + +- Fix note on CRAN regarding Rf_isFrame + +# kit 0.0.19 (2024-09-07) + +### Bug Fixes + +- Fix multiple warnings in C code. + +# kit 0.0.18 (2024-06-06) + +### Bug Fixes + +- Fix `iif` tests for new version of R. + +# kit 0.0.17 (2024-05-03) + +### Bug Fixes + +- Fix `nswitch`. Thanks to Sebastian Krantz for raising an issue. + +### Notes + +- Update copyright date in c files + +- Fix note on CRAN regarding SETLENGTH + +# kit 0.0.16 (2024-03-01) + +### Notes + +- Check if `"kit.nThread"` is defined before setting it to `1L` + +# kit 0.0.15 (2023-10-01) + +### Notes + +- Correct typo in configure file + +# kit 0.0.14 (2023-08-12) + +### Notes + +- Update configure file to extend support for GCC + +- Correct warnings in NEWS.Rd (strong) + +- Correct typo in funique.Rd thanks to @davidbudzynski + +# kit 0.0.13 (2023-02-24) + +### Notes + +- Function `pprod` now returns double output even if inputs are integer - in line with `base::prod` - to avoid integer overflows. + +- Update configure file + +# kit 0.0.12 (2022-10-26) + +### New Features + +- Function `pcountNA` is equivalent to `pcount(..., value = NA)`. + +- Function `pcountNA` and `pcount(..., value = NA)` allow `NA` counting with mixed data type (including `data.frame`). `pcountNA` also supports list-vectors as inputs and counts empty or `NULL` elements as `NA`. 
+ +- Functions `panyv`, `panyNA`, `pallv` and `pallNA` are added as efficient wrappers around `pcount` and `pcountNA`. They are parallel equivalents of scalar functions `base::anyNA` and `anyv`, `allv` and `allNA` in the 'collapse' R package. + +- Functions `pfirst` and `plast` are added to efficiently obtain the row-wise first and last non-missing value or non-empty element of lists. They are parallel equivalents to the (column-wise) `ffirst` and `flast` functions in the 'collapse' R package. Implemented by @SebKrantz. + +- Functions `psum/pprod/pmean` also support logical vectors as input. Implemented by @SebKrantz. + +### Bug Fixes + +- Function `charToFact` was not returning proper results. Thanks to @alex-raw for raising an issue. + +### Notes + +- Function `pprod` now returns double output even if inputs are integer - in line with `base::prod` - to avoid integer overflows. + +- C compiler warnings on CRAN R-devel caused by compilation with -Wstrict-prototypes are now fixed. Declaration of functions without prototypes is deprecated in all versions of C. Thanks to Sebastian Krantz for the PR. + +# kit 0.0.11 (2022-03-19) + +### New Features + +- Function `pcount` now supports data.frame. + +### Bug Fixes + +- Function `pcount` now works with specific NA values, i.e. NA_real_, NA_character_ etc... + +# kit 0.0.10 (2021-11-28) + +### New Features + +- Function `psum`, `pmean`, `pprod`, `pany` and `pall` now support lists. Thanks to Sebastian Krantz for the request and code suggestion. + +### Bug Fixes + +- Function `topn` should now work for ALTREP object. Thanks to @ben-schwen for raising an issue. + +# kit 0.0.9 (2021-09-12) + +### Notes + +- Re-organise header to prevent compilation errors with new version of Clang due to conflicts between R C headers and OpenMP. + +# kit 0.0.8 (2021-08-21) + +### New Features + +- Function `funique` now preserves the attributes if the input is a `data.table`, `tibble` or similar objects. 
Thanks to Sebastian Krantz for the request. + +- Function `topn` now defaults to base R `order` for large values of `n`. Please see updated documentation for more information `?kit::topn`. + +- Function `charToFact` gains a new argument `addNA=TRUE` to be used to include (or not) `NA` in levels of the output. + +- Function `shareData`, `getData` and `clearData` implemented to share data objects between R sessions. These functions are experimental and might change in the future. Feedback is welcome. Please see `?kit::shareData` for more information. + +### Notes + +- A few `calloc` functions at C level have been replaced by R C API function `Calloc` to avoid valgrind errors/warnings in Travis CI. + +- Errors reported by `rchk` on CRAN have been fixed. + +# kit 0.0.7 (2021-03-07) + +### New Features + +- Function `charToFact` gains a new argument `decreasing=FALSE` to be used to order levels of the output in decreasing or increasing order. + +- Function `topn` gains a new argument `index=TRUE` to be used to return index (`TRUE`) or values (`FALSE`) of input vector. + +### Bug Fixes + +- Some tests of memory access errors using valgrind and AddressSanitizer were reported by CRAN. An attempt to fix these errors has been submitted as part of this package version. It also seems that these same errors were causing some tests to fail for `funique` and `psort` on some platforms. + +### Notes + +- Functions `pmean`, `pprod` and `psum` will result in error if used with factors. Documentation has been updated. + +# kit 0.0.6 (2021-02-21) + +### New Features + +- Function `funique` and `fduplicated` gain an additional argument `fromLast=FALSE` to indicate whether the search should start from the end or beginning [PR#11](https://github.com/2005m/kit/pull/11). + +- Functions `pall`, `pany`, `pmean`, `pprod` and `psum` accept `data.frame` as input [PR#15](https://github.com/2005m/kit/pull/15). Please see documentation for more information. 
+ +- Function `charToFact` is equivalent to base R `as.factor` but is much quicker and only converts character vector to factor. Note that it is parallelised. For more details and benchmark please see `?kit::charToFact`. + +- Function `psort` is experimental and equivalent to base R `sort` but is only for character vector. It can sort by "C locale" or by "R session locale". For more details and benchmark please see `?kit::psort`. + +### Notes + +- A few OpenMP directives were missing for functions `vswitch` and `nswitch` for character vectors. These have been added in [PR#12](https://github.com/2005m/kit/pull/12). + +- Function `funique` was not preserving attributes for character, logical and complex vectors/data.frames. Thanks to Sebastian Krantz (@SebKrantz) for bringing that to my attention. This has been fixed in [PR#13](https://github.com/2005m/kit/pull/13). + +- Functions `funique` and `uniqLen` should now be faster for `factor` and `logical` vectors [PR#14](https://github.com/2005m/kit/pull/14). + +# kit 0.0.5 (2020-11-21) + +### New Features + +- Function `uniqLen(x)` is equivalent to base R `length(unique(x))` and `uniqueN` in package [data.table](https://CRAN.R-project.org/package=data.table). Function `uniqLen`, implemented in C, supports vectors, `data.frame` and `matrix`. It should be faster than these functions. For more details and benchmark please see `?kit::uniqLen`. + +- Function `vswitch` now supports mixed encoding and gains an additional argument `checkEnc=TRUE`. Thanks to Xianying Tan (@shrektan) for the request and review [PR#7](https://github.com/2005m/kit/pull/7). + +- Function `nswitch` is a nested version of function `vswitch` and also supports mixed encoding. Please see `?kit::nswitch` for further details. Thanks to Xianying Tan (@shrektan) for the request and review [PR#10](https://github.com/2005m/kit/pull/10). 
+ +### Notes + +- Small algorithmic improvement for functions `fduplicated`, `funique` and `countOccur` for `vectors`, `data.frame` and `matrix`. + +- A tests folder has been added to the source package to track coverage and bugs. + +### C-Level Facilities + +- Function `nif` has been split into two distinct functions at C level, one has its arguments evaluated in a lazy way and is for R users and the other one (nifInternalR) is not lazy and is intended for usage at C level. + +# kit 0.0.4 (2020-07-21) + +### New Features + +- Function `countOccur(x)`, implemented in C, is comparable to `base` R function `table`. It returns a `data.frame` and is between 3 and 50 times faster. For more details, please see `?kit::countOccur`. + +- Functions `funique` and `fduplicated` now support matrices. Additionally, these two functions should also have better performance compared to the previous release. + +- Function `topn` has an additional argument `hasna=TRUE` to indicate whether the data contains `NA` values or not. If the data does not contain `NA` values, the function should be faster. + +### C-Level Facilities + +- A few C functions have been added to subset `data.frame` and `matrix` as well as do other operations. These functions are not exported or visible to the user but might become available and callable at C level in the future. + +### Bug Fixes + +- Function `fpos` was not properly handling `NaN` and `NA` for complex and double. This should now be fixed. The function has also been changed in case the 'needle' and 'haystack' are vectors so that a vector is returned. + +- Functions `funique` and `fduplicated` were not properly handling data containing `POSIX` data. This has now been fixed. + +# kit 0.0.3 (2020-06-21) + +### New Features + +- Functions `fduplicated(x)` and `funique(x)`, implemented in C, are comparable to `base` R functions `duplicated` and `unique`. For more details, please see `?kit::funique`. 
+ +- Functions `psum` and `pprod` have now better performance for type double and complex. + +### Bug Fixes + +- Function `count(x, y)` now checks that `x` and `y` have the same class and levels. So does `pcount`. + +- Function `pmean` was not callable at C level because of a typo. This is now fixed. + +# kit 0.0.2 (2020-05-22) + +### New Features + +- Function `count(x, value)`, implemented in C, to simply count the number of times an element `value` occurs in a vector or in a list `x`. For more details, please see `?kit::count`. + +- Function `pmean(..., na.rm=FALSE)`, `pall(..., na.rm=FALSE)`, `pany(..., na.rm=FALSE)` and `pcount(..., value)`, implemented in C, are similar to already available function `psum` and `pprod`. These functions respectively apply base R functions `mean`, `all` and `any` element-wise. For more details, benchmarks and help, please see `?kit::pmean`. + +### Bug Fixes + +- Fix Solaris Unicode warnings for NEWS file. Benchmarks have been moved from the NEWS file to each function Rd file. + +- Fix some `NA` edge cases for `pprod` and `psum` so these functions behave more like base R function `prod` and `sum`. + +- Fix installation errors for version of R (<3.5.0). + +# kit 0.0.1 (2020-05-03) + +### Initial Release + +- Function `fpos(needle, haystack, all=TRUE, overlap=TRUE)`, implemented in C, is inspired by base function `which` when used in the following form `which(x == y, arr.ind =TRUE)`. Function `fpos` returns the index(es) or position(s) of a matrix/vector within a larger matrix/vector. Please see `?kit::fpos` for more details. + +- Function `iif(test, yes, no, na=NULL, tprom=FALSE, nThread=getOption("kit.nThread"))`, originally contributed as `fifelse` in package [data.table](https://CRAN.R-project.org/package=data.table), was moved to package kit to be developed independently. Unlike the current version of `fifelse`, `iif` allows type promotion like base function `ifelse`. 
For further details about the differences with `fifelse`, as well as `hutils::if_else` and `dplyr::if_else`, please see `?kit::iif`. + +- Function `nif(..., default=NULL)`, implemented in C, is inspired by *SQL CASE WHEN*. It is comparable to [dplyr](https://CRAN.R-project.org/package=dplyr) function `case_when` however it evaluates its arguments in a lazy way (i.e. only when needed). Function `nif` was originally contributed as function `fcase` in the [data.table](https://CRAN.R-project.org/package=data.table) package but then moved to package kit so its development may resume independently. Please see `?kit::nif` for more details. + +- Function `pprod(..., na.rm=FALSE)` and `psum(..., na.rm=FALSE)`, implemented in C, are inspired by base function `pmin` and `pmax`. These new functions work only for integer, double and complex types and do not recycle vectors. Please see `?kit::psum` for more details. + +- Function `setlevels(x, old, new, skip_absent=FALSE)`, implemented in C, may be used to set levels of a factor object. Please see `?kit::setlevels` for more details. + +- Function `topn(vec, n=6L, decreasing=TRUE)`, implemented in C, returns the top largest or smallest `n` values for a given numeric vector `vec`. It is inspired by `dplyr::top_n` and equivalent to base functions order and sort in specific cases as shown in the documentation. Please see `?kit::topn` for more details. + +- Function `vswitch(x, values, outputs, default=NULL, nThread=getOption("kit.nThread"))`, implemented in C, is a vectorised version of `base` R function `switch`. This function can also be seen as a particular case of function `nif`. Please see `?kit::vswitch` for more details. 
+ diff --git a/inst/NEWS.Rd b/inst/NEWS.Rd deleted file mode 100644 index 3370f1c..0000000 --- a/inst/NEWS.Rd +++ /dev/null @@ -1,395 +0,0 @@ -\name{NEWS} -\title{News for \R Package \pkg{kit}} -\encoding{UTF-8} - -\newcommand{\CRANpkg}{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}} - -\section{version 0.0.20 (2025-04-17)}{ - \subsection{Notes}{ - \itemize{ - \item Update copyright date in c files - - \item Fix note on CRAN regarding Rf_isFrame - } - } -} - -\section{version 0.0.19 (2024-09-07)}{ - \subsection{Bug Fixes}{ - \itemize{ - \item Fix multiple warnings in C code. - } - } -} - -\section{version 0.0.18 (2024-06-06)}{ - \subsection{Bug Fixes}{ - \itemize{ - \item Fix \code{iif} tests for new version of R. - } - } -} - -\section{version 0.0.17 (2024-05-03)}{ - \subsection{Bug Fixes}{ - \itemize{ - \item Fix \code{nswitch}. Thanks to Sebastian Krantz for raising an issue. - } - } - \subsection{Notes}{ - \itemize{ - \item Update copyright date in c files - - \item Fix note on CRAN regarding SETLENGTH - } - } -} - -\section{version 0.0.16 (2024-03-01)}{ - \subsection{Notes}{ - \itemize{ - \item Check if \code{"kit.nThread"} is defined before setting it to \code{1L} - } - } -} - -\section{version 0.0.15 (2023-10-01)}{ - \subsection{Notes}{ - \itemize{ - \item Correct typo in configure file - } - } -} - -\section{version 0.0.14 (2023-08-12)}{ - \subsection{Notes}{ - \itemize{ - \item Update configure file to extend support for GCC - - \item Correct warnings in NEWS.Rd (strong) - - \item Correct typo in funique.Rd thanks to @davidbudzynski - } - } -} - -\section{version 0.0.13 (2023-02-24)}{ - \subsection{Notes}{ - \itemize{ - \item Function \code{pprod} now returns double output even if inputs are integer - in line with \code{base::prod} - to avoid integer overflows. 
- - \item Update configure file - } - } -} - -\section{version 0.0.12 (2022-10-26)}{ - \subsection{New Features}{ - \itemize{ - \item Function \code{pcountNA} is equivalent to \code{pcount(..., value = NA)}. - - \item Function \code{pcountNA} and \code{pcount(..., value = NA)} allow \code{NA} counting with mixed data type (including \code{data.frame}). \code{pcountNA} also supports list-vectors as inputs and counts empty or \code{NULL} elements as \code{NA}. - - \item Functions \code{panyv}, \code{panyNA}, \code{pallv} and \code{pallNA} are added as efficient wrappers around \code{pcount} and \code{pcountNA}. They are parallel equivalents of scalar functions \code{base::anyNA} and \code{anyv}, \code{allv} and \code{allNA} in the 'collapse' R package. - - \item Functions \code{pfirst} and \code{plast} are added to efficiently obtain the row-wise first and last non-missing value or non-empty element of lists. They are parallel equivalents to the (column-wise) \code{ffirst} and \code{flast} functions in the 'collapse' R package. Implemented by @SebKrantz. - - \item Functions \code{psum/pprod/pmean} also support logical vectors as input. Implemented by @SebKrantz. - } - } - \subsection{Bug Fixes}{ - \itemize{ - \item Function \code{charToFact} was not returning proper results. Thanks to @alex-raw for raising an issue. - } - } - \subsection{Notes}{ - \itemize{ - \item Function \code{pprod} now returns double output even if inputs are integer - in line with \code{base::prod} - to avoid integer overflows. - - \item C compiler warnings on CRAN R-devel caused by compilation with -Wstrict-prototypes are now fixed. Declaration of functions without prototypes is depreciated in all versions of C. Thanks to Sebastian Krantz for the PR. - } - } -} - -\section{version 0.0.11 (2022-03-19)}{ - \subsection{New Features}{ - \itemize{ - \item Function \code{pcount} now supports data.frame. 
- } - } - \subsection{Bug Fixes}{ - \itemize{ - \item Function \code{pcount} now works with specific NA values, i.e. NA_real_, NA_character_ etc... - } - } -} - -\section{version 0.0.10 (2021-11-28)}{ - \subsection{New Features}{ - \itemize{ - \item Function \code{psum}, \code{pmean}, \code{pprod}, \code{pany} and \code{pall} now support lists. Thanks to Sebastian Krantz for the request and code suggestion. - } - } - \subsection{Bug Fixes}{ - \itemize{ - \item Function \code{topn} should now work for ALTREP object. Thanks to @ben-schwen for raising an issue. - } - } -} - -\section{version 0.0.9 (2021-09-12)}{ - \subsection{Notes}{ - \itemize{ - \item Re-organise header to prevent compilation errors with new version of Clang due to conflicts between R C headers and OpenMP. - } - } -} - -\section{version 0.0.8 (2021-08-21)}{ - \subsection{New Features}{ - \itemize{ - \item Function \code{funique} now preserves the attributes if the input is a - \code{data.table}, \code{tibble} or similar objects. Thanks to Sebastian Krantz for the request. - - \item Function \code{topn} now defaults to base R \code{order} for large value of \code{n}. - Please see updated documentation for more information \code{?kit::topn}. - - \item Function \code{charToFact} gains a new argument \code{addNA=TRUE} to be used - to include (or not) \code{NA} in levels of the output. - - \item Function \code{shareData}, \code{getData} and \code{clearData} implemented - to share data objects between \R sessions. These functions are experimental and might change in the future. - Feedback is welcome. Please see \code{?kit::shareData} for more information. - } - } - \subsection{Notes}{ - \itemize{ - \item Few \code{calloc} functions at C level have been replaced by R C API function - \code{Calloc} to avoid valgrind errors/warnings in Travis CI. - - \item Errors reported by \code{rchk} on CRAN have been fixed. 
- } - } -} - -\section{version 0.0.7 (2021-03-07)}{ - \subsection{New Features}{ - \itemize{ - \item Function \code{charToFact} gains a new argument \code{decreasing=FALSE} to be used - to order levels of the output in decreasing or increasing order. - - \item Function \code{topn} gains a new argument \code{index=TRUE} to be used return - index (\code{TRUE}) or values (\code{FALSE}) of input vector. - } - } - \subsection{Bug Fixes}{ - \itemize{ - \item Some tests of memory access errors using valgrind and AddressSanitizer were reported by CRAN. - An attempt to fix these errors has been submitted as part of this package version. It also seems that - these same errors were causing some tests to fail for \code{funique} and \code{psort} on some platforms. - } - } - \subsection{Notes}{ - \itemize{ - \item Functions \code{pmean}, \code{pprod} and \code{psum} will result - in error if used with factors. Documentation has been updated. - } - } -} - -\section{version 0.0.6 (2021-02-21)}{ - \subsection{New Features}{ - \itemize{ - \item Function \code{funique} and \code{fduplicated} gain an additional argument - \code{fromLast=FALSE} to indicate whether the search should start from the end or beginning - \href{https://github.com/2005m/kit/pull/11}{PR#11}. - - \item Functions \code{pall}, \code{pany}, \code{pmean}, - \code{pprod} and \code{psum} accept \code{data.frame} as input - \href{https://github.com/2005m/kit/pull/15}{PR#15}. Please see documentation for more - information. - - \item Function \code{charToFact} is equivalent to to base R \code{as.factor} but is much - quicker and only converts character vector to factor. Note that it is parallelised. For more - details and benchmark please see \code{?kit::charToFact}. - - \item Function \code{psort} is experimental and equivalent to to base R \code{sort} - but is only for character vector. It can sort by "C locale" or by "R session locale". - For more details and benchmark please see \code{?kit::psort}. 
- } - } - \subsection{Notes}{ - \itemize{ - \item A few OpenMP directives were missing for functions \code{vswitch} and - \code{nswitch} for character vectors. These have been added in - \href{https://github.com/2005m/kit/pull/12}{PR#12}. - - \item Function \code{funique} was not preserving attributes for character, logical and - complex vectors/data.frames. Thanks to Sebastian Krantz (@SebKrantz) for bringing that to my - attention. This has been fixed in \href{https://github.com/2005m/kit/pull/13}{PR#13}. - - \item Functions \code{funique} and \code{uniqLen} should now be faster for - \code{factor} and \code{logical} vectors \href{https://github.com/2005m/kit/pull/14}{PR#14}. - } - } -} - -\section{version 0.0.5 (2020-11-21)}{ - \subsection{New Features}{ - \itemize{ - \item Function \code{uniqLen(x)} is equivalent to base R \code{length(unique(x))} and - \code{uniqueN} in package \CRANpkg{data.table}. Function \code{uniqLen}, implemented in C, supports - vectors, \code{data.frame} and \code{matrix}. It should be faster than these functions. For more - details and benchmark please see \code{?kit::uniqLen}. - - \item Function \code{vswitch} now supports mixed encoding and gains an additional argument - \code{checkEnc=TRUE}. Thanks to Xianying Tan (@shrektan) for the request and review - \href{https://github.com/2005m/kit/pull/7}{PR#7}. - - \item Function \code{nswitch} is a nested version of function \code{vswitch} - and also supports mixed encoding. Please see please see \code{?kit::nswitch} for further details. - Thanks to Xianying Tan (@shrektan) for the request and review \href{https://github.com/2005m/kit/pull/10}{PR#10}. - } - } - \subsection{Notes}{ - \itemize{ - \item Small algorithmic improvement for functions \code{fduplicated}, \code{funique} - and \code{countOccur} for \code{vectors}, \code{data.frame} and \code{matrix}. - - \item A tests folder has been added to the source package to track coverage and bugs. 
- } - } - \subsection{C-Level Facilities}{ - \itemize{ - \item Function \code{nif} has been split into two distinctive functions at C level, - one has its arguments evaluated in a lazy way and is for R users and the other one (nifInternalR) - is not lazy and is intended for usage at C level. - } - } -} - -\section{version 0.0.4 (2020-07-21)}{ - \subsection{New Features}{ - \itemize{ - \item Function \code{countOccur(x)}, implemented in C, is comparable to \code{base} - \R function \code{table}. It returns a \code{data.frame} and is between 3 to 50 times faster. - For more details, please see \code{?kit::countOccur}. - - \item Functions \code{funique} and \code{fduplicated} now support matrices. - Additionally, these two functions should also have better performance compare to previous release. - - \item Functions \code{topn} has an additional argument \code{hasna=TRUE} to indicates whether - data contains \code{NA} value or not. If the data does not contain \code{NA} values, the function - should be faster. - } - } - \subsection{C-Level Facilities}{ - \itemize{ - \item A few C functions have been added to subset \code{data.frame} and \code{matrix} as well as - do other operations. These functions are not exported or visible to the user but might become - available and callable at C level in the future. - } - } - \subsection{Bug Fixes}{ - \itemize{ - \item Function \code{fpos} was not properly handling \code{NaN} and \code{NA} for complex - and double. This should now be fixed. The function has also been changed in case the 'needle' and - 'haysatck' are vectors so that a vector is returned. - - \item Functions \code{funique} and \code{fduplicated} were not properly handling - data containing \code{POSIX} data. This has now been fixed. 
- } - } -} - -\section{version 0.0.3 (2020-06-21)}{ - \subsection{New Features}{ - \itemize{ - \item Functions \code{fduplicated(x)} and \code{funique(x)}, implemented in C, - are comparable to \code{base} \R functions \code{duplicated} and \code{unique}. For more details, - please see \code{?kit::funique}. - - \item Functions \code{psum} and \code{pprod} have now better performance for - type double and complex. - } - } - \subsection{Bug Fixes}{ - \itemize{ - \item Function \code{count(x, y)} now checks that \code{x} and \code{y} have the same class and - levels. So does \code{pcount}. - - \item Function \code{pmean} was not callable at C level because of a typo. This is now fixed. - } - } -} - -\section{version 0.0.2 (2020-05-22)}{ - \subsection{New Features}{ - \itemize{ - \item Function \code{count(x, value)}, implemented in C, to simply count the number of times - an element \code{value} occurs in a vector or in a list \code{x}. For more details, please see - \code{?kit::count}. - - \item Function \code{pmean(..., na.rm=FALSE)}, \code{pall(..., na.rm=FALSE)}, - \code{pany(..., na.rm=FALSE)} and \code{pcount(..., value)}, implemented in C, - are similar to already available function \code{psum} and \code{pprod}. These - functions respectively apply base \R functions \code{mean}, \code{all} and \code{any} element-wise. - For more details, benchmarks and help, please see \code{?kit::pmean}. - } - } - \subsection{Bug Fixes}{ - \itemize{ - \item Fix Solaris Unicode warnings for NEWS file. Benchmarks have been moved from the NEWS file to - each function Rd file. - - \item Fix some \code{NA} edge cases for \code{pprod} and \code{psum} so these - functions behave more like base \R function \code{prod} and \code{sum}. - - \item Fix installation errors for version of R (<3.5.0). 
- } - } -} - -\section{version 0.0.1 (2020-05-03)}{ - \subsection{Initial Release}{ - \itemize{ - \item Function \code{fpos(needle, haystack, all=TRUE, overlap=TRUE)}, implemented in C, is - inspired by base function \code{which} when used in the following form - \code{which(x == y, arr.ind =TRUE}). Function \code{fpos} returns the index(es) or position(s) - of a matrix/vector within a larger matrix/vector. Please see \code{?kit::fpos} for more - details. - - \item Function \code{iif(test, yes, no, na=NULL, tprom=FALSE, nThread=getOption("kit.nThread"))}, - originally contributed as \code{fifelse} in package \CRANpkg{data.table}, was moved to package kit - to be developed independently. Unlike the current version of \code{fifelse}, \code{iif} allows - type promotion like base function \code{ifelse}. For further details about the differences - with \code{fifelse}, as well as \code{hutils::if_else} and \code{dplyr::if_else}, please see - \code{?kit::iif}. - - \item Function \code{nif(..., default=NULL)}, implemented in C, is inspired by - \emph{SQL CASE WHEN}. It is comparable to \CRANpkg{dplyr} function \code{case_when} however it - evaluates it arguments in a lazy way (i.e only when needed). Function \code{nif} was - originally contributed as function \code{fcase} in the \CRANpkg{data.table} package but then moved - to package kit so its development may resume independently. Please see \code{?kit::nif} for - more details. - - \item Function \code{pprod(..., na.rm=FALSE)} and \code{psum(..., na.rm=FALSE)}, - implemented in C, are inspired by base function \code{pmin} and \code{pmax}. These new - functions work only for integer, double and complex types and do not recycle vectors. Please - see \code{?kit::psum} for more details. - - \item Function \code{setlevels(x, old, new, skip_absent=FALSE)}, implemented in C, - may be used to set levels of a factor object. Please see \code{?kit::setlevels} for more details. 
- - \item Function \code{topn(vec, n=6L, decreasing=TRUE)}, implemented in C, returns the top - largest or smallest \code{n} values for a given numeric vector \code{vec}. It is inspired by - \code{dplyr::top_n} and equivalent to base functions order and sort in specific cases as shown - in the documentation. Please see \code{?kit::topn} for more details. - - \item Function \code{vswitch(x, values, outputs, default=NULL, nThread=getOption("kit.nThread"))} - , implemented in C, is a vectorised version of \code{base} \R function \code{switch}. This - function can also be seen as a particular case of function \code{nif}. Please see - \code{?kit::switch} for more details. - } - } -} From e19bb25ed085073f23537965d78e7c592eaaa7e0 Mon Sep 17 00:00:00 2001 From: SebKrantz Date: Tue, 2 Dec 2025 20:32:04 -0500 Subject: [PATCH 02/10] Ignore Rstudio files. --- .Rbuildignore | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.Rbuildignore b/.Rbuildignore index 2ab0f9a..6361f5b 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -2,3 +2,5 @@ ^\.appveyor\.yml$ ^README\.md$ LICENSE +^.*\.Rproj$ +^\.Rproj\.user$ From d4babc4b27e5d6649d08b7feb3a5229ed05cdc67 Mon Sep 17 00:00:00 2001 From: SebKrantz Date: Tue, 2 Dec 2025 20:57:44 -0500 Subject: [PATCH 03/10] Add pkgdown site. 
--- .Rbuildignore | 3 ++ .github/workflows/pkgdown.yaml | 49 ++++++++++++++++++++++ _pkgdown.yml | 76 ++++++++++++++++++++++++++++++++++ pkgdown/extra.css | 68 ++++++++++++++++++++++++++++++ 4 files changed, 196 insertions(+) create mode 100644 .github/workflows/pkgdown.yaml create mode 100644 _pkgdown.yml create mode 100644 pkgdown/extra.css diff --git a/.Rbuildignore b/.Rbuildignore index 6361f5b..b2b348c 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -4,3 +4,6 @@ LICENSE ^.*\.Rproj$ ^\.Rproj\.user$ +^_pkgdown\.yml$ +^docs$ +^pkgdown$ diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml new file mode 100644 index 0000000..bfc9f4d --- /dev/null +++ b/.github/workflows/pkgdown.yaml @@ -0,0 +1,49 @@ +# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples +# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help +on: + push: + branches: [main, master] + pull_request: + release: + types: [published] + workflow_dispatch: + +name: pkgdown.yaml + +permissions: read-all + +jobs: + pkgdown: + runs-on: ubuntu-latest + # Only restrict concurrency for non-PR jobs + concurrency: + group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }} + env: + GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} + permissions: + contents: write + steps: + - uses: actions/checkout@v4 + + - uses: r-lib/actions/setup-pandoc@v2 + + - uses: r-lib/actions/setup-r@v2 + with: + use-public-rspm: true + + - uses: r-lib/actions/setup-r-dependencies@v2 + with: + extra-packages: any::pkgdown, local::. 
+ needs: website + + - name: Build site + run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE) + shell: Rscript {0} + + - name: Deploy to GitHub pages 🚀 + if: github.event_name != 'pull_request' + uses: JamesIves/github-pages-deploy-action@v4.5.0 + with: + clean: false + branch: gh-pages + folder: docs diff --git a/_pkgdown.yml b/_pkgdown.yml new file mode 100644 index 0000000..78a313f --- /dev/null +++ b/_pkgdown.yml @@ -0,0 +1,76 @@ +url: https://fastverse.github.io/kit/ + +home: + title: Data Manipulation Functions Implemented in C + +template: + bootstrap: 5 + bootswatch: sandstone + theme: ayu-dark # Or: ayu-mirage + math-rendering: katex + bslib: + primary: "#1e2124" # "#202224" # "#242424" # "#003254" + code-color: "#004573" # "#9c0027" # "#004d80" # b3002d + gray-dark: "#3f464d" + +development: + mode: auto + +navbar: + title: kit + structure: + left: + - reference + - articles + - news + - blog + right: + - search + - github + components: + reference: + text: Documentation + href: reference/index.html + articles: + text: Vignettes + href: articles/index.html + news: + text: News + href: news/index.html + github: + icon: fa-github + href: https://github.com/fastverse/kit + aria-label: GitHub + + +reference: +- title: "Parallel Statistical Functions" + desc: "Vector-valued (statistical) functions operating in parallel over vectors passed as arguments, or a single list of vectors/data frame." +- contents: + - parallel-funs +- title: "Vectorised and Nested Switches" + desc: "Fast vectorized and nested switches." +- contents: + - iif + - nif + - vswitch/nswitch +- title: "Sorting" + desc: "Parallel sort for strings and partial sort (N largest/smallest)." +- contents: + - psort + - topn +- title: "Factors" + desc: "Fast character to factor conversion and changing factor levels by reference." 
+- contents: + - charToFact + - setlevels +- title: "Unique Values and Counts" + desc: "Fast duplicated and unique and count the number of times element(s) occur." +- contents: + - fduplicated/funique + - count +- title: "Miscellaneous" + desc: "Find a matrix position inside a larger matrix and share data between R sessions." +- contents: + - fpos + - shareData/getData/clearData diff --git a/pkgdown/extra.css b/pkgdown/extra.css new file mode 100644 index 0000000..522ed42 --- /dev/null +++ b/pkgdown/extra.css @@ -0,0 +1,68 @@ +.navbar-nav .nav-item > .nav-link { + margin-right: 10px; +} +.template-home img.logo { + width: 150px; +} +img.logo { + width: 150px; + margin-left: 30px; +} +.h1, .h2, .h3, h1, h2, h3 { + margin-top: 35px; + margin-bottom: 10px; +} +body { + font-size: 100%; +} +dd { + padding-left: 1.5rem !important; + margin-bottom: 0.5rem !important; +} +/* +p { + font-size: 0.875em; 14px/16=0.875em +} +*/ +.fa-bluesky { + font-family: "Font Awesome 6 Brands"; + font-weight: 400; +} +span.fa.fa-bluesky { + font-size: 15.5px; +} +@media screen and (min-width: 1000px) { + span.fa.fa-bluesky { + padding-left: 12px; + } +} +span.fa.fa-twitter { + font-size: 18px; +} +span.fa.fa-github { + font-size: 18px; + margin-right: 100px; +} +a { + color: #0089b3; /* #007da3 */ +} +a:hover { + color: #005873; /* #027ca1; */ +} +pre { + color: #cccccc; +} +small.nav-text.text-muted { + color: #999a9c !important; /* #8e8c84 #999a9c; -> Same as navbar */ +} + +.form-control, +.form-control::placeholder { + color: #999a9c !important; +} + +[data-bs-theme="dark"] { + --bs-body-color: #cccccc !important; + --bs-secondary-color: #cccccc !important; + --bs-tertiary-color: #999a9c !important; +} From 3e5fdd926eeb6950e328ff1439e0e3f1d29a0b49 Mon Sep 17 00:00:00 2001 From: SebKrantz Date: Tue, 2 Dec 2025 20:57:57 -0500 Subject: [PATCH 04/10] Remove indents. 
--- man/count.Rd | 6 +++--- man/fpos.Rd | 8 ++++---- man/funique.Rd | 6 +++--- man/iif.Rd | 4 ++-- man/nif.Rd | 2 +- man/psort.Rd | 10 +++++----- man/psum.Rd | 40 ++++++++++++++++++++-------------------- man/shareData.Rd | 6 +++--- man/topn.Rd | 8 ++++---- man/vswitch.Rd | 12 ++++++------ 10 files changed, 51 insertions(+), 51 deletions(-) diff --git a/man/count.Rd b/man/count.Rd index 5748f26..9475cb8 100644 --- a/man/count.Rd +++ b/man/count.Rd @@ -7,9 +7,9 @@ Simple functions to count the number of times an element occurs. } \usage{ - count(x, value) - countNA(x) - countOccur(x) +count(x, value) +countNA(x) +countOccur(x) } \arguments{ \item{x}{ A vector or list for \code{countNA}. A vector for \code{count} and a vector or \code{data.frame} for \code{countOccur}.} diff --git a/man/fpos.Rd b/man/fpos.Rd index 455b77b..a89bb26 100644 --- a/man/fpos.Rd +++ b/man/fpos.Rd @@ -2,10 +2,10 @@ \alias{fpos} \title{ Find a matrix position inside a larger matrix } \description{ -The function \code{fpos} returns the locations (row and column index) where a small matrix may be found in a larger matrix. The function also works with vectors. +The function \code{fpos} returns the locations (row and column index) where a small matrix may be found in a larger matrix. The function also works with vectors. } \usage{ - fpos(needle, haystack, all=TRUE, overlap=TRUE) +fpos(needle, haystack, all=TRUE, overlap=TRUE) } \arguments{ \item{needle}{ A matrix or vector to search for in the larger matrix or vector \code{haystack}. Note that the \code{needle} dimensions (row and column size) must be smaller than the \code{haystack} dimensions. 
} @@ -24,10 +24,10 @@ small_matrix = matrix(c(14, 15, 24, 25), nrow = 2) fpos(small_matrix, big_matrix) -# Example 2: find a vector inside a larger one +# Example 2: find a vector inside a larger one fpos(14:15, 1:30) -# Example 3: +# Example 3: big_matrix = matrix(c(1:5), nrow = 10, ncol = 5) small_matrix = matrix(c(2:3), nrow = 2, ncol = 2) diff --git a/man/funique.Rd b/man/funique.Rd index 5933d36..6843ea0 100644 --- a/man/funique.Rd +++ b/man/funique.Rd @@ -7,9 +7,9 @@ Similar to base R functions \code{duplicated} and \code{unique}, \code{fduplicated} and \code{funique} are slightly faster for vectors and much faster for \code{data.frame}. Function \code{uniqLen} is equivalent to base R \code{length(unique)} or \code{data.table::uniqueN}. } \usage{ - fduplicated(x, fromLast = FALSE) - funique(x, fromLast = FALSE) - uniqLen(x) +fduplicated(x, fromLast = FALSE) +funique(x, fromLast = FALSE) +uniqLen(x) } \arguments{ \item{x}{ A vector, data.frame or matrix.} diff --git a/man/iif.Rd b/man/iif.Rd index 3426b45..b8991c2 100644 --- a/man/iif.Rd +++ b/man/iif.Rd @@ -5,7 +5,7 @@ \code{iif} is a faster and more robust replacement of \code{\link[base]{ifelse}}. It is comparable to \code{dplyr::if_else}, \code{hutils::if_else} and \code{data.table::fifelse}. It returns a value with the same length as \code{test} filled with corresponding values from \code{yes}, \code{no} or eventually \code{na}, depending on \code{test}. It does not support S4 classes. } \usage{ - iif(test, yes, no, na=NULL, tprom=FALSE, nThread=getOption("kit.nThread")) +iif(test, yes, no, na=NULL, tprom=FALSE, nThread=getOption("kit.nThread")) } \arguments{ \item{test}{ A logical vector. } @@ -17,7 +17,7 @@ \details{ In contrast to \code{\link[base]{ifelse}} attributes are copied from \code{yes} to the output. This is useful when returning \code{Date}, \code{factor} or other classes. Like \code{dplyr::if_else} and \code{hutils::if_else}, the \code{na} argument is by default set to \code{NULL}. 
This argument is set to \code{NA} in data.table::fifelse. -Similarly to \code{dplyr::if_else} and when \code{tprom=FALSE}, \code{iif} requires same type for arguments \code{yes} and \code{no}. This is not strictly the case for \code{data.table::fifelse} which will coerce integer to double. +Similarly to \code{dplyr::if_else} and when \code{tprom=FALSE}, \code{iif} requires same type for arguments \code{yes} and \code{no}. This is not strictly the case for \code{data.table::fifelse} which will coerce integer to double. When \code{tprom=TRUE}, \code{iif} behavior is similar to \code{base::ifelse} in the sense that it will promote or coerce \code{yes} and \code{no}to the "highest" used type. Note, however, that unlike \code{base::ifelse} attributes are still conserved. } \value{ diff --git a/man/nif.Rd b/man/nif.Rd index ea44941..8026d9b 100644 --- a/man/nif.Rd +++ b/man/nif.Rd @@ -5,7 +5,7 @@ \code{nif} is a fast implementation of SQL \code{CASE WHEN} statement for R. Conceptually, \code{nif} is a nested version of \code{\link{iif}} (with smarter implementation than manual nesting). It is not the same but it is comparable to \code{dplyr::case_when} and \code{data.table::fcase}. } \usage{ - nif(..., default=NULL) +nif(..., default=NULL) } \arguments{ \item{...}{ A sequence consisting of logical condition (\code{when})-resulting value (\code{value}) \emph{pairs} in the following order \code{when1, value1, when2, value2, ..., whenN, valueN}. Logical conditions \code{when1, when2, ..., whenN} must all have the same length, type and attributes. Each \code{value} may either share length with \code{when} or be length 1. Please see Examples section for further details.} diff --git a/man/psort.Rd b/man/psort.Rd index 1c11b4f..0861ecd 100644 --- a/man/psort.Rd +++ b/man/psort.Rd @@ -6,8 +6,8 @@ It is currently experimental and might change in the future. Use with caution. 
} \usage{ - psort(x, decreasing=FALSE, na.last=NA, - nThread=getOption("kit.nThread"),c.locale=TRUE) +psort(x, decreasing=FALSE, na.last=NA, + nThread=getOption("kit.nThread"),c.locale=TRUE) } \arguments{ \item{x}{ A vector of type character. If other, it will default to \code{base::sort}} @@ -31,12 +31,12 @@ identical(psort(x, c.locale=TRUE), sort(x, method="radix")) # strings = as.character(as.hexmode(1:1000)) # x = sample(strings, 1e8, replace=TRUE) # system.time({kit::psort(x, na.last = TRUE, nThread = 1L)}) -# user system elapsed +# user system elapsed # 2.833 0.434 3.277 # system.time({sort(x,method="radix",na.last = TRUE)}) -# user system elapsed +# user system elapsed # 5.597 0.559 6.176 # system.time({x[order(x,method="radix",na.last = TRUE)]}) -# user system elapsed +# user system elapsed # 5.561 0.563 6.143 } diff --git a/man/psum.Rd b/man/psum.Rd index ce3b1cf..825416e 100644 --- a/man/psum.Rd +++ b/man/psum.Rd @@ -14,22 +14,22 @@ \alias{plast} \title{Parallel (Statistical) Functions} \description{ -Vector-valued (statistical) functions operating in parallel over vectors passed as arguments, or a single list of vectors (such as a data frame). Similar to \code{\link{pmin}} and \code{\link{pmax}}, except that these functions do not recycle vectors. +Vector-valued (statistical) functions operating in parallel over vectors passed as arguments, or a single list of vectors (such as a data frame). Similar to \code{\link{pmin}} and \code{\link{pmax}}, except that these functions do not recycle vectors. } \usage{ - psum(..., na.rm = FALSE) - pprod(..., na.rm = FALSE) - pmean(..., na.rm = FALSE) - pfirst(...) # (na.rm = TRUE) - plast(...) # (na.rm = TRUE) - pall(..., na.rm = FALSE) - pallNA(...) - pallv(..., value) - pany(..., na.rm = FALSE) - panyNA(...) - panyv(..., value) - pcount(..., value) - pcountNA(...) +psum(..., na.rm = FALSE) +pprod(..., na.rm = FALSE) +pmean(..., na.rm = FALSE) +pfirst(...) # (na.rm = TRUE) +plast(...) 
# (na.rm = TRUE) +pall(..., na.rm = FALSE) +pallNA(...) +pallv(..., value) +pany(..., na.rm = FALSE) +panyNA(...) +panyv(..., value) +pcount(..., value) +pcountNA(...) } \arguments{ \item{...}{ suitable (atomic) vectors of the same length, or a single list of vectors (such as a \code{data.frame}). See Details on the allowed data types for each function, and Examples.} @@ -43,14 +43,14 @@ Functions \code{psum}, \code{pprod} work for integer, logical, double and comple \code{pany} and \code{pall} are derived from base functions \code{all} and \code{any} and only allow logical inputs. -\code{pcount} counts the occurrence of \code{value}, and expects arguments of the same data type (except for \code{value = NA}). \code{pcountNA} is equivalent to \code{pcount} with \code{value = NA}, and they both allow \code{NA} counting in mixed-type data. \code{pcountNA} additionally supports list vectors and counts empty or \code{NULL} elements as \code{NA}. +\code{pcount} counts the occurrence of \code{value}, and expects arguments of the same data type (except for \code{value = NA}). \code{pcountNA} is equivalent to \code{pcount} with \code{value = NA}, and they both allow \code{NA} counting in mixed-type data. \code{pcountNA} additionally supports list vectors and counts empty or \code{NULL} elements as \code{NA}. -Functions \code{panyv/pallv} are wrappers around \code{pcount}, and \code{panyNA/pallNA} are wrappers around \code{pcountNA}. They return a logical vector instead of the integer count. +Functions \code{panyv/pallv} are wrappers around \code{pcount}, and \code{panyNA/pallNA} are wrappers around \code{pcountNA}. They return a logical vector instead of the integer count. None of these functions recycle vectors i.e. all input vectors need to have the same length. All functions support long vectors with up to \code{2^64-1} elements. } \value{ -\code{psum/pprod/pmean} return the sum, product or mean of all arguments. 
The value returned will be of the highest argument type (integer < double < complex). \code{pprod} only returns double or complex. \code{pall[v/NA]} and \code{pany[v/NA]} return a logical vector. \code{pcount[NA]} returns an integer vector. \code{pfirst/plast} return a vector of the same type as the inputs. +\code{psum/pprod/pmean} return the sum, product or mean of all arguments. The value returned will be of the highest argument type (integer < double < complex). \code{pprod} only returns double or complex. \code{pall[v/NA]} and \code{pany[v/NA]} return a logical vector. \code{pcount[NA]} returns an integer vector. \code{pfirst/plast} return a vector of the same type as the inputs. } \seealso{ Package 'collapse' provides column-wise and scalar-valued analogues to many of these functions. @@ -61,7 +61,7 @@ x = c(1, 3, NA, 5) y = c(2, NA, 4, 1) z = c(3, 4, 4, 1) -# Example 1: psum +# Example 1: psum psum(x, y, z, na.rm = FALSE) psum(x, y, z, na.rm = TRUE) @@ -105,7 +105,7 @@ pmean(iris[,1:2]) # x = rnorm(n) # 763 Mb # y = rnorm(n) # z = rnorm(n) -# +# # microbenchmark::microbenchmark( # kit=psum(x, y, z, na.rm = TRUE), # base=rowSums(do.call(cbind,list(x, y, z)), na.rm=TRUE), @@ -119,7 +119,7 @@ pmean(iris[,1:2]) # x = sample(c(TRUE, FALSE, NA), n, TRUE) # 382 Mb # y = sample(c(TRUE, FALSE, NA), n, TRUE) # z = sample(c(TRUE, FALSE, NA), n, TRUE) -# +# # microbenchmark::microbenchmark( # kit=pany(x, y, z, na.rm = TRUE), # base=sapply(1:n, function(i) any(x[i],y[i],z[i],na.rm=TRUE)), diff --git a/man/shareData.Rd b/man/shareData.Rd index bb9fce2..c2fbe8b 100644 --- a/man/shareData.Rd +++ b/man/shareData.Rd @@ -7,9 +7,9 @@ Experimental functions that enable the user to share a R object between 2 \R sessions. 
} \usage{ - shareData(data, map_name, verbose=FALSE) - getData(map_name, verbose=FALSE) - clearData(x, verbose=FALSE) +shareData(data, map_name, verbose=FALSE) +getData(map_name, verbose=FALSE) +clearData(x, verbose=FALSE) } \arguments{ \item{data}{ A \R object like a vector or a \code{data.frame}.} diff --git a/man/topn.Rd b/man/topn.Rd index 075c774..7dcb791 100644 --- a/man/topn.Rd +++ b/man/topn.Rd @@ -2,12 +2,12 @@ \alias{topn} \title{ Top N values index} \description{ - \code{topn} is used to get the indices of the few values of an input. This is an extension of \code{\link{which.max}}/\code{\link{which.min}} which provide \emph{only} the first such index. - + \code{topn} is used to get the indices of the few values of an input. This is an extension of \code{\link{which.max}}/\code{\link{which.min}} which provide \emph{only} the first such index. + The output is the same as \code{order(vec)[1:n]}, but internally optimized not to sort the irrelevant elements of the input (and therefore much faster, for small \code{n} relative to input size). } \usage{ - topn(vec, n=6L, decreasing=TRUE, hasna=TRUE, index=TRUE) +topn(vec, n=6L, decreasing=TRUE, hasna=TRUE, index=TRUE) } \arguments{ \item{vec}{ A numeric vector of type numeric or integer. Other types are not supported yet. } @@ -23,7 +23,7 @@ \examples{ x = rnorm(1e4) -# Example 1: index of top 6 negative values +# Example 1: index of top 6 negative values topn(x, 6L, decreasing=FALSE) order(x)[1:6] diff --git a/man/vswitch.Rd b/man/vswitch.Rd index 409d646..0ee33ff 100644 --- a/man/vswitch.Rd +++ b/man/vswitch.Rd @@ -6,12 +6,12 @@ \code{vswitch}/ \code{nswitch} is a vectorised version of \code{base} function \code{switch}. This function can also be seen as a particular case of function \code{nif}, as shown in examples below, and should also be faster. 
} \usage{ - vswitch(x, values, outputs, default=NULL, - nThread=getOption("kit.nThread"), - checkEnc=TRUE) - nswitch(x, ..., default=NULL, - nThread=getOption("kit.nThread"), - checkEnc=TRUE) +vswitch(x, values, outputs, default=NULL, + nThread=getOption("kit.nThread"), + checkEnc=TRUE) +nswitch(x, ..., default=NULL, + nThread=getOption("kit.nThread"), + checkEnc=TRUE) } \arguments{ \item{x}{A vector or list.} From 8e754cc93a3de2fc4376cf0ddfd3374c6a8bcee8 Mon Sep 17 00:00:00 2001 From: SebKrantz Date: Tue, 2 Dec 2025 20:58:24 -0500 Subject: [PATCH 05/10] Change URL + add website url. --- DESCRIPTION | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index 1b2e657..39e5107 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -8,10 +8,11 @@ Authors@R: c(person("Morgan", "Jacob", role = c("aut", "cre", "cph"), email = "m Author: Morgan Jacob [aut, cre, cph], Sebastian Krantz [ctb] Maintainer: Morgan Jacob Description: Basic functions, implemented in C, for large data manipulation. Fast vectorised ifelse()/nested if()/switch() functions, psum()/pprod() functions equivalent to pmin()/pmax() plus others which are missing from base R. Most of these functions are callable at C level. +URL: https://fastverse.github.io/kit/, https://github.com/fastverse/kit License: GPL-3 Depends: R (>= 3.1.0) Encoding: UTF-8 -BugReports: https://github.com/2005m/kit/issues +BugReports: https://github.com/fastverse/kit/issues NeedsCompilation: yes ByteCompile: TRUE Repository: CRAN From 1f66900e08a421ab5cf3b044e0bf303b4b6656b6 Mon Sep 17 00:00:00 2001 From: SebKrantz Date: Tue, 2 Dec 2025 21:03:04 -0500 Subject: [PATCH 06/10] Update README. 
--- README.md | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 83 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index df37561..e5cb6c1 100644 --- a/README.md +++ b/README.md @@ -1,10 +1,90 @@ # kit -R Package: Basic functions implemented in C (and for some missing from base R) [![CRAN](https://www.r-pkg.org/badges/version-last-release/kit?color=blue)](https://cran.r-project.org/package=kit) [![CRAN](https://badges.cranchecks.info/flavor/release/kit.svg)](https://cran.r-project.org/web/checks/check_results_kit.html) -[![License: GPL v3](https://img.shields.io/github/license/2005m/kit)](https://www.gnu.org/licenses/gpl-3.0) -[![R-CMD-check](https://github.com/2005m/kit/workflows/R-CMD-check/badge.svg)](https://github.com/2005m/kit/actions) +[![License: GPL v3](https://img.shields.io/github/license/fastverse/kit)](https://www.gnu.org/licenses/gpl-3.0) +[![R-CMD-check](https://github.com/fastverse/kit/workflows/R-CMD-check/badge.svg)](https://github.com/fastverse/kit/actions) [![Coverage Status](https://codecov.io/gh/2005m/kit/graph/badge.svg)](https://codecov.io/github/2005m/kit?branch=master) [![downloads](https://cranlogs.r-pkg.org/badges/kit)](https://www.r-pkg.org/pkg/kit) [![kit status badge](https://fastverse.r-universe.dev/badges/kit)](https://fastverse.r-universe.dev) + +Fast data manipulation functions implemented in C for large datasets. Provides vectorized alternatives to base R functions with significant performance improvements. 
+ +## Installation + +```r +# From CRAN +install.packages("kit") + +# Development version +install.packages("kit", repos = "https://fastverse.r-universe.dev") +``` + +## Features + +### Parallel Statistical Functions + +Vector-valued functions operating in parallel over vectors or data frames: + +- **`psum`, `pprod`, `pmean`**: Parallel sum, product, and mean (similar to `pmin`/`pmax`) +- **`pall`, `pany`**: Parallel all/any operations +- **`pcount`, `pcountNA`**: Count occurrences of values or NAs +- **`pfirst`, `plast`**: First/last non-missing values + +```r +x <- c(1, 3, NA, 5) +y <- c(2, NA, 4, 1) +psum(x, y, na.rm = TRUE) # [1] 3 3 4 6 +pmean(x, y, na.rm = TRUE) # [1] 1.5 3.0 4.0 3.0 +``` + +### Vectorized and Nested Switches + +Fast vectorized conditional logic: + +- **`iif`**: Fast replacement for `ifelse()` with attribute preservation +- **`nif`**: Nested if-else (SQL CASE WHEN equivalent) +- **`vswitch`, `nswitch`**: Vectorized switch statements + +```r +iif(x > 2, x, x - 1) # Preserves attributes unlike base::ifelse +nif(x == 1, "one", x == 2, "two", default = "other") +``` + +### Sorting + +- **`psort`**: Parallel sort for character vectors +- **`topn`**: Efficient partial sort (top N values) without full sorting + +```r +topn(x, n = 6L, decreasing = TRUE) # Much faster than order()[1:6] +``` + +### Factors + +- **`charToFact`**: Fast character-to-factor conversion +- **`setlevels`**: Change factor levels by reference + +### Unique Values and Counts + +- **`funique`, `fduplicated`**: Fast unique/duplicated operations +- **`uniqLen`**: Fast equivalent to `length(unique(x))` +- **`count`, `countNA`, `countOccur`**: Count element occurrences + +```r +funique(iris$Species) # Faster than base::unique +uniqLen(iris$Species) # Faster than length(unique()) +``` + +### Miscellaneous + +- **`fpos`**: Find matrix/vector positions within larger structures +- **`shareData`, `getData`, `clearData`**: Share data between R sessions + +## Documentation + +Full 
documentation available at: https://fastverse.github.io/kit/ + +## License + +GPL-3 From 3c5b2fc41ee7012ae04a16ae57c95bd5241a8634 Mon Sep 17 00:00:00 2001 From: SebKrantz Date: Tue, 2 Dec 2025 21:08:28 -0500 Subject: [PATCH 07/10] Add introductory vignette. --- DESCRIPTION | 2 + _pkgdown.yml | 6 + vignettes/introduction.Rmd | 234 +++++++++++++++++++++++++++++++++++++ 3 files changed, 242 insertions(+) create mode 100644 vignettes/introduction.Rmd diff --git a/DESCRIPTION b/DESCRIPTION index 39e5107..fa9c548 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -11,6 +11,8 @@ Description: Basic functions, implemented in C, for large data manipulation. Fas URL: https://fastverse.github.io/kit/, https://github.com/fastverse/kit License: GPL-3 Depends: R (>= 3.1.0) +Suggests: knitr, rmarkdown +VignetteBuilder: knitr Encoding: UTF-8 BugReports: https://github.com/fastverse/kit/issues NeedsCompilation: yes diff --git a/_pkgdown.yml b/_pkgdown.yml index 78a313f..724f141 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -74,3 +74,9 @@ reference: - contents: - fpos - shareData/getData/clearData + +articles: +- title: "Introduction to kit" + desc: Introduces the package, including a walk-through of all main features. + contents: + - introduction diff --git a/vignettes/introduction.Rmd b/vignettes/introduction.Rmd new file mode 100644 index 0000000..3c77e00 --- /dev/null +++ b/vignettes/introduction.Rmd @@ -0,0 +1,234 @@ +--- +title: "Introduction to kit" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Introduction to kit} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +```{r setup} +library(kit) +``` + +## Overview + +**kit** provides a collection of fast utility functions implemented in C for data manipulation in R. 
These functions serve as high-performance alternatives to common base R operations, particularly beneficial when working with large datasets. + +The package focuses on three main areas: + +1. **Parallel statistical functions** - row-wise operations across vectors +2. **Vectorized conditionals** - fast if-else and switch operations +3. **Unique values and sorting** - efficient duplicate detection and partial sorting + +All functions are implemented in C with OpenMP support for multi-threading where applicable. + +## Parallel Statistical Functions + +When working with multiple vectors or columns of a data frame, you often need to compute row-wise statistics. Base R provides `pmin()` and `pmax()` for parallel minimum and maximum, but lacks equivalents for sum, mean, or product. kit fills this gap with a family of `p*` functions. + +### Row-wise Arithmetic + +```{r} +x <- c(1, 3, NA, 5) +y <- c(2, NA, 4, 1) +z <- c(3, 4, 4, 1) + +# Parallel sum across vectors +psum(x, y, z, na.rm = TRUE) + +# Parallel mean +pmean(x, y, z, na.rm = TRUE) + +# Parallel product +pprod(x, y, z, na.rm = TRUE) +``` + +These functions also accept a data frame or list directly: + +```{r} +df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9)) +psum(df) +pmean(df) +``` + +### First and Last Non-Missing Values + +`pfirst()` and `plast()` return the first or last non-missing value across vectors - useful for coalescing multiple data sources: + +```{r} +primary <- c(NA, 2, NA, 4) +secondary <- c(1, NA, 3, NA) +fallback <- c(0, 0, 0, 0) + +# Take first available value +pfirst(primary, secondary, fallback) + +# Take last available value +plast(primary, secondary, fallback) +``` + +### Counting and Logical Operations + +Count occurrences of specific values or NAs across vectors: + +```{r} +a <- c(TRUE, FALSE, NA, TRUE) +b <- c(TRUE, NA, TRUE, FALSE) +c <- c(NA, TRUE, FALSE, TRUE) + +# Count NAs per row +pcountNA(a, b, c) + +# Count TRUE values +pcount(a, b, c, value = TRUE) + +# Any TRUE per 
row? +pany(a, b, c, na.rm = TRUE) + +# All TRUE per row? +pall(a, b, c, na.rm = TRUE) +``` + +## Vectorized Conditionals + +### Fast If-Else with Attribute Preservation + +Base R's `ifelse()` has a well-known limitation: it strips attributes from the result. This causes problems with dates, factors, and other classed objects. `iif()` preserves attributes from the `yes` argument: + +```{r} +dates <- as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")) + +# Base ifelse strips the Date class +class(ifelse(dates > "2024-01-01", dates, dates - 1)) + +# iif preserves it +class(iif(dates > "2024-01-01", dates, dates - 1)) +iif(dates > "2024-01-01", dates, dates - 1) +``` + +`iif()` is also faster and provides an optional `na` argument for explicit NA handling: + +```{r} +x <- c(-2, -1, NA, 1, 2) +iif(x > 0, "positive", "non-positive", na = "missing") +``` + +### Nested Conditionals + +For multiple conditions, `nif()` provides a clean syntax similar to SQL's `CASE WHEN`: +```{r} +score <- c(95, 82, 67, 45, 78) + +nif( + score >= 90, "A", + score >= 80, "B", + score >= 70, "C", + score >= 60, "D", + default = "F" +) +``` + +### Vectorized Switch + +When mapping values to outputs, `vswitch()` is more efficient than nested `iif()` calls: + +```{r} +status_code <- c(1L, 2L, 3L, 1L, 4L) + +vswitch( + x = status_code, + values = c(1L, 2L, 3L), + outputs = c("pending", "approved", "rejected"), + default = "unknown" +) +``` + +For inline syntax, `nswitch()` pairs values and outputs directly: + +```{r} +nswitch(status_code, + 1L, "pending", + 2L, "approved", + 3L, "rejected", + default = "unknown" +) +``` + +## Fast Unique and Duplicates + +### Finding Unique Values + +`funique()` and `fduplicated()` are faster alternatives to base R's `unique()` and `duplicated()`, especially for data frames: + +```{r} +# Unique values +funique(c("a", "b", "a", "c", "b")) + +# Which are duplicates? 
+fduplicated(c("a", "b", "a", "c", "b")) + +# Count unique values directly +uniqLen(c("a", "b", "a", "c", "b")) +``` + +For data frames, these functions operate on rows: + +```{r} +df <- data.frame( + x = c(1, 1, 2, 2), + y = c("a", "a", "b", "b") +) +funique(df) +``` + +### Counting Occurrences + +`countOccur()` returns a frequency table as a data frame: + +```{r} +countOccur(c("apple", "banana", "apple", "cherry", "banana", "apple")) +``` + +## Partial Sorting with topn + +When you only need the top N values from a vector, sorting the entire vector is wasteful. `topn()` uses a partial sorting algorithm that is much faster for small N: + +```{r} +set.seed(42) +x <- rnorm(1000) + +# Get indices of top 5 values +topn(x, n = 5) + +# Get the actual values +topn(x, n = 5, index = FALSE) + +# Bottom 5 (smallest) +topn(x, n = 5, decreasing = FALSE, index = FALSE) +``` + +## Summary + +kit provides fast, focused utilities for common data manipulation tasks: + +| Task | kit function | Base R equivalent | +|------|--------------|-------------------| +| Row-wise sum | `psum()` | `rowSums(cbind(...))` | +| Row-wise mean | `pmean()` | `rowMeans(cbind(...))` | +| First non-NA | `pfirst()` | `apply(..., 1, function(x) x[!is.na(x)][1])` | +| Fast if-else | `iif()` | `ifelse()` | +| Nested conditions | `nif()` | nested `ifelse()` | +| Value mapping | `vswitch()` | `match()` + indexing | +| Unique values | `funique()` | `unique()` | +| Top N indices | `topn()` | `order()[1:n]` | + +For benchmarks and detailed documentation, see the function help pages and the [package website](https://fastverse.github.io/kit/). + From cde0ad3e6af0e78a6182a3d0b345604afa977b32 Mon Sep 17 00:00:00 2001 From: SebKrantz Date: Tue, 2 Dec 2025 21:14:56 -0500 Subject: [PATCH 08/10] Minor improvements by Gemini 3 Pro. 
--- vignettes/introduction.Rmd | 148 ++++++++++++++++++++----------------- 1 file changed, 80 insertions(+), 68 deletions(-) diff --git a/vignettes/introduction.Rmd b/vignettes/introduction.Rmd index 3c77e00..67101fd 100644 --- a/vignettes/introduction.Rmd +++ b/vignettes/introduction.Rmd @@ -20,48 +20,48 @@ library(kit) ## Overview -**kit** provides a collection of fast utility functions implemented in C for data manipulation in R. These functions serve as high-performance alternatives to common base R operations, particularly beneficial when working with large datasets. +**kit** provides a collection of fast utility functions implemented in C for data manipulation in R. It serves as a lightweight, high-performance toolkit for tasks that are either slow or cumbersome in base R, such as row-wise operations, vectorized conditionals, and duplicate detection. -The package focuses on three main areas: +Key features include: -1. **Parallel statistical functions** - row-wise operations across vectors -2. **Vectorized conditionals** - fast if-else and switch operations -3. **Unique values and sorting** - efficient duplicate detection and partial sorting +* **Parallel statistical functions**: Row-wise operations (`psum`, `pmean`, `pfirst`) using OpenMP. +* **Vectorized conditionals**: Fast `if-else` logic (`iif`, `nif`, `vswitch`) that preserves attributes. +* **Efficient set operations**: Faster `unique`, `duplicated`, and `count` for vectors and data frames. +* **Partial sorting**: Retrieve top N elements without sorting the entire vector (`topn`). +* **Factor utilities**: Fast character-to-factor conversion (`charToFact`) and level manipulation (`setlevels`). -All functions are implemented in C with OpenMP support for multi-threading where applicable. +Most functions are implemented in C and support multi-threading where applicable, making them significantly faster than their base R equivalents on large datasets. 
## Parallel Statistical Functions -When working with multiple vectors or columns of a data frame, you often need to compute row-wise statistics. Base R provides `pmin()` and `pmax()` for parallel minimum and maximum, but lacks equivalents for sum, mean, or product. kit fills this gap with a family of `p*` functions. +Computing row-wise statistics across multiple vectors or data frame columns is a common task. While base R has `pmin()` and `pmax()`, it lacks efficient equivalents for sum, mean, or product. **kit** fills this gap. ### Row-wise Arithmetic +`psum()`, `pmean()`, and `pprod()` compute parallel sum, mean, and product respectively. They accept multiple vectors or a single list/data frame. + ```{r} x <- c(1, 3, NA, 5) y <- c(2, NA, 4, 1) z <- c(3, 4, 4, 1) -# Parallel sum across vectors +# Parallel sum psum(x, y, z, na.rm = TRUE) # Parallel mean pmean(x, y, z, na.rm = TRUE) - -# Parallel product -pprod(x, y, z, na.rm = TRUE) ``` -These functions also accept a data frame or list directly: +They are particularly useful for data frames: ```{r} df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9)) psum(df) -pmean(df) ``` -### First and Last Non-Missing Values +### Coalescing Values -`pfirst()` and `plast()` return the first or last non-missing value across vectors - useful for coalescing multiple data sources: +`pfirst()` and `plast()` return the first or last non-missing value across a set of vectors. This is equivalent to the SQL `COALESCE` function (for `pfirst`). ```{r} primary <- c(NA, 2, NA, 4) @@ -70,60 +70,54 @@ fallback <- c(0, 0, 0, 0) # Take first available value pfirst(primary, secondary, fallback) - -# Take last available value -plast(primary, secondary, fallback) ``` -### Counting and Logical Operations +### Logical and Count Operations -Count occurrences of specific values or NAs across vectors: +You can check for conditions or count values row-wise with `pall`, `pany`, and `pcount`. 
```{r} a <- c(TRUE, FALSE, NA, TRUE) b <- c(TRUE, NA, TRUE, FALSE) c <- c(NA, TRUE, FALSE, TRUE) +# Any TRUE per row? +pany(a, b, c, na.rm = TRUE) + # Count NAs per row pcountNA(a, b, c) -# Count TRUE values +# Count specific value (e.g., TRUE) per row pcount(a, b, c, value = TRUE) - -# Any TRUE per row? -pany(a, b, c, na.rm = TRUE) - -# All TRUE per row? -pall(a, b, c, na.rm = TRUE) ``` ## Vectorized Conditionals -### Fast If-Else with Attribute Preservation +### Fast If-Else (`iif`) -Base R's `ifelse()` has a well-known limitation: it strips attributes from the result. This causes problems with dates, factors, and other classed objects. `iif()` preserves attributes from the `yes` argument: +Base R's `ifelse()` is known to be slow and often strips attributes (like `Date` class or factor levels). `iif()` is a faster, more robust alternative that preserves attributes from the `yes` argument. ```{r} dates <- as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")) -# Base ifelse strips the Date class +# Base ifelse strips class class(ifelse(dates > "2024-01-01", dates, dates - 1)) -# iif preserves it +# iif preserves class class(iif(dates > "2024-01-01", dates, dates - 1)) -iif(dates > "2024-01-01", dates, dates - 1) ``` -`iif()` is also faster and provides an optional `na` argument for explicit NA handling: +It also supports explicit `NA` handling: ```{r} x <- c(-2, -1, NA, 1, 2) iif(x > 0, "positive", "non-positive", na = "missing") ``` -### Nested Conditionals +### Nested Conditionals (`nif`) + +For multiple conditions, `nif()` offers a cleaner, more efficient syntax than nested `ifelse()` calls, similar to SQL's `CASE WHEN`. 
-For multiple conditions, `nif()` provides a clean syntax similar to SQL's `CASE WHEN`:
 
 ```{r}
 score <- c(95, 82, 67, 45, 78)
@@ -136,9 +130,9 @@ nif(
 )
 ```
 
-### Vectorized Switch
+### Vectorized Switch (`vswitch`, `nswitch`)
 
-When mapping values to outputs, `vswitch()` is more efficient than nested `iif()` calls:
+`vswitch()` maps input values to outputs efficiently. `nswitch()` is a variation that uses a pairwise syntax.
 
 ```{r}
 status_code <- c(1L, 2L, 3L, 1L, 4L)
@@ -164,42 +158,44 @@ nswitch(status_code,
 
 ## Fast Unique and Duplicates
 
-### Finding Unique Values
+**kit** provides optimized versions of `unique()` and `duplicated()` that are significantly faster for vectors and data frames.
 
-`funique()` and `fduplicated()` are faster alternatives to base R's `unique()` and `duplicated()`, especially for data frames:
+### Unique Values and Duplicates
 
 ```{r}
-# Unique values
-funique(c("a", "b", "a", "c", "b"))
+vec <- c("a", "b", "a", "c", "b")
 
-# Which are duplicates?
-fduplicated(c("a", "b", "a", "c", "b"))
+# Get unique values
+funique(vec)
 
-# Count unique values directly
-uniqLen(c("a", "b", "a", "c", "b"))
+# Check for duplicates
+fduplicated(vec)
 ```
 
-For data frames, these functions operate on rows:
+`uniqLen()` efficiently counts the number of unique elements without allocating the unique vector itself:
 
 ```{r}
 df <- data.frame(
   x = c(1, 1, 2, 2),
   y = c("a", "a", "b", "b")
 )
+uniqLen(df)
 funique(df)
 ```
 
 ### Counting Occurrences
 
-`countOccur()` returns a frequency table as a data frame:
+`countOccur()` produces a frequency table (similar to `table()` or `dplyr::count()`) but returns a standard data frame.
 
 ```{r}
-countOccur(c("apple", "banana", "apple", "cherry", "banana", "apple"))
+countOccur(c("apple", "banana", "apple", "cherry"))
 ```
 
-## Partial Sorting with topn
+## Sorting and Utilities
 
-When you only need the top N values from a vector, sorting the entire vector is wasteful. `topn()` uses a partial sorting algorithm that is much faster for small N:
+### Partial Sorting (`topn`)
+
+Sorting a large vector just to get the top few elements is inefficient. `topn()` uses a partial sorting algorithm to retrieve the top (or bottom) $N$ indices or values.
 
 ```{r}
 set.seed(42)
@@ -208,27 +204,43 @@ x <- rnorm(1000)
 # Get indices of top 5 values
 topn(x, n = 5)
 
-# Get the actual values
-topn(x, n = 5, index = FALSE)
-
-# Bottom 5 (smallest)
+# Get the actual values (decreasing = FALSE for bottom values)
 topn(x, n = 5, decreasing = FALSE, index = FALSE)
 ```
 
-## Summary
+### Factor Manipulation
 
-kit provides fast, focused utilities for common data manipulation tasks:
+`charToFact()` is a fast alternative to `as.factor()` for character vectors, with control over `NA` levels.
 
-| Task | kit function | Base R equivalent |
-|------|--------------|-------------------|
-| Row-wise sum | `psum()` | `rowSums(cbind(...))` |
-| Row-wise mean | `pmean()` | `rowMeans(cbind(...))` |
-| First non-NA | `pfirst()` | `apply(..., 1, function(x) x[!is.na(x)][1])` |
-| Fast if-else | `iif()` | `ifelse()` |
-| Nested conditions | `nif()` | nested `ifelse()` |
-| Value mapping | `vswitch()` | `match()` + indexing |
-| Unique values | `funique()` | `unique()` |
-| Top N indices | `topn()` | `order()[1:n]` |
-
-For benchmarks and detailed documentation, see the function help pages and the [package website](https://fastverse.github.io/kit/).
+```{r}
+charToFact(c("a", "b", NA, "a"))
+```
+
+`setlevels()` allows you to change factor levels by reference (in-place), avoiding object copying.
 
+### Finding Positions (`fpos`)
+
+`fpos()` finds the positions of a pattern (needle) within a vector (haystack). It can be used to find occurrences of one vector inside another.
+
+```{r}
+haystack <- c(1, 2, 3, 4, 1, 2, 5)
+needle <- c(1, 2)
+
+fpos(needle, haystack)
+```
+
+## Summary
+
+| Task | kit function | Base R equivalent |
+|:---|:---|:---|
+| **Row-wise sum** | `psum()` | `rowSums(cbind(...))` |
+| **Row-wise mean** | `pmean()` | `rowMeans(cbind(...))` |
+| **First non-NA** | `pfirst()` | `apply(..., 1, function(x) x[!is.na(x)][1])` |
+| **Fast if-else** | `iif()` | `ifelse()` |
+| **Nested if-else** | `nif()` | Nested `ifelse()` |
+| **Switch** | `vswitch()` | `match()` + indexing |
+| **Unique values** | `funique()` | `unique()` |
+| **Top N indices** | `topn()` | `order()[1:n]` |
+| **Char to Factor** | `charToFact()` | `as.factor()` |
+
+For comprehensive details and performance benchmarks, please refer to the individual function documentation.

From 4269ec56edb3c750017ffd24cf5a838d23ade629 Mon Sep 17 00:00:00 2001
From: SebKrantz
Date: Tue, 2 Dec 2025 21:20:14 -0500
Subject: [PATCH 09/10] Minor improvements.

---
 vignettes/introduction.Rmd | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/vignettes/introduction.Rmd b/vignettes/introduction.Rmd
index 67101fd..32778a6 100644
--- a/vignettes/introduction.Rmd
+++ b/vignettes/introduction.Rmd
@@ -132,7 +132,7 @@ nif(
 
 ### Vectorized Switch (`vswitch`, `nswitch`)
 
-`vswitch()` maps input values to outputs efficiently. `nswitch()` is a variation that uses a pairwise syntax.
+`vswitch()` maps input values to outputs efficiently.
 
 ```{r}
 status_code <- c(1L, 2L, 3L, 1L, 4L)
@@ -145,7 +145,7 @@ vswitch(
 )
 ```
 
-For inline syntax, `nswitch()` pairs values and outputs directly:
+For pairwise syntax, `nswitch()` pairs values and outputs directly.
 
 ```{r}
 nswitch(status_code,
@@ -156,6 +156,22 @@ nswitch(status_code,
 )
 ```
 
+It can also replace with values from other vectors (columns), mixing scalars and vectors:
+
+```{r}
+df <- data.frame(
+  code = c(1, 2, 1, 3, 2),
+  val_a = c(10, 20, 30, 40, 50),
+  val_b = c(100, 200, 300, 400, 500)
+)
+with(df, nswitch(code,
+  1, val_a,
+  2, val_b,
+  3, 0,
+  default = NA_real_
+))
+```
+
 ## Fast Unique and Duplicates
 
 **kit** provides optimized versions of `unique()` and `duplicated()` that are significantly faster for vectors and data frames.

From c7e6efba689e5321981e70916a6657c91f8b6028 Mon Sep 17 00:00:00 2001
From: SebKrantz
Date: Tue, 2 Dec 2025 21:32:42 -0500
Subject: [PATCH 10/10] Fixing indentation.

---
 _pkgdown.yml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/_pkgdown.yml b/_pkgdown.yml
index 724f141..6ac23f1 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -77,6 +77,6 @@ reference:
 
 articles:
 - title: "Introduction to kit"
- desc: Introduces the package, including a walk-through of all main features.
- contents:
- - introduction
+ desc: Introduces the package, including a walk-through of all main features.
+ contents:
+ - introduction