-
Notifications
You must be signed in to change notification settings - Fork 195
Preserve names where sensible #585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Principles: - Preserve names where the output has a 1-to-1 correspondence with the input. - Preserve names for functions where base behavior suggests keeping names (e.g., grepl/grep(value=TRUE), sort/unique). - Source of names: use names from the primary `string` argument only; ignore names on `pattern`/`replacement`/others and never merge. - For 1-row-per-input matrices/lists (e.g., str_locate/str_match/str_split_fixed), set row/list names from input names. - Don't preserve names where strings are combined, or the return values are indices (e.g., str_c, str_flatten, str_glue, str_which, str_order, str_equal). Currently passing: - str_subset - str_trunc Currently failing: - str_count - str_detect - str_starts - str_ends - str_like - str_escape - str_replace - str_remove - str_conv - str_trim - str_pad - str_sub - str_to_lower - str_to_upper - str_to_title - str_extract - str_locate - str_match - str_extract_all - str_locate_all - str_match_all - str_split - str_split_fixed - str_split_i - str_sub_all - str_length - str_width - str_dup - word - str_unique - str_replace_na - str_sort - str_wrap - str_replace_all
…readable. I have kept the `out <- ...` temporary assignment where there is additional logic involved (if/switch/vapply...), or if we need a line break anyway.
hadley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for persisting with this!
The "missings never match" example from ?str_subset broke in commit d681e45, which added name preservation. The current commit keeps names while not matching NAs in string. Also added a test for missings never to match. > # stringr 1.5.2 > stringr::str_subset(c("a", NA, "b"), ".") [1] "a" "b" > # pak::pkg_install("tidyverse/stringr@d681e45") > stringr::str_subset(c("a", NA, "b"), ".") [1] "a" NA "b" > stringr::str_subset(c(First = "a", Second = NA, Third = "b"), ".") First <NA> Third "a" NA "b" > # Current version > stringr::str_subset(c(First = "a", Second = NA, Third = "b"), ".") First Third "a" "b"
This enables many functions to finish with if (keep_names(string, pattern)) copy_names(string, out) else out
…ubset(). Remove tests for special-casing NA. Remove a few redundant if's. Comment on copy_names() and keep_names() in utils.R.
|
@jonovik thanks for all your work on this! |
|
@jonovik unsurprisingly this causes a revdep failures since outputs now preserve names. Would you be interested in helping me prepare PRs to fix those failing packages? |
Apologies for the slow response. I hadn't heard about the I'm happy to take a look, just point me to a package or two that needs fixing. |
#575 points out that some stringr functions preserve names, but most don't. This pull request implements preservation of names where there should be no ambiguity.
#575 was closed with a recommendation to take the issue to stringi. I tried, and my pull request gagolews/stringi#521 was rejected as a matter of policy. Hence, I propose that we make this change in stringr. There are situations where preserving names is useful, and str_subset() and str_trunc() currently do that. In the stringi issue gagolews/stringi#59, @hadley argued that names is the only attribute that stringi should preserve.
This pull request makes focused modifications to the functions involved, and adds
tests/testthat/test-preserve-names.R. All existing and new tests pass, and devtools::check() reports no errors or warnings, only a note which was already in currentmain.Principles:
stringargument only; ignore names onpattern/replacement/others and never merge.(e.g., str_c, str_flatten, str_glue, str_which, str_order, str_equal).
These functions already preserve names:
These functions preserve names if this pull request is accepted: