
@jonovik (Contributor) commented Sep 4, 2025

#575 points out that some stringr functions preserve names, but most don't. This pull request preserves names wherever doing so is unambiguous.

#575 was closed with a recommendation to take the issue to stringi. I tried, but my pull request gagolews/stringi#521 was rejected as a matter of policy. Hence, I propose that we make this change in stringr. Preserving names is useful in some situations, and str_subset() and str_trunc() already do so. In the stringi issue gagolews/stringi#59, @hadley argued that names should be the only attribute stringi preserves.

This pull request makes focused modifications to the functions involved and adds tests/testthat/test-preserve-names.R. All existing and new tests pass, and devtools::check() reports no errors or warnings, only a NOTE that is already present on current main.

Principles:

  • Preserve names where the output has a 1-to-1 correspondence with the input.
  • Preserve names for functions where base behavior suggests keeping names (e.g., grepl/grep(value=TRUE), sort/unique).
  • Source of names: use names from the primary `string` argument only; ignore names on `pattern`/`replacement`/others and never merge names from several arguments.
  • For 1-row-per-input matrices/lists (e.g., str_locate/str_match/str_split_fixed), set row/list names from input names.
  • Don't preserve names where strings are combined, or the return values are indices
    (e.g., str_c, str_flatten, str_glue, str_which, str_order, str_equal).
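The name-source rules above can be sketched in base R. `copy_names()` below is a hypothetical helper written for illustration, not stringr's actual code: it takes names only from the primary `string` input and puts them on a vector, on a list, or on the rows of a 1-row-per-input matrix. Inputs are passed through `unname()` first to mimic stringi, which does not preserve names.

```r
# Hypothetical helper for illustration (not stringr's actual code): copy names
# from the primary `string` input onto a result with one entry per input.
copy_names <- function(from, to) {
  nm <- names(from)
  if (is.null(nm)) {
    return(to) # unnamed input: nothing to copy
  }
  if (is.matrix(to)) {
    rownames(to) <- nm # one row per input string
  } else {
    names(to) <- nm # atomic vectors and lists alike
  }
  to
}

string <- c(first = "a,b", second = "c")
pattern <- c(sep = ",") # names on `pattern` are never used

# Vector output, one value per input (cf. str_length)
lens <- copy_names(string, nchar(unname(string)))

# List output, one element per input (cf. str_split)
pieces <- copy_names(string, strsplit(unname(string), pattern, fixed = TRUE))

# Matrix output, one row per input (cf. str_split_fixed with n = 2);
# this sketch pads short splits with NA
parts <- t(vapply(
  strsplit(unname(string), pattern, fixed = TRUE),
  function(p) p[1:2],
  character(2)
))
parts <- copy_names(string, parts)

lens
pieces
parts
```

Names on `pattern` never reach the result, and an unnamed `string` leaves the output untouched, so there is no merging logic to get wrong.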

These functions already preserve names:

  • str_subset
  • str_trunc

These functions will preserve names if this pull request is accepted:

  • str_count
  • str_detect
  • str_starts
  • str_ends
  • str_like
  • str_escape
  • str_replace
  • str_remove
  • str_conv
  • str_trim
  • str_pad
  • str_sub
  • str_to_lower
  • str_to_upper
  • str_to_title
  • str_extract
  • str_locate
  • str_match
  • str_extract_all
  • str_locate_all
  • str_match_all
  • str_split
  • str_split_fixed
  • str_split_i
  • str_sub_all
  • str_length
  • str_width
  • str_dup
  • word
  • str_unique
  • str_replace_na
  • str_sort
  • str_wrap
  • str_replace_all

@hadley (Member) left a comment


Thanks for persisting with this!

The "missings never match" example from `?str_subset` broke in commit d681e45, which added name preservation.
The current commit keeps names while still never matching `NA`s in `string`.
I also added a test checking that missing values never match.
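The breakage described above can be reproduced in base R. This is an assumption about the mechanism, not the actual code from commit d681e45: subsetting a named vector with a logical index that contains `NA` yields an `NA` element with an `NA` name, whereas `which()` drops the `NA` positions entirely.

```r
x <- c(First = "a", Second = NA, Third = "b")

# Assumption: the detector returns NA for NA input, as stringi does.
# Base grepl() does not return NA here, so the NA is restored by hand.
detected <- grepl(".", x)
detected[is.na(x)] <- NA # c(TRUE, NA, TRUE)

leaky <- x[detected]       # NA index: keeps an NA element with an NA name
safe <- x[which(detected)] # which() skips NA, so missings never match

leaky
safe
```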

```r
> # stringr 1.5.2
> stringr::str_subset(c("a", NA, "b"), ".")
[1] "a" "b"

> # pak::pkg_install("tidyverse/stringr@d681e45")
> stringr::str_subset(c("a", NA, "b"), ".")
[1] "a" NA  "b"
> stringr::str_subset(c(First = "a", Second = NA, Third = "b"), ".")
First  <NA> Third
  "a"    NA   "b"

> # Current version
> stringr::str_subset(c(First = "a", Second = NA, Third = "b"), ".")
First Third
  "a"   "b"
```
This enables many functions to finish with:

```r
if (keep_names(string, pattern)) copy_names(string, out) else out
```
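A minimal sketch of how that one-liner might fit together, in base R. Both helpers are hypothetical stand-ins written for illustration; in particular, we assume `keep_names()` returns `TRUE` only when the result is 1-to-1 with `string` (that is, `string` was not recycled against a longer `pattern`). That matches the principle of taking names from `string` only, but it is our guess, not stringr's definition.

```r
# Hypothetical guard (assumption): names come from `string` only, so keep them
# only when `string` is not recycled against a longer `pattern`.
keep_names <- function(string, pattern) {
  length(string) >= length(pattern)
}

# Hypothetical vector-only helper; matrix outputs would set rownames() instead.
copy_names <- function(from, to) {
  names(to) <- names(from)
  to
}

# Toy detector built on base startsWith(); names are stripped first to mimic
# stringi, which does not preserve them.
starts_named <- function(string, pattern) {
  out <- startsWith(unname(string), pattern)
  if (keep_names(string, pattern)) copy_names(string, out) else out
}

starts_named(c(x = "apple", y = "banana"), "a") # 1-to-1: names kept
starts_named(c(x = "apple"), c("a", "b"))       # recycled: names dropped
```

Dropping names in the recycled case avoids producing a result whose names would otherwise be padded with `NA`.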
@hadley hadley merged commit c91419f into tidyverse:main Sep 22, 2025
13 checks passed
@hadley (Member) commented Sep 22, 2025

@jonovik thanks for all your work on this!

@hadley (Member) commented Sep 23, 2025

@jonovik unsurprisingly this causes revdep failures, since outputs now preserve names. Would you be interested in helping me prepare PRs to fix those failing packages?

@jonovik (Contributor, Author) commented Oct 5, 2025

> @jonovik unsurprisingly this causes revdep failures, since outputs now preserve names. Would you be interested in helping me prepare PRs to fix those failing packages?

Apologies for the slow response.

I hadn't heard about the revdepcheck package before, but now I've read up a bit. Running revdep_check() locally on my laptop is estimated to take 1 day for 2712 packages 8-)

I'm happy to take a look, just point me to a package or two that needs fixing.

@hadley (Member) commented Oct 5, 2025

@jonovik no problem! I listed all the problems in #590, and then I also fixed them all 😄
