Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add map and filter for inline strings #5

Open
bkamins opened this issue Jul 29, 2021 · 5 comments
Open

add map and filter for inline strings #5

bkamins opened this issue Jul 29, 2021 · 5 comments

Comments

@bkamins
Copy link

bkamins commented Jul 29, 2021

The problem is that current generic implementations for AbstractString fall back to String, so e.g. we have:

julia> x = InlineString("abc")
"abc"

julia> typeof(x)
InlineString3

julia> uppercase(x)
"ABC"

julia> typeof(uppercase(x))
String

julia> filter(==('a'), x)
"a"

julia> typeof(filter(==('a'), x))
String
@quinnj
Copy link
Member

quinnj commented Jul 30, 2021

Hmmm, it seems like this is much more than map/filter. There's also lpad/rpad, replace, repeat, reverse, and probably others. I guess we need to probably audit various string operations and see what kind of effort we want to put in to keep inlinestrings as inlinestrings and not rely on Base methods. I certainly think it's worthwhile, just feels a bit intimidating until we know the complete list of methods we'd need to overload.

@bkamins
Copy link
Author

bkamins commented Jul 31, 2021

I hit map/filter because of uppercase/lowercase actually :).

Here is a list I could quickly make:

  • startswith
  • endswith
  • chop
  • chomp
  • lstrip
  • rstrip
  • strip
  • lpad
  • rpad
  • split
  • rsplit
  • hex2bytes
  • hex2bytes!
  • Vector{UInt8}
  • Array{UInt8}
  • maybe Symbol (to be checked if it can be improved)
  • to be decided if * and ^ (as they can create longer strings)
  • cmp
  • hash (maybe we already have it fast - I have not checked)
  • reverse

Some of the items in the list are for performance other for getting a correct type of the result.

repeat and replace should not be on the list I think (as it can create a longer string that does not fit into an inline string)

@quinnj quinnj transferred this issue from JuliaData/WeakRefStrings.jl Oct 1, 2021
@rsturley
Copy link

chop needs some particular work since the base method doesn't work:

julia> using InlineStrings

julia> p=String7("Bü\n")
"Bü\n"

julia> chop(p)
"B\xc3"

@nickrobinson251
Copy link
Collaborator

Reviewing the big long list of string methods a little further:

Some functions that can grow the string (so maybe should not be on the list)

  • repeat
  • replace
  • lpad
  • rpad
  • joinpath

already implemented, or for which we have PRs...
(...for performance reasons)

  • hash
  • cmp
  • reverse
  • startswith
  • endswith

(...for returning InlineStrings rather than SubStrings or Strings)

  • chop
  • chomp
  • chopprefix
  • chopsuffix

still to do...
(...for performance reasons, e.g. to avoid calling String(::InlineString))

  • Vector{UInt8} (can we do better than Base.Vector{UInt8}(s::InlineString) = Vector{UInt8}(codeunits(s))?)
  • Array{UInt8} (as above)
  • hex2bytes (I think this is basically just Base.hex2bytes!(dest::AbstractArray{UInt8}, s::InlineString) = hex2bytes!(dest, codeunits(s))
  • maybe the various SHA.sha* functions? (i think just SHA.sha1(s::InlineString) = sha1(codeunits(s)) etc)
  • (potentially parse/tryparse but i'm guessing that's more complexity than it's worth?)

(...for returning InlineStrings rather than SubStrings or Strings)

  • split
  • maybe the various Filesytem.split* functions?
  • getindex(str, range)
  • getindex(str, vec)

And probably i'm missing plenty other candidates

@bkamins
Copy link
Author

bkamins commented Oct 3, 2022

Functions that grow the string can use AbstractString default feedback indeed, but maybe it is possible to provide more efficient implementations (still returning String most likely).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants