JSON Reader Faster Coercion of Primitives to String #7273
Comments
You can run the json reader benchmark with cargo bench --bench json_reader and compare before/after performance. Since this is well specified, I think this would be a good first issue for anyone who wants to learn about low-level Rust optimization.
BTW, it may turn out that the suggested change doesn't improve performance, but we can measure with the benchmark.
I believe this codepath is only hit when coercing integers to strings, e.g.
Reading into a StringArray column, anything encoded as a string will be unaffected. E.g.
So any benchmark would need to be doing this in order to see an improvement. It is also worth noting that parsing a primitive only to cast it back to a string is inherently wasteful, but avoiding this would be extremely difficult as the tape decoder is unaware of the schema.
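To make the distinction concrete, here is a toy model in plain Rust. It is not the arrow-json tape decoder: the `Elem` enum and the `to_string_column` function are hypothetical names invented for this sketch. The point it illustrates is that only values arriving as JSON numbers go through integer-to-string formatting, while values that are already JSON strings are simply copied.

```rust
/// Hypothetical stand-in for a decoded JSON value; not an arrow-json type.
enum Elem<'a> {
    /// The value was already a JSON string, e.g. "1".
    Str(&'a str),
    /// The value was a JSON number, e.g. 1.
    Int(i64),
}

/// Builds a column of strings and counts how many values needed
/// integer-to-string formatting (the path discussed in this issue).
fn to_string_column(values: &[Elem<'_>]) -> (Vec<String>, usize) {
    let mut out = Vec::with_capacity(values.len());
    let mut coerced = 0;
    for v in values {
        match v {
            // Already a string: copied through, no number formatting.
            Elem::Str(s) => out.push((*s).to_string()),
            // A number: must be formatted into a string.
            Elem::Int(n) => {
                coerced += 1;
                out.push(n.to_string());
            }
        }
    }
    (out, coerced)
}

fn main() {
    let values = [Elem::Str("1"), Elem::Int(2), Elem::Str("three")];
    let (column, coerced) = to_string_column(&values);
    println!("{column:?}, coerced = {coerced}");
}
```

Under this model, a benchmark whose input contains only `Elem::Str`-style values would never exercise the formatting branch, so it could not show a difference either way.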
Hi, I realized I could send a PR for this, so I implemented it and sent it: #7274
Closing the PR.
I found the issue.
And, the following is executed during reading.
I do not see a reason to optimize the previous code block. But with some quick tests, I saw that we may be able to take advantage of the new approach during serialization.
I am not sending a PR because the performance did not improve. Here is what I did:
The results are not good.
That also made me wonder whether other similar changes are really statistically significant.
Thanks for trying @ndemir
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While reviewing #7263 from @zhuqi-lucas I noticed that the json reader performs a new allocation for each string field it parses
arrow-rs/arrow-json/src/reader/string_array.rs
Lines 110 to 120 in f4fde76
Describe the solution you'd like
I would like to make the json reader faster by not allocating in the inner loop
Describe alternatives you've considered
I think instead of doing
A typically faster pattern is
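As a rough illustration of the two patterns being compared, here is a std-only sketch. It is not the code from string_array.rs: the function names and the `Vec<u8>` used as a stand-in for a string builder's value buffer are assumptions made for this example. The first function allocates a fresh `String` per value via `to_string()`; the second reuses one scratch `String` and formats into it with `write!`.

```rust
use std::fmt::Write;

/// Allocates a fresh `String` for every value via `to_string()`.
fn format_each_alloc(values: &[i64]) -> Vec<u8> {
    // Hypothetical stand-in for a string builder's value buffer.
    let mut out = Vec::new();
    for v in values {
        let s = v.to_string(); // new heap allocation per value
        out.extend_from_slice(s.as_bytes());
    }
    out
}

/// Reuses one scratch `String`, clearing it between values.
fn format_reuse_buffer(values: &[i64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut scratch = String::new();
    for v in values {
        scratch.clear(); // drops the contents but keeps the capacity
        write!(scratch, "{v}").expect("writing to a String cannot fail");
        out.extend_from_slice(scratch.as_bytes());
    }
    out
}

fn main() {
    let values = [1, -22, 333];
    // Both patterns must produce identical bytes.
    assert_eq!(format_each_alloc(&values), format_reuse_buffer(&values));
}
```

Whether the second form is actually faster depends on the allocator and the workload, so it should be confirmed with a benchmark rather than assumed.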
Additional context