-
Notifications
You must be signed in to change notification settings - Fork 978
Fix RowConverter when FixedSizeList is not the last #7789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix RowConverter when FixedSizeList is not the last #7789
Conversation
Fix `RowConverter` row decoding when there is a `FixedSizeList` element and it's not the last.
🤖 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @findepi
I plan to approve this PR pending a performance check
🤖: Benchmark completed Details
|
When columns are unequal sizes, then encoding them can panic. This can be observed by changing `test_fixed_size_list_with_variable_width_content`, eg adding one more element to the second list. What's unclear, the panic can be avoided by calling `tracker.materialized()` in `compute_lengths_fixed_size_list` (similar to how list encoding does it). However, we still won't encode the whole data passed in, so it's better to reject this upfront.
for colum in columns.iter().skip(1) { | ||
if colum.len() != columns[0].len() { | ||
return Err(ArrowError::InvalidArgumentError(format!( | ||
"RowConverter columns must all have the same length, expected {} got {}", | ||
columns[0].len(), | ||
colum.len() | ||
))); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another check added. This guards against passing ill-formed data from the user.
Without this, it's possible to get a panic inside the RowEncoder.
🤖 |
I am not sure what happened to the benchmark script -- I will rerun shortly |
🤖 |
🤖: Benchmark completed Details
|
I will rerun the benchmarks on this PR again -- some of the results look like noise |
🤖 |
🤖: Benchmark completed Details
|
It looks like |
Possibly -- it could also be noise in the benchmark the next step woudl be to see if we can reproduce the slowdown locally (run |
The starting point of this PR, aka the merge base of this PR, is 01c5efc.
and then (still on the same commit)
Some benchmarks returned 19% improvement, some 19% or even 46% degradation. Definitely running on my laptop can hardly be called a well isolated environment, but i did my best in this regard. third run transcript
|
arrow-row/src/lib.rs
Outdated
unsafe { self.convert_raw(&mut rows, validate_utf8) } | ||
let result = unsafe { self.convert_raw(&mut rows, validate_utf8) }?; | ||
|
||
for (i, row) in rows.iter().enumerate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (i, row) in rows.iter().enumerate() { | |
#[cfg!(test)] | |
for (i, row) in rows.iter().enumerate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find if cfg!(test) {
easier to read. Would it be ok?
Out of concern this can cause runtime penalty.
Which issue does this PR close?
none
Rationale for this change
RowConverter
decoding fails when there is aFixedSizeList
element and it's not the last.What changes are included in this PR?
Fix
RowConverter
row decoding when there is aFixedSizeList
element and it's not the last.Are these changes tested?
yes