chore: Make list.rs non generic & simplify the code #1118

SemyonSinchenko · 2024-11-24T14:15:55Z

Which issue does this PR close?

Start of the discussion: a comment in PR #1073.

Rationale for this change

Spark supports only i32-indexed arrays, but Comet attempts to support both ListArray (i32 offsets) and LargeListArray (i64 offsets). As a result, the code in list.rs carries additional complexity that is unreachable and not covered by any tests.

What changes are included in this PR?

Refactoring of list.rs: ListExtract, GetArrayStructField and ArrayInsert now accept only ListArray instead of GenericListArray.
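To illustrate what dropping the generic offset parameter means, here is a simplified, std-only Rust sketch. A list column is modelled as an offsets slice plus a flat values buffer, following Arrow's list layout. The helpers `list_extract_generic` and `list_extract` are hypothetical and are not the actual Comet code; they ignore nulls and any other options a real ListExtract expression handles. The generic version has to be written against any offset type, while the non-generic one works directly with i32 offsets, as a ListArray does.

```rust
use std::convert::{TryFrom, TryInto};

// Hypothetical sketch: row `i` of a list column spans values[offsets[i]..offsets[i + 1]],
// following Arrow's list layout. Nulls are not modelled.

/// Generic over the offset type: O = i32 (List) or O = i64 (LargeList).
/// Returns the element at zero-based `ordinal` in `row`, or None if out of range.
fn list_extract_generic<O: Copy + TryInto<usize>>(
    offsets: &[O],
    values: &[i64],
    row: usize,
    ordinal: usize,
) -> Option<i64> {
    // Every offset has to be converted through the generic trait bound.
    let start: usize = (*offsets.get(row)?).try_into().ok()?;
    let end: usize = (*offsets.get(row + 1)?).try_into().ok()?;
    let len = end.checked_sub(start)?;
    if ordinal >= len {
        return None;
    }
    values.get(start + ordinal).copied()
}

/// Non-generic version: offsets are always i32, as in an Arrow ListArray.
fn list_extract(offsets: &[i32], values: &[i64], row: usize, ordinal: usize) -> Option<i64> {
    let start = usize::try_from(*offsets.get(row)?).ok()?;
    let end = usize::try_from(*offsets.get(row + 1)?).ok()?;
    let len = end.checked_sub(start)?;
    if ordinal >= len {
        return None;
    }
    values.get(start + ordinal).copied()
}
```

With offsets `[0, 2, 2, 5]` and values `[10, 11, 20, 21, 22]`, row 2 spans `[20, 21, 22]`, so `list_extract(&offsets, &values, 2, 2)` returns `Some(22)`, while an out-of-range ordinal or the empty row 1 returns `None`. The logic is identical in both versions; only the offset type handling differs.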

How are these changes tested?

All existing tests.

SemyonSinchenko · 2024-11-24T17:04:14Z

@andygrove As you mentioned in #1073, it would be interesting to remove all logic related to LargeList and check whether there are any regressions. I did just that, and all tests passed.

SemyonSinchenko · 2024-11-28T18:05:37Z

There is a valid argument against it:

The difference I think is that a LargeList can store more than Integer.MAX_VALUE entries in all rows in a single batch, so if you have multiple Spark rows all with the max num of rows supported, it wouldn't fit into an Arrow List array. That would probably need to be supported elsewhere, but it may be worth keeping the LargeList handling around in case that scenario is supported? And other DataFusion expressions might return a LargeList even if it doesn't come directly from Spark? Does the native Parquet reader ever use a LargeList?
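The overflow concern in the quoted comment comes down to offset arithmetic: in the Arrow layout a list column stores one offset per row boundary, and those offsets count child elements across the whole batch, not per row. The std-only Rust sketch below (the helpers `build_i32_offsets` and `build_i64_offsets` are hypothetical, not Comet or arrow-rs APIs) shows how rows that each fit within i32 can still overflow i32 offsets once summed, which is the case i64 (LargeList) offsets cover.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: builds Arrow-style i32 offsets (row_lengths.len() + 1 entries),
/// or returns None when the running total of child elements no longer fits in i32,
/// i.e. the batch cannot be represented as a List and would need a LargeList.
fn build_i32_offsets(row_lengths: &[usize]) -> Option<Vec<i32>> {
    let mut offsets = Vec::with_capacity(row_lengths.len() + 1);
    let mut total: i32 = 0;
    offsets.push(total);
    for &len in row_lengths {
        // Each row length must fit in i32, and so must the cumulative sum.
        let len = i32::try_from(len).ok()?;
        total = total.checked_add(len)?;
        offsets.push(total);
    }
    Some(offsets)
}

/// Same construction with i64 offsets, as used by LargeList.
fn build_i64_offsets(row_lengths: &[usize]) -> Option<Vec<i64>> {
    let mut offsets = Vec::with_capacity(row_lengths.len() + 1);
    let mut total: i64 = 0;
    offsets.push(total);
    for &len in row_lengths {
        let len = i64::try_from(len).ok()?;
        total = total.checked_add(len)?;
        offsets.push(total);
    }
    Some(offsets)
}
```

For example, rows of 2,147,483,647 and 1 elements are each valid on their own, but the final offset would be 2,147,483,648: `build_i32_offsets` returns `None`, while `build_i64_offsets` succeeds.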

Make list.rs non generic & simplify the code

faac90b

SemyonSinchenko marked this pull request as ready for review November 24, 2024 17:02

SemyonSinchenko mentioned this pull request Nov 28, 2024

feat: support array_insert #1073

Merged

SemyonSinchenko closed this Nov 28, 2024
