What is the problem the feature request solves?
As of right now, only primitive types are supported for Parquet scan, and if any non-primitive types are detected in a sink node, Comet will bail out of performing any transformations.
It would be great if Comet were able to handle relatively simple complex data types, like those already supported for shuffle. Nested structs, or maps from primitives to structs, would also be helpful, but I'm not sure of the relative complexity past flat complex types. A hypothetical example of a read that currently triggers the fallback is sketched below.
Even more complex data types beyond these would also be helpful, but at a minimum, supporting the flat cases would enable Comet to optimize the set of Spark jobs I'm currently working with.
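To make the fallback concrete, here is a minimal sketch. The file path and schema are made up for illustration; the point is just that a nested column in the scanned schema keeps the read on Spark's scan:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read a Parquet file whose schema mixes primitive and nested columns, e.g.:
//   id: bigint, tags: array<string>, attrs: struct<k: string, v: int>
val df = spark.read.parquet("/path/to/events.parquet")

// Because the schema contains non-primitive columns, the scan stays a
// plain Spark FileScan rather than a Comet scan, and Comet skips its
// transformations for this subtree.
df.select("id", "tags").explain()
```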
Describe the potential solution
Comet is able to lower Spark operations to native operations when the schema contains complex data types. As a start, the relatively simple complex types already supported for shuffle would be great: arrays of primitives, maps with primitive keys and values, and structs with primitive fields. A sketch of such a schema follows.
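For concreteness, a schema covering those three type classes might look like the following (table and column names are made up; assumes an active SparkSession `spark`):

```scala
// Illustrative DDL only: one column per type class that the shuffle
// path already handles.
spark.sql("""
  CREATE TABLE events_flat (
    ids    ARRAY<BIGINT>,                 -- array of primitives
    attrs  MAP<STRING, INT>,              -- map with primitive keys/values
    point  STRUCT<x: DOUBLE, y: DOUBLE>   -- struct of primitives
  ) USING parquet
""")
```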
Additional context
To help guide the implementation, it would be useful to know the difference between a type being supported in Parquet scan versus in shuffle - at least a high-level understanding of why certain types can be used in some operations but not others.
Actually, Comet columnar shuffle already supports some complex data types. You can find tests using complex types in the Comet shuffle test suites.
But the Comet scan operator doesn't support complex types yet, so you cannot read data of complex types from Parquet and run native operations on it. I also don't think we currently have any native expression that can produce output of complex types. A rough sketch of the scan-versus-shuffle distinction follows.
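A rough sketch of that distinction, with made-up data and paths (assumes an active SparkSession `spark` with Comet enabled):

```scala
import spark.implicits._

// Shuffling complex types built in memory can go through Comet
// columnar shuffle: the nested tuple becomes a struct column.
val pairs = Seq((1, ("a", 10)), (2, ("b", 20))).toDF("id", "pair")
pairs.repartition(4, $"id").count()

// But reading the same shape back from Parquet cannot use the Comet
// scan operator, so the read itself falls back to Spark's scan:
pairs.write.mode("overwrite").parquet("/tmp/pairs")
spark.read.parquet("/tmp/pairs").repartition(4, $"id").count()
```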
viirya changed the title from "Support complex datatypes" to "Support complex datatypes in Comet Scan" on May 15, 2024
@mattwparas We are now actively working on supporting reading complex types from Parquet. We have this partially working in main behind feature flags, and the epic to track this work is #1043
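If you want to experiment with the gated support, something along these lines should work. The config keys below are paraphrased from the Comet configuration docs and may change while the feature is behind flags, so treat them as assumptions and check the epic (#1043) and current docs for the exact names:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: these keys reflect the Comet docs at the time of writing;
// the experimental scan implementations and their flags may change.
val spark = SparkSession.builder()
  .config("spark.plugins", "org.apache.spark.CometPlugin")
  .config("spark.comet.enabled", "true")
  // Experimental native scan with (partial) complex-type support:
  .config("spark.comet.scan.impl", "native_datafusion")
  .getOrCreate()
```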