Support complex datatypes in Comet Scan #434

Closed
@mattwparas

What is the problem the feature request solves?

As of right now, only primitive types are supported in Comet's Parquet scan. If any non-primitive type is detected in a sink node, Comet bails out of performing any transformations.

It would be great if Comet could handle the simpler complex data types, like those supported for shuffle found here. Nested structs, or maps from primitives to structs, would also be helpful, but I'm not sure how much complexity is involved beyond flat complex types.

Support for even more complex data types would also be helpful, but at a minimum, supporting these would let Comet optimize the current set of Spark jobs that I'm working with.

Describe the potential solution

Comet should be able to lower Spark operations to native operations when the schema contains complex data types. As a start, the relatively simple complex data types supported for shuffle would be great: arrays of primitives, maps with primitives, and structs with primitives.
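To make the requested scope concrete, here is a rough sketch of the distinction between primitive columns, "flat" complex columns (arrays, maps, and structs whose children are all primitives), and nested complex columns. This is not Comet's code or API: the type classes and the `is_flat_complex` and `scan_supported` helpers are hypothetical names made up for illustration, and the check only models the behavior described in this issue.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical, minimal stand-ins for Spark's data types (illustration only).
@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "int", "string", "double"


@dataclass(frozen=True)
class Array:
    element: "DataType"


@dataclass(frozen=True)
class Map:
    key: "DataType"
    value: "DataType"


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, "DataType"], ...]


DataType = Union[Primitive, Array, Map, Struct]


def is_primitive(t: DataType) -> bool:
    return isinstance(t, Primitive)


def is_flat_complex(t: DataType) -> bool:
    """True for an array, map, or struct whose children are all primitives."""
    if isinstance(t, Array):
        return is_primitive(t.element)
    if isinstance(t, Map):
        return is_primitive(t.key) and is_primitive(t.value)
    if isinstance(t, Struct):
        return all(is_primitive(ft) for _, ft in t.fields)
    return False


def scan_supported(schema: Struct, allow_flat_complex: bool) -> bool:
    """Assumed model of the scan check: one unsupported column rejects the whole schema."""
    for _, t in schema.fields:
        if is_primitive(t):
            continue
        if allow_flat_complex and is_flat_complex(t):
            continue
        return False
    return True


# Schema using only the types requested "as a start".
flat_schema = Struct((
    ("id", Primitive("long")),
    ("tags", Array(Primitive("string"))),
    ("attrs", Map(Primitive("string"), Primitive("int"))),
    ("point", Struct((("x", Primitive("double")), ("y", Primitive("double"))))),
))

# Schema with an array of structs, which goes beyond flat complex types.
nested_schema = Struct((
    ("events", Array(Struct((("ts", Primitive("long")),)))),
))
```

With `allow_flat_complex=False` (the behavior described above), `flat_schema` is rejected because of its array, map, and struct columns. With `allow_flat_complex=True` (the requested starting point), `flat_schema` passes while `nested_schema` is still rejected.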

Additional context

To help guide the implementation, it would be helpful to know the difference between a type being supported in Parquet scan versus in shuffle, or at least to understand at a high level why certain types can be used in some operations but not others.

Metadata

Labels: enhancement (New feature or request)
