-
Notifications
You must be signed in to change notification settings - Fork 1.5k
WIP: User defined sorting #15106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
WIP: User defined sorting #15106
Conversation
…rting is not supported.
I'm not sure I'm qualified to review this from an implementation standpoint, but the idea of an extension type registry and a centralized place (here, the LogicalType trait implementation) for behaviour that differs is something I'd like as well, and this PR nicely points out all the places I should be looking 🙂 . |
Happy to hear any kind of feedback on that @paleolimbot . So take a look if you've time. Also good to hear that others also have these requirements in their projects :). Could you think of a way to implement your use case by extending the proposed solution (to get a feeling whether this idea generalizes to other use cases)? As this is quite a fundamental change that can influence many more features to come, we would definitely need some input from someone more familiar with the project. |
Related to: #14828, #12644, #14247 and all tests most probably won't pass (the new test mentioned below passes on my machine though).
This PR is a rough proposal for an implementation of user-defined sorting. The main goal is to get a discussion starting whether this is a direction we want to go.
Some design considerations that help understand the PR:
PhysicalSortExpr
allows defining a custom sort order, but does not know anything about extension types.LogicalTypePlanningInformation
allows parameterizing this creation processField
whether there is an entry in the metadata with the keyEXTENSION_TYPE_NAME_KEY
.ReturnTypeInfo
to return aField
rather thanDataType
#14247 for some discussions on this issue.DFSchema
towards including aLogicalType
and aDataType
(at least as a first step). The remaining approach still works with such an assumption.This is similar to "Option 1: User defined operators" from the discussion in #14247. Here are some thoughts on that:
SortExpr
just on a special type.Expr
to UDFs via analyzers can be also tricky as we would then, for example, need to parameterizeSortExpr
with a custom ordering to allow the user-defined sorting use case.There are many changes in tests etc. so here is a list of "highlights" you should check out when taking a look at this PR.
Highlights:
datafusion/common/src/sort.rs
: new sort data structures and adapted procedures from arrow-rs.datafusion/common/src/types/logical.rs
: Extension to logical type to provide aLogicalTypePlanningInformation
.datafusion/core/tests/dataframe/test_types.rs
: A logical type (IntOrFloatType
) that implements a custom order.datafusion/core/tests/dataframe/mod.rs
: A test for sorting a on union type.datafusion/core/src/physical_planner.rs
: Applying custom order during physical plan creation.To get (a similar) approach upstream I'd suggest the following steps (splitting up the PR, adding tests etc.):
SortOrdering
toPhysicalSortExpr
and add support for user-defined sorting operations (DynComparator
for now)ExtensionTypeRegistry
to support registering extension types.SortExpr
during planning and construct the correctPhysicalSortExpr
.AggregateExec
). This step will be done interatively.Note that this procedure leads to a state where some parts of DF already make use of user defined sorting, while other do not yet support it. However, I don't think this is a deal breaker.
I'm eager to hear your thoughts!
Further Notes
I think this would also require users to return a
Field
from a UDF instead of just a data type (#14247) to allow UDFs to return extension types. See TODO note indatafusion/core/src/physical_planner.rs
.I think supported the row-based sorting from arrow-rs can also be supported in a similar fashion by extending
CustomOrdering
.