-
Notifications
You must be signed in to change notification settings - Fork 93
handle struct-based UDT literals in core #613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
173ec59
f0e13ef
f5b6341
8c54706
36c81f0
680ed29
6ac7aab
b2063d0
ce7985f
e8cb862
77f0bd7
4b4f5ee
3732cf9
a216f3e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -693,21 +693,94 @@ public <R, C extends VisitationContext, E extends Throwable> R accept( | |
| } | ||
| } | ||
|
|
||
| /** | ||
| * Base interface for user-defined literals. | ||
| * | ||
| * <p>User-defined literals can be encoded in one of two ways as per the Substrait spec: | ||
| * | ||
| * <ul> | ||
| * <li>As {@code google.protobuf.Any} - see {@link UserDefinedAnyLiteral} | ||
| * <li>As {@code Literal.Struct} - see {@link UserDefinedStructLiteral} | ||
| * </ul> | ||
| */ | ||
| interface UserDefinedLiteral extends Literal { | ||
| String urn(); | ||
|
|
||
| String name(); | ||
|
|
||
| List<io.substrait.type.Type.Parameter> typeParameters(); | ||
| } | ||
|
|
||
| /** | ||
| * User-defined literal with value encoded as {@link com.google.protobuf.Any}. | ||
| * | ||
| * <p>This encoding allows for arbitrary binary data to be stored in the literal value. | ||
| */ | ||
| @Value.Immutable | ||
| abstract class UserDefinedLiteral implements Literal { | ||
| public abstract ByteString value(); | ||
| abstract class UserDefinedAnyLiteral implements UserDefinedLiteral { | ||
| @Override | ||
| public abstract String urn(); | ||
|
|
||
| @Override | ||
| public abstract String name(); | ||
|
|
||
| @Override | ||
| public abstract List<io.substrait.type.Type.Parameter> typeParameters(); | ||
|
|
||
| public abstract com.google.protobuf.Any value(); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Release Notes |
||
|
|
||
| @Override | ||
| public Type.UserDefined getType() { | ||
| return Type.UserDefined.builder() | ||
| .nullable(nullable()) | ||
| .urn(urn()) | ||
| .name(name()) | ||
| .typeParameters(typeParameters()) | ||
| .build(); | ||
| } | ||
|
|
||
| public static ImmutableExpression.UserDefinedAnyLiteral.Builder builder() { | ||
| return ImmutableExpression.UserDefinedAnyLiteral.builder(); | ||
| } | ||
|
|
||
| @Override | ||
| public <R, C extends VisitationContext, E extends Throwable> R accept( | ||
| ExpressionVisitor<R, C, E> visitor, C context) throws E { | ||
| return visitor.visit(this, context); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * User-defined literal with value encoded as {@link | ||
| * io.substrait.proto.Expression.Literal.Struct}. | ||
| * | ||
| * <p>This encoding uses a structured list of fields to represent the literal value. | ||
| */ | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The docstring feels a little inconsistent. You have
but in the class the values are encoded as a
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess that this could be made less confusing. The intention was to mean the proto
?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I altered it to instead link to the actual proto. Let me know what you think! I also did the same for the any case. |
||
| @Value.Immutable | ||
| abstract class UserDefinedStructLiteral implements UserDefinedLiteral { | ||
| @Override | ||
| public abstract String urn(); | ||
|
|
||
| @Override | ||
| public abstract String name(); | ||
|
|
||
| @Override | ||
| public Type getType() { | ||
| return Type.withNullability(nullable()).userDefined(urn(), name()); | ||
| public abstract List<io.substrait.type.Type.Parameter> typeParameters(); | ||
|
|
||
| public abstract List<Literal> fields(); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a reason to use a
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sharing this comment because I believe it is relevant. Basically it is the same issue, which is that the POJO class called message Literal { // Both StructLiteral and UserDefinedStructLiteral are representing this _whole_ message
oneof literal_type {
...
Struct struct = 25;
...
UserDefined user_defined = 33;
}
...
message Struct {
// A possibly heterogeneously typed list of literals
repeated Literal fields = 1;
}
message UserDefined {
oneof type_anchor_type {
// points to a type_anchor defined in this plan
uint32 type_reference = 1;
// points to a type_alias_anchor defined in this plan.
uint32 type_alias_reference = 5;
}
// The parameters to be bound to the type class, if the type class is
// parameterizable.
repeated Type.Parameter type_parameters = 3;
// a user-defined literal can be encoded in one of two ways
oneof val {
// the value of the literal, serialized using some type-specific protobuf message
google.protobuf.Any value = 2;
// the value of the literal, serialized using the structure definition in its declaration
Literal.Struct struct = 4;
}
}
}Back to your comment, switching this member variable to be This doesn't mean that we couldn't replace the member variable as you suggest, but if we did do that, we would be carrying around extra meaningless data, which I think is more confusing ultimately. |
||
|
|
||
| @Override | ||
| public Type.UserDefined getType() { | ||
| return Type.UserDefined.builder() | ||
| .nullable(nullable()) | ||
| .urn(urn()) | ||
| .name(name()) | ||
| .typeParameters(typeParameters()) | ||
| .build(); | ||
| } | ||
|
|
||
| public static ImmutableExpression.UserDefinedLiteral.Builder builder() { | ||
| return ImmutableExpression.UserDefinedLiteral.builder(); | ||
| public static ImmutableExpression.UserDefinedStructLiteral.Builder builder() { | ||
| return ImmutableExpression.UserDefinedStructLiteral.builder(); | ||
| } | ||
|
|
||
| @Override | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,5 @@ | ||
| package io.substrait.expression.proto; | ||
|
|
||
| import com.google.protobuf.Any; | ||
| import com.google.protobuf.InvalidProtocolBufferException; | ||
| import io.substrait.expression.ExpressionVisitor; | ||
| import io.substrait.expression.FieldReference; | ||
| import io.substrait.expression.FunctionArg; | ||
|
|
@@ -377,21 +375,51 @@ public Expression visit( | |
|
|
||
| @Override | ||
| public Expression visit( | ||
| io.substrait.expression.Expression.UserDefinedLiteral expr, EmptyVisitationContext context) { | ||
| io.substrait.expression.Expression.UserDefinedAnyLiteral expr, | ||
| EmptyVisitationContext context) { | ||
| int typeReference = | ||
| extensionCollector.getTypeReference(SimpleExtension.TypeAnchor.of(expr.urn(), expr.name())); | ||
| return lit( | ||
| bldr -> { | ||
| try { | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This exception doesn't happen anymore because we don't parse the |
||
| bldr.setNullable(expr.nullable()) | ||
| .setUserDefined( | ||
| Expression.Literal.UserDefined.newBuilder() | ||
| .setTypeReference(typeReference) | ||
| .setValue(Any.parseFrom(expr.value()))) | ||
| .build(); | ||
| } catch (InvalidProtocolBufferException e) { | ||
| throw new IllegalStateException(e); | ||
| } | ||
| Expression.Literal.UserDefined.Builder userDefinedBuilder = | ||
| Expression.Literal.UserDefined.newBuilder() | ||
| .setTypeReference(typeReference) | ||
| .addAllTypeParameters( | ||
| expr.typeParameters().stream() | ||
| .map(typeProtoConverter::toProto) | ||
| .collect(java.util.stream.Collectors.toList())) | ||
| .setValue(expr.value()); | ||
|
|
||
| bldr.setNullable(expr.nullable()).setUserDefined(userDefinedBuilder).build(); | ||
| }); | ||
| } | ||
|
|
||
| @Override | ||
| public Expression visit( | ||
| io.substrait.expression.Expression.UserDefinedStructLiteral expr, | ||
| EmptyVisitationContext context) { | ||
| int typeReference = | ||
| extensionCollector.getTypeReference(SimpleExtension.TypeAnchor.of(expr.urn(), expr.name())); | ||
| return lit( | ||
| bldr -> { | ||
| Expression.Literal.Struct structLiteral = | ||
| Expression.Literal.Struct.newBuilder() | ||
| .addAllFields( | ||
| expr.fields().stream() | ||
| .map(this::toLiteral) | ||
| .collect(java.util.stream.Collectors.toList())) | ||
| .build(); | ||
|
|
||
| Expression.Literal.UserDefined.Builder userDefinedBuilder = | ||
| Expression.Literal.UserDefined.newBuilder() | ||
| .setTypeReference(typeReference) | ||
| .addAllTypeParameters( | ||
| expr.typeParameters().stream() | ||
| .map(typeProtoConverter::toProto) | ||
| .collect(java.util.stream.Collectors.toList())) | ||
| .setStruct(structLiteral); | ||
|
|
||
| bldr.setNullable(expr.nullable()).setUserDefined(userDefinedBuilder).build(); | ||
| }); | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -393,6 +393,19 @@ abstract class UserDefined implements Type { | |
|
|
||
| public abstract String name(); | ||
|
|
||
| /** | ||
| * Returns the type parameters for this user-defined type. | ||
| * | ||
| * <p>Type parameters are used to represent parameterized/generic types, such as {@code | ||
| * vector<i32>}. | ||
| * | ||
| * @return a list of type parameters, or an empty list if this type is not parameterized | ||
| */ | ||
| @Value.Default | ||
| public java.util.List<Parameter> typeParameters() { | ||
| return java.util.Collections.emptyList(); | ||
| } | ||
|
|
||
| public static ImmutableType.UserDefined.Builder builder() { | ||
| return ImmutableType.UserDefined.builder(); | ||
| } | ||
|
|
@@ -402,4 +415,50 @@ public <R, E extends Throwable> R accept(TypeVisitor<R, E> typeVisitor) throws E | |
| return typeVisitor.visit(this); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Represents a type parameter for user-defined types. | ||
| * | ||
| * <p>Type parameters can be data types (like {@code i32} in {@code List<i32>}), or value | ||
| * parameters (like the {@code 10} in {@code VARCHAR<10>}). This interface provides a type-safe | ||
| * representation of all possible parameter kinds. | ||
| */ | ||
| interface Parameter {} | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the function of https://github.com/substrait-io/substrait-java/blob/main/core/src/main/java/io/substrait/function/ParameterizedType.java applicable here? It feels vaguely related.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🤔 yes that is an interesting point. Looking into it!
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So as I understand it, So for example, |
||
|
|
||
| /** A data type parameter, such as the {@code i32} in {@code List<i32>}. */ | ||
| @Value.Immutable | ||
| abstract class ParameterDataType implements Parameter { | ||
| public abstract Type type(); | ||
| } | ||
|
|
||
| /** A boolean value parameter. */ | ||
| @Value.Immutable | ||
| abstract class ParameterBooleanValue implements Parameter { | ||
| public abstract boolean value(); | ||
| } | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a usecase for having boolean literal type parameters? I can't imagine a usecase where something like
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🤷 just wanted to be consistent with the spec |
||
|
|
||
| /** An integer value parameter, such as the {@code 10} in {@code VARCHAR<10>}. */ | ||
| @Value.Immutable | ||
| abstract class ParameterIntegerValue implements Parameter { | ||
| public abstract long value(); | ||
| } | ||
|
|
||
| /** An enum value parameter (represented as a string). */ | ||
| @Value.Immutable | ||
| abstract class ParameterEnumValue implements Parameter { | ||
| public abstract String value(); | ||
| } | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would this type be used for?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As discussed offline, these were inspired by this portion of the simple extension schema: type_param_defs: # an array of compound type parameter definitions
type: array
items:
...
properties:
...
type: # expected metatype for the parameter
type: string
enum:
- dataType
- boolean
- integer
- enumeration
- stringSo while I don't understand the usage of it, I thought it was best to include all of them for consistency with the spec. |
||
|
|
||
| /** A string value parameter. */ | ||
| @Value.Immutable | ||
| abstract class ParameterStringValue implements Parameter { | ||
| public abstract String value(); | ||
| } | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would this type be used for?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
|
||
| /** An explicitly null/unspecified parameter, used to select the default value (if any). */ | ||
| class ParameterNull implements Parameter { | ||
| public static final ParameterNull INSTANCE = new ParameterNull(); | ||
|
|
||
| private ParameterNull() {} | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Release Notes
We should call out that we don't construct UserDefinedLiterals anymore.