@@ -151,6 +151,16 @@ public O visit(Expression.StructLiteral expr, C context) throws E {
return visitFallback(expr, context);
}

@Override
public O visit(Expression.UserDefinedAnyLiteral expr, C context) throws E {
return visitFallback(expr, context);
}

@Override
public O visit(Expression.UserDefinedStructLiteral expr, C context) throws E {
return visitFallback(expr, context);
}

@Override
public O visit(Expression.NestedStruct expr, C context) throws E {
return visitFallback(expr, context);
85 changes: 79 additions & 6 deletions core/src/main/java/io/substrait/expression/Expression.java
@@ -693,21 +693,94 @@ public <R, C extends VisitationContext, E extends Throwable> R accept(
}
}

/**
* Base interface for user-defined literals.
*
* <p>User-defined literals can be encoded in one of two ways as per the Substrait spec:
*
* <ul>
* <li>As {@code google.protobuf.Any} - see {@link UserDefinedAnyLiteral}
* <li>As {@code Literal.Struct} - see {@link UserDefinedStructLiteral}
* </ul>
*/
interface UserDefinedLiteral extends Literal {
Member

Release Notes
We should call out that we don't construct UserDefinedLiterals anymore.

String urn();

String name();

List<io.substrait.type.Type.Parameter> typeParameters();
}

/**
* User-defined literal with value encoded as {@link com.google.protobuf.Any}.
*
* <p>This encoding allows for arbitrary binary data to be stored in the literal value.
*/
@Value.Immutable
-abstract class UserDefinedLiteral implements Literal {
-public abstract ByteString value();
abstract class UserDefinedAnyLiteral implements UserDefinedLiteral {
@Override
public abstract String urn();

@Override
public abstract String name();

@Override
public abstract List<io.substrait.type.Type.Parameter> typeParameters();

public abstract com.google.protobuf.Any value();
Member

Release Notes
Capturing the value as an Any instead of a ByteString does feel nicer ✨


@Override
public Type.UserDefined getType() {
return Type.UserDefined.builder()
.nullable(nullable())
.urn(urn())
.name(name())
.typeParameters(typeParameters())
.build();
}

public static ImmutableExpression.UserDefinedAnyLiteral.Builder builder() {
return ImmutableExpression.UserDefinedAnyLiteral.builder();
}

@Override
public <R, C extends VisitationContext, E extends Throwable> R accept(
ExpressionVisitor<R, C, E> visitor, C context) throws E {
return visitor.visit(this, context);
}
}

/**
* User-defined literal with value encoded as {@link
* io.substrait.proto.Expression.Literal.Struct}.
*
* <p>This encoding uses a structured list of fields to represent the literal value.
*/
Member

The docstring feels a little inconsistent. You have

literal with value encoded as {@code Literal.Struct}

but in the class the values are encoded as a List<Literal>, and the second part of the docstring is consistent with that.

Member Author

I guess this could be made less confusing. The intention was to refer to the proto Literal.Struct, not some Java class. What do you think about just altering the message to say

User-defined literal with value encoded via the proto message {@code Literal.Struct}.

?

Member Author (@benbellick, Nov 26, 2025)

I altered it to instead link to the actual proto. Let me know what you think! I also did the same for the any case.

@Value.Immutable
abstract class UserDefinedStructLiteral implements UserDefinedLiteral {
@Override
public abstract String urn();

@Override
public abstract String name();

@Override
-public Type getType() {
-return Type.withNullability(nullable()).userDefined(urn(), name());
public abstract List<io.substrait.type.Type.Parameter> typeParameters();

public abstract List<Literal> fields();
Member

Is there a reason to use a List<Literal> here instead of just an Expression.StructLiteral, which would map directly to the protobuf?

Member Author

Sharing this comment because I believe it is relevant.


Basically it is the same issue, which is that the POJO class called StructLiteral is representing a special case of the Literal class.

  message Literal { // Both StructLiteral and UserDefinedStructLiteral are representing this _whole_ message
    oneof literal_type {
      ...
      Struct struct = 25;
      ...
      UserDefined user_defined = 33;
    }
    ...
    message Struct {
      // A possibly heterogeneously typed list of literals
      repeated Literal fields = 1;
    }

    message UserDefined {
      oneof type_anchor_type {
        // points to a type_anchor defined in this plan
        uint32 type_reference = 1;

        // points to a type_alias_anchor defined in this plan.
        uint32 type_alias_reference = 5;
      }

      // The parameters to be bound to the type class, if the type class is
      // parameterizable.
      repeated Type.Parameter type_parameters = 3;

      // a user-defined literal can be encoded in one of two ways
      oneof val {
        // the value of the literal, serialized using some type-specific protobuf message
        google.protobuf.Any value = 2;
        // the value of the literal, serialized using the structure definition in its declaration
        Literal.Struct struct = 4;
      }
    }
  }

Back to your comment, switching this member variable to be Expression.StructLiteral would amount to embedding one literal proto inside of another. For example, the proto Struct doesn't actually have a nullable field. But the POJO Expression.StructLiteral does have a nullable field because it inherits from Literal.

This doesn't mean that we couldn't replace the member variable as you suggest, but if we did do that, we would be carrying around extra meaningless data, which I think is more confusing ultimately.
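
To make the trade-off concrete, here is a minimal sketch of the two shapes being compared. Every class below is a hypothetical stand-in written for this comment, not the substrait-java implementation; `I32Literal`, `UserDefinedWrappingStruct` and `hasConflictingNullability` are invented names. It only illustrates that embedding a full struct literal brings along a second nullable flag that has no counterpart in the proto `Literal.Struct`.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the substrait-java classes.
interface Literal {
  boolean nullable();
}

final class I32Literal implements Literal {
  private final boolean nullable;
  final int value;

  I32Literal(boolean nullable, int value) {
    this.nullable = nullable;
    this.value = value;
  }

  @Override
  public boolean nullable() {
    return nullable;
  }
}

// Like the POJO StructLiteral: it models a whole Literal message, so it has its own flag.
final class StructLiteral implements Literal {
  private final boolean nullable;
  final List<Literal> fields;

  StructLiteral(boolean nullable, List<Literal> fields) {
    this.nullable = nullable;
    this.fields = Collections.unmodifiableList(fields);
  }

  @Override
  public boolean nullable() {
    return nullable;
  }
}

// Shape used in this PR: hold the fields directly, as the proto Literal.Struct does.
final class UserDefinedStructLiteral implements Literal {
  private final boolean nullable;
  final List<Literal> fields;

  UserDefinedStructLiteral(boolean nullable, List<Literal> fields) {
    this.nullable = nullable;
    this.fields = Collections.unmodifiableList(fields);
  }

  @Override
  public boolean nullable() {
    return nullable;
  }
}

// Suggested alternative: embed a StructLiteral, which brings a second nullable flag along.
final class UserDefinedWrappingStruct implements Literal {
  private final boolean nullable;
  final StructLiteral struct;

  UserDefinedWrappingStruct(boolean nullable, StructLiteral struct) {
    this.nullable = nullable;
    this.struct = struct;
  }

  @Override
  public boolean nullable() {
    return nullable;
  }

  // True when the two flags disagree, a state the proto message cannot express.
  boolean hasConflictingNullability() {
    return nullable != struct.nullable();
  }
}
```

With the wrapping shape, a non-nullable user-defined literal can hold a nullable inner struct, and a converter would have to decide which flag wins. The direct `List<Literal>` shape has only one flag, so that question never comes up.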


@Override
public Type.UserDefined getType() {
return Type.UserDefined.builder()
.nullable(nullable())
.urn(urn())
.name(name())
.typeParameters(typeParameters())
.build();
}

-public static ImmutableExpression.UserDefinedLiteral.Builder builder() {
-return ImmutableExpression.UserDefinedLiteral.builder();
public static ImmutableExpression.UserDefinedStructLiteral.Builder builder() {
return ImmutableExpression.UserDefinedStructLiteral.builder();
}

@Override
46 changes: 42 additions & 4 deletions core/src/main/java/io/substrait/expression/ExpressionCreator.java
@@ -295,13 +295,51 @@ public static Expression.NestedStruct nestedStruct(boolean nullable, Expression.
return Expression.NestedStruct.builder().nullable(nullable).addFields(fields).build();
}

-public static Expression.UserDefinedLiteral userDefinedLiteral(
-boolean nullable, String urn, String name, Any value) {
-return Expression.UserDefinedLiteral.builder()
/**
* Create a UserDefinedAnyLiteral with google.protobuf.Any representation.
*
* @param nullable whether the literal is nullable
* @param urn the URN of the user-defined type
* @param name the name of the user-defined type
* @param typeParameters the type parameters for the user-defined type (can be empty list)
* @param value the value, encoded as google.protobuf.Any
*/
public static Expression.UserDefinedAnyLiteral userDefinedLiteralAny(
boolean nullable,
String urn,
String name,
java.util.List<io.substrait.type.Type.Parameter> typeParameters,
Any value) {
return Expression.UserDefinedAnyLiteral.builder()
.nullable(nullable)
.urn(urn)
.name(name)
.addAllTypeParameters(typeParameters)
.value(value)
.build();
}

/**
* Create a UserDefinedStructLiteral with Struct representation.
*
* @param nullable whether the literal is nullable
* @param urn the URN of the user-defined type
* @param name the name of the user-defined type
* @param typeParameters the type parameters for the user-defined type (can be empty list)
* @param fields the fields, as a list of Literal values
*/
public static Expression.UserDefinedStructLiteral userDefinedLiteralStruct(
boolean nullable,
String urn,
String name,
java.util.List<io.substrait.type.Type.Parameter> typeParameters,
java.util.List<Expression.Literal> fields) {
return Expression.UserDefinedStructLiteral.builder()
.nullable(nullable)
.urn(urn)
.name(name)
-.value(value.toByteString())
.addAllTypeParameters(typeParameters)
.addAllFields(fields)
.build();
}

@@ -64,7 +64,9 @@ public interface ExpressionVisitor<R, C extends VisitationContext, E extends Thr

R visit(Expression.NestedStruct expr, C context) throws E;

-R visit(Expression.UserDefinedLiteral expr, C context) throws E;
R visit(Expression.UserDefinedAnyLiteral expr, C context) throws E;

R visit(Expression.UserDefinedStructLiteral expr, C context) throws E;

R visit(Expression.Switch expr, C context) throws E;

@@ -1,7 +1,5 @@
package io.substrait.expression.proto;

-import com.google.protobuf.Any;
-import com.google.protobuf.InvalidProtocolBufferException;
import io.substrait.expression.ExpressionVisitor;
import io.substrait.expression.FieldReference;
import io.substrait.expression.FunctionArg;
@@ -377,21 +375,51 @@ public Expression visit(

@Override
public Expression visit(
-io.substrait.expression.Expression.UserDefinedLiteral expr, EmptyVisitationContext context) {
io.substrait.expression.Expression.UserDefinedAnyLiteral expr,
EmptyVisitationContext context) {
int typeReference =
extensionCollector.getTypeReference(SimpleExtension.TypeAnchor.of(expr.urn(), expr.name()));
return lit(
bldr -> {
-try {
Member Author

This exception doesn't happen anymore because we don't parse the Any here. Instead, we have a reference to the pre-parsed proto directly.

-bldr.setNullable(expr.nullable())
-.setUserDefined(
-Expression.Literal.UserDefined.newBuilder()
-.setTypeReference(typeReference)
-.setValue(Any.parseFrom(expr.value())))
-.build();
-} catch (InvalidProtocolBufferException e) {
-throw new IllegalStateException(e);
-}
Expression.Literal.UserDefined.Builder userDefinedBuilder =
Expression.Literal.UserDefined.newBuilder()
.setTypeReference(typeReference)
.addAllTypeParameters(
expr.typeParameters().stream()
.map(typeProtoConverter::toProto)
.collect(java.util.stream.Collectors.toList()))
.setValue(expr.value());

bldr.setNullable(expr.nullable()).setUserDefined(userDefinedBuilder).build();
});
}

@Override
public Expression visit(
io.substrait.expression.Expression.UserDefinedStructLiteral expr,
EmptyVisitationContext context) {
int typeReference =
extensionCollector.getTypeReference(SimpleExtension.TypeAnchor.of(expr.urn(), expr.name()));
return lit(
bldr -> {
Expression.Literal.Struct structLiteral =
Expression.Literal.Struct.newBuilder()
.addAllFields(
expr.fields().stream()
.map(this::toLiteral)
.collect(java.util.stream.Collectors.toList()))
.build();

Expression.Literal.UserDefined.Builder userDefinedBuilder =
Expression.Literal.UserDefined.newBuilder()
.setTypeReference(typeReference)
.addAllTypeParameters(
expr.typeParameters().stream()
.map(typeProtoConverter::toProto)
.collect(java.util.stream.Collectors.toList()))
.setStruct(structLiteral);

bldr.setNullable(expr.nullable()).setUserDefined(userDefinedBuilder).build();
});
}

@@ -492,10 +492,36 @@ public Expression.Literal from(io.substrait.proto.Expression.Literal literal) {
{
io.substrait.proto.Expression.Literal.UserDefined userDefinedLiteral =
literal.getUserDefined();

SimpleExtension.Type type =
lookup.getType(userDefinedLiteral.getTypeReference(), extensions);
-return ExpressionCreator.userDefinedLiteral(
-literal.getNullable(), type.urn(), type.name(), userDefinedLiteral.getValue());
String urn = type.urn();
String name = type.name();
List<io.substrait.type.Type.Parameter> typeParameters =
userDefinedLiteral.getTypeParametersList().stream()
.map(protoTypeConverter::from)
.collect(Collectors.toList());

switch (userDefinedLiteral.getValCase()) {
case VALUE:
return ExpressionCreator.userDefinedLiteralAny(
literal.getNullable(), urn, name, typeParameters, userDefinedLiteral.getValue());
case STRUCT:
return ExpressionCreator.userDefinedLiteralStruct(
literal.getNullable(),
urn,
name,
typeParameters,
userDefinedLiteral.getStruct().getFieldsList().stream()
.map(this::from)
.collect(Collectors.toList()));
case VAL_NOT_SET:
throw new IllegalStateException(
"UserDefined literal has no value (neither 'value' nor 'struct' is set)");
default:
throw new IllegalStateException(
"Unknown UserDefined literal value case: " + userDefinedLiteral.getValCase());
}
}
default:
throw new IllegalStateException("Unexpected value: " + literal.getLiteralTypeCase());
@@ -22,6 +22,7 @@ public class DefaultExtensionCatalog {
"extension:io.substrait:functions_rounding_decimal";
public static final String FUNCTIONS_SET = "extension:io.substrait:functions_set";
public static final String FUNCTIONS_STRING = "extension:io.substrait:functions_string";
public static final String EXTENSION_TYPES = "extension:io.substrait:extension_types";

public static final SimpleExtension.ExtensionCollection DEFAULT_COLLECTION =
loadDefaultCollection();
@@ -44,6 +45,8 @@ private static SimpleExtension.ExtensionCollection loadDefaultCollection() {
.map(c -> String.format("/functions_%s.yaml", c))
.collect(Collectors.toList());

defaultFiles.add("/extension_types.yaml");

return SimpleExtension.load(defaultFiles);
}
}
@@ -214,7 +214,13 @@ public Optional<Expression> visit(Expression.NestedStruct expr, EmptyVisitationC

@Override
public Optional<Expression> visit(
-Expression.UserDefinedLiteral expr, EmptyVisitationContext context) throws E {
Expression.UserDefinedAnyLiteral expr, EmptyVisitationContext context) throws E {
return visitLiteral(expr);
}

@Override
public Optional<Expression> visit(
Expression.UserDefinedStructLiteral expr, EmptyVisitationContext context) throws E {
return visitLiteral(expr);
}

59 changes: 59 additions & 0 deletions core/src/main/java/io/substrait/type/Type.java
@@ -393,6 +393,19 @@ abstract class UserDefined implements Type {

public abstract String name();

/**
* Returns the type parameters for this user-defined type.
*
* <p>Type parameters are used to represent parameterized/generic types, such as {@code
* vector<i32>}.
*
* @return a list of type parameters, or an empty list if this type is not parameterized
*/
@Value.Default
public java.util.List<Parameter> typeParameters() {
return java.util.Collections.emptyList();
}

public static ImmutableType.UserDefined.Builder builder() {
return ImmutableType.UserDefined.builder();
}
@@ -402,4 +415,50 @@ public <R, E extends Throwable> R accept(TypeVisitor<R, E> typeVisitor) throws E
return typeVisitor.visit(this);
}
}

/**
* Represents a type parameter for user-defined types.
*
* <p>Type parameters can be data types (like {@code i32} in {@code List<i32>}), or value
* parameters (like the {@code 10} in {@code VARCHAR<10>}). This interface provides a type-safe
* representation of all possible parameter kinds.
*/
interface Parameter {}
Member

Member Author

🤔 yes that is an interesting point. Looking into it!

Member Author

So as I understand it, ParameterizedType.java is used for representing abstract types with parameters in YAML files, whereas the Parameter being introduced above is a concrete argument passed into the type.

So for example, List<any1> could be a ParameterizedType, whereas List<i32> is a type with parameters [i32].
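
A rough sketch of that distinction, in case it helps. All names here (`TypePattern`, `ConcreteType`, `TypeBinding.bind`) are hypothetical and written only for this comment; arguments are plain strings for brevity, whereas the PR models them as `Type.Parameter` values. The pattern side carries placeholders such as `any1`; the concrete side carries the arguments actually bound to them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; names and shapes are assumptions, not the substrait-java API.

// Abstract side: a type pattern with named placeholders, as it might be declared in YAML.
final class TypePattern {
  final String name;
  final List<String> placeholders; // e.g. ["any1"] for List<any1>

  TypePattern(String name, List<String> placeholders) {
    this.name = name;
    this.placeholders = Collections.unmodifiableList(placeholders);
  }
}

// Concrete side: a type whose parameters are actual arguments, e.g. ["i32"] for List<i32>.
final class ConcreteType {
  final String name;
  final List<String> arguments;

  ConcreteType(String name, List<String> arguments) {
    this.name = name;
    this.arguments = Collections.unmodifiableList(arguments);
  }

  // Renders the type in the angle-bracket notation used in this thread.
  String render() {
    return arguments.isEmpty() ? name : name + "<" + String.join(", ", arguments) + ">";
  }
}

final class TypeBinding {
  private TypeBinding() {}

  // Replaces each placeholder with the argument bound to it, failing on unbound names.
  static ConcreteType bind(TypePattern pattern, Map<String, String> bindings) {
    List<String> arguments = new ArrayList<>();
    for (String placeholder : pattern.placeholders) {
      String argument = bindings.get(placeholder);
      if (argument == null) {
        throw new IllegalArgumentException("unbound placeholder: " + placeholder);
      }
      arguments.add(argument);
    }
    return new ConcreteType(pattern.name, arguments);
  }
}
```

Binding the pattern `List<any1>` with `any1` mapped to `i32` yields a `ConcreteType` that renders as `List<i32>`; the pattern alone has nothing concrete to carry in a literal.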


/** A data type parameter, such as the {@code i32} in {@code List<i32>}. */
@Value.Immutable
abstract class ParameterDataType implements Parameter {
public abstract Type type();
}

/** A boolean value parameter. */
@Value.Immutable
abstract class ParameterBooleanValue implements Parameter {
public abstract boolean value();
}
Member

Is there a use case for having boolean literal type parameters? I can't imagine a case where something like MySpecialType<false> is something you would need.

Member Author

#613 (comment)

🤷 just wanted to be consistent with the spec


/** An integer value parameter, such as the {@code 10} in {@code VARCHAR<10>}. */
@Value.Immutable
abstract class ParameterIntegerValue implements Parameter {
public abstract long value();
}

/** An enum value parameter (represented as a string). */
@Value.Immutable
abstract class ParameterEnumValue implements Parameter {
public abstract String value();
}
Member

What would this type be used for?

Member Author

As discussed offline, these were inspired by this portion of the simple extension schema:

  type_param_defs: # an array of compound type parameter definitions
    type: array
    items:
      ...
      properties:
        ...
        type: # expected metatype for the parameter
          type: string
          enum:
            - dataType
            - boolean
            - integer
            - enumeration
            - string

So while I don't understand the usage of it, I thought it was best to include all of them for consistency with the spec.
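
For reference, the correspondence I have in mind between the schema metatypes and the new classes looks roughly like this. `ParameterKinds.parameterClassFor` is a hypothetical helper written for this comment, and the pairing is inferred from the class names in this PR rather than stated anywhere in the spec. `ParameterNull` has no metatype of its own, since it stands for an explicitly unspecified parameter.

```java
// Hypothetical helper; the mapping is inferred from the class names in this PR.
final class ParameterKinds {
  private ParameterKinds() {}

  // Returns the name of the Parameter class that would carry a value of the given metatype.
  static String parameterClassFor(String metatype) {
    switch (metatype) {
      case "dataType":
        return "ParameterDataType";
      case "boolean":
        return "ParameterBooleanValue";
      case "integer":
        return "ParameterIntegerValue";
      case "enumeration":
        return "ParameterEnumValue";
      case "string":
        return "ParameterStringValue";
      default:
        throw new IllegalArgumentException("unknown metatype: " + metatype);
    }
  }
}
```

So each of the five metatypes in the schema enum gets exactly one class, which is why the boolean, enum, and string variants exist even without a known use.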


/** A string value parameter. */
@Value.Immutable
abstract class ParameterStringValue implements Parameter {
public abstract String value();
}
Member

What would this type be used for?

Member Author


/** An explicitly null/unspecified parameter, used to select the default value (if any). */
class ParameterNull implements Parameter {
public static final ParameterNull INSTANCE = new ParameterNull();

private ParameterNull() {}
}
}