-
Notifications
You must be signed in to change notification settings - Fork 22
feat: introduce Scalar and StructLike interfaces for Transform evaluation #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
virtual ~ScalarVisitor() = default; | ||
|
||
/// Visit methods for primitive types | ||
virtual void visit(int32_t value) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it accept Int32Scalar
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it accepts physical values instead of typed values, how can it differentiate between the actual types? For example, long and timestamp both use int64_t
.
std::optional<T> value_; | ||
}; | ||
|
||
// Aliases for common primitive scalar types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are supposed to specialize each type like DecimalType
, BinaryType
, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be specialized.
/** | ||
* @brief Base class representing a generic scalar value. | ||
*/ | ||
class ICEBERG_EXPORT Scalar { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make std::shared_ptr<Type>
as a member variable to Scalar
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are three possible approaches to organizing type information:
- Scalar holds built-in types independently, and StructLike does not rely on StructType.
- Scalar does not contain any type information, and its type is inferred from the associated StructType within StructLike.
- Both Scalar and StructType carry type information, resulting in some redundancy.
I have chosen the first approach. Which one would you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Scalar does not carry the type info, we have to pass Schema wherever Scalar is used. In the case of schema evolution, then we need both Scalar schema and expected (table) schema.
* @tparam Derived The concrete subclass type (CRTP). | ||
*/ | ||
template <typename Derived> | ||
class VectorScalarBase : public Scalar { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a little bit weird that values in it have different meanings in the StructScalar
(horizontal) and ArrayScalar
(vertical).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit strange because the type information is all in StructLike, with StructScalar/ArrayScalar both being a group of Scalar members and there is not much difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this sounds like an anti-pattern in the OOP. You cannot use the base VectorScalarBase
class without knowing its subclass implementation.
/** | ||
* @brief Represents an array-like scalar (list of values). | ||
*/ | ||
class ArrayScalar : public VectorScalarBase<ArrayScalar> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we define MapScalar
? Should it be array<struct<key,value>>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I haven't thought of a good design, perhaps map can still be used, and Scalar needs to add interfaces for hash and equality judgment?
/** | ||
* @brief Abstract interface for structured field access. | ||
*/ | ||
class ICEBERG_EXPORT StructLike { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we simply reuse StructScalar
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StructLike has a StructType, while StructScalar does not have one, I think this should depend on the discussion result regarding whether a Scalar should have its own type information.
/** | ||
* @brief Visitor interface for traversing StructLike fields. | ||
*/ | ||
class ICEBERG_EXPORT StructLikeVisitor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, we can remove this as well.
* @brief Checks if the scalar represents a null value. | ||
* @return true if the value is null, false otherwise. | ||
*/ | ||
virtual bool IsNull() const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the implication that this is not a non-trivial operation? (Else it might be is_null
to imply it's a simple getter?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Scalar class, is_null
can be a member variable, or the IsNull
interface can be implemented by subclasses, where subclasses might use optional
or pointer
to determine if it is null. Here, the second option was chosen; I actually don't have a strong preference. StructScalar
also uses the is_null
field, so it might be more appropriate to directly make is_null
a member variable of Scalar. I'll make a modification.
* @brief Creates a deep copy of the scalar. | ||
* @return A shared pointer to the cloned scalar. | ||
*/ | ||
virtual std::shared_ptr<Scalar> clone() const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be Clone
? (Ditto in general for the PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will modify it.
/** | ||
* @brief Called when visiting a null scalar. | ||
*/ | ||
virtual void visitNull() = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a specific NullScalar type, or any scalar with a null value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see the behavior below. IMO I'd rather we pass the scalars to the visitor so that the users can tell between different types of nulls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, visit I will modify it to accept Scalar types as input
* @return Reference to the value. | ||
* @note Behavior is undefined if the scalar is null. | ||
*/ | ||
const T& value() const { return *value_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit but perhaps in DEBUG it should use optional::value()
to force an exception and in NDEBUG it can use operator*
to bypass that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, I will modify here
Problems to Solve
Proposed Solution
Benefits