Skip to content

feat: introduce Scalar and StructLike interfaces for Transform evaluation #77

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
278 changes: 278 additions & 0 deletions src/iceberg/struct_like.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,278 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

#pragma once

#include <string>
#include <vector>

#include "iceberg/iceberg_export.h"
#include "iceberg/result.h"
#include "iceberg/type.h"

namespace iceberg {

class Scalar;
class StructScalar;
class ArrayScalar;
class ScalarVisitor;
class StructLike;
class StructLikeVisitor;
/**
* @brief Base class representing a generic scalar value.
*/
class ICEBERG_EXPORT Scalar {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make std::shared_ptr<Type> as a member variable to Scalar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three possible approaches to organizing type information:

  1. Scalar holds built-in types independently, and StructLike does not rely on StructType.
  2. Scalar does not contain any type information, and its type is inferred from the associated StructType within StructLike.
  3. Both Scalar and StructType carry type information, resulting in some redundancy.
    I have chosen the first approach. Which one would you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Scalar does not carry the type info, we have to pass Schema wherever Scalar is used. In the case of schema evolution, then we need both Scalar schema and expected (table) schema.

public:
virtual ~Scalar() = default;

/**
* @brief Checks if the scalar represents a null value.
* @return true if the value is null, false otherwise.
*/
virtual bool IsNull() const = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the implication that this is not a non-trivial operation? (Else it might be is_null to imply it's a simple getter?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Scalar class, is_null can be a member variable, or the IsNull interface can be implemented by subclasses, where subclasses might use optional or pointer to determine if it is null. Here, the second option was chosen; I actually don't have a strong preference. StructScalar also uses the is_null field, so it might be more appropriate to directly make is_null a member variable of Scalar. I'll make a modification.


/**
* @brief Casts the scalar to a specific derived type.
* @tparam T The target derived Scalar type.
* @return Reference to the casted value.
* @throws std::bad_cast if the cast fails.
*/
template <typename T>
const T& as() const {
auto ptr = dynamic_cast<const T*>(this);
if (!ptr) throw std::bad_cast();
return *ptr;
}

/**
* @brief Creates a deep copy of the scalar.
* @return A shared pointer to the cloned scalar.
*/
virtual std::shared_ptr<Scalar> clone() const = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be Clone? (Ditto in general for the PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will modify it.


/**
* @brief Accepts a visitor for dispatching based on scalar type.
* @param visitor The visitor to accept.
*/
virtual void accept(ScalarVisitor& visitor) const = 0;
};

/**
* @brief Visitor interface for Scalar, supporting type-dispatched operations.
*/
class ICEBERG_EXPORT ScalarVisitor {
public:
virtual ~ScalarVisitor() = default;

/// Visit methods for primitive types
virtual void visit(int32_t value) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it accept Int32Scalar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
The PrimitiveScalar::Accept method directly uses value, and using PrimitiveScalar is also acceptable. I will make some modifications.

Copy link
Member

@wgtmac wgtmac Apr 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it accepts physical values instead of typed values, how can it differentiate between the actual types? For example, long and timestamp both use int64_t.

virtual void visit(int64_t value) = 0;
virtual void visit(float value) = 0;
virtual void visit(double value) = 0;
virtual void visit(bool value) = 0;
virtual void visit(const std::string& value) = 0;

/**
* @brief Called when visiting a null scalar.
*/
virtual void visitNull() = 0;
Comment on lines +91 to +94
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a specific NullScalar type, or any scalar with a null value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see the behavior below. IMO I'd rather we pass the scalars to the visitor so that the users can tell between different types of nulls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, visit I will modify it to accept Scalar types as input


/// Visit methods for complex types
virtual void visitStruct(const StructScalar& value) = 0;
virtual void visitArray(const ArrayScalar& value) = 0;
};

/**
* @brief Represents a scalar that holds a primitive value.
* @tparam T The type of the primitive value.
*/
template <typename T>
class ICEBERG_EXPORT PrimitiveScalar : public Scalar {
public:
/**
* @brief Constructs a PrimitiveScalar from an optional value.
* @param value The optional primitive value.
*/
explicit PrimitiveScalar(std::optional<T> value) : value_(std::move(value)) {}

bool IsNull() const override { return !value_.has_value(); }

/**
* @brief Gets the actual value held by the scalar.
* @return Reference to the value.
* @note Behavior is undefined if the scalar is null.
*/
const T& value() const { return *value_; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit but perhaps in DEBUG it should use optional::value() to force an exception and in NDEBUG it can use operator* to bypass that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, I will modify here


/**
* @brief (Deprecated) Type-specific accessor for raw value.
* @tparam U Must match T.
* @return Reference to the value if types match.
* @throws std::bad_cast if U does not match T.
*/
template <typename U>
const U& as() const {
if constexpr (std::is_same_v<T, U>) return *value_;
throw std::bad_cast();
}

std::shared_ptr<Scalar> clone() const override {
return std::make_shared<PrimitiveScalar<T>>(value_);
}

void accept(ScalarVisitor& visitor) const override {
if (IsNull()) {
visitor.visitNull();
} else {
visitor.visit(*value_);
}
}

private:
std::optional<T> value_;
};

// Aliases for common primitive scalar types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are supposed to specialize each type like DecimalType, BinaryType, etc.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be specialized.

using Int32Scalar = PrimitiveScalar<int32_t>;
using Int64Scalar = PrimitiveScalar<int64_t>;
using DoubleScalar = PrimitiveScalar<double>;
using StringScalar = PrimitiveScalar<std::string>;
using BoolScalar = PrimitiveScalar<bool>;

/**
* @brief Base class for scalars that contain a vector of child Scalars.
* @tparam Derived The concrete subclass type (CRTP).
*/
template <typename Derived>
class VectorScalarBase : public Scalar {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a little bit weird that values in it have different meanings in the StructScalar (horizontal) and ArrayScalar (vertical).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit strange because the type information is all in StructLike, with StructScalar/ArrayScalar both being a group of Scalar members and there is not much difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this sounds like an anti-pattern in the OOP. You cannot use the base VectorScalarBase class without knowing its subclass implementation.

public:
/**
* @brief Constructs a VectorScalarBase.
* @param values Child scalar values.
* @param is_null Whether the entire scalar is null.
*/
explicit VectorScalarBase(std::vector<std::shared_ptr<Scalar>> values,
bool is_null = false)
: values_(std::move(values)), is_null_(is_null) {}

/**
* @brief Gets the child scalar values.
* @return Vector of shared pointers to child Scalars.
*/
const std::vector<std::shared_ptr<Scalar>>& value() const { return values_; }

bool IsNull() const override { return is_null_; }

std::shared_ptr<Scalar> clone() const override {
std::vector<std::shared_ptr<Scalar>> cloned;
for (const auto& v : values_) {
cloned.push_back(v->clone());
}
return std::make_shared<Derived>(std::move(cloned), is_null_);
}

protected:
std::vector<std::shared_ptr<Scalar>> values_;
bool is_null_;
};

/**
* @brief Represents a struct-like scalar (tuple of fields).
*/
class StructScalar : public VectorScalarBase<StructScalar> {
public:
using VectorScalarBase::VectorScalarBase;

void accept(ScalarVisitor& visitor) const override {
if (IsNull()) {
visitor.visitNull();
} else {
visitor.visitStruct(*this);
}
}
};

/**
* @brief Represents an array-like scalar (list of values).
*/
class ArrayScalar : public VectorScalarBase<ArrayScalar> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we define MapScalar? Should it be array<struct<key,value>>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I haven't thought of a good design, perhaps map can still be used, and Scalar needs to add interfaces for hash and equality judgment?

public:
using VectorScalarBase::VectorScalarBase;

void accept(ScalarVisitor& visitor) const override {
if (IsNull()) {
visitor.visitNull();
} else {
visitor.visitArray(*this);
}
}
};

/**
* @brief Abstract interface for structured field access.
*/
class ICEBERG_EXPORT StructLike {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we simply reuse StructScalar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StructLike has a StructType, while StructScalar does not have one, I think this should depend on the discussion result regarding whether a Scalar should have its own type information.

public:
virtual ~StructLike() = default;

/**
* @brief Gets the type of the structure.
* @return Reference to the StructType.
*/
virtual const StructType& struct_type() const = 0;

/**
* @brief Gets the number of fields.
* @return Number of fields.
*/
virtual int32_t size() const = 0;

/**
* @brief Gets the value of a field by position.
* @param pos The zero-based index of the field.
* @return Reference to the scalar value.
*/
virtual const Scalar& get(int32_t pos) const = 0;

/**
* @brief Sets the value of a field by position.
* @param pos The zero-based index of the field.
* @param value The scalar to assign.
* @return Status indicating success or failure.
*/
virtual Status set(int32_t pos, std::shared_ptr<Scalar> value) = 0;
};

/**
* @brief Visitor interface for traversing StructLike fields.
*/
class ICEBERG_EXPORT StructLikeVisitor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, we can remove this as well.

public:
virtual ~StructLikeVisitor() = default;

/**
* @brief Visits a single field of a StructLike object.
* @param pos The index of the field.
* @param type The type of the field.
* @param value The scalar value of the field.
*/
virtual void visitField(int pos, const Type& type, const Scalar& value) = 0;
};

} // namespace iceberg
Loading