Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use FixedSizeBinary instead of Binary for int96 conversion when convertInt96ToArrowTimestamp is false #3088

Open
doki23 opened this issue Nov 29, 2024 · 2 comments

Comments

@doki23
Copy link

doki23 commented Nov 29, 2024

Describe the enhancement requested

public TypeMapping convertINT96(PrimitiveTypeName primitiveTypeName) throws RuntimeException {
  if (convertInt96ToArrowTimestamp) {
    return field(new ArrowType.Timestamp(TimeUnit.NANOSECOND, null));
  } else {
    return field(new ArrowType.Binary());
  }
}

When converting a Parquet type to an Arrow type, if the original type is int96 and the option convertInt96ToArrowTimestamp is set to false, the resulting Arrow type defaults to Binary. However, it might be more appropriate to use FixedSizeBinary instead.

Component(s)

No response

@Fokko
Copy link
Contributor

Fokko commented Dec 2, 2024

I agree that it would be more appropriate to return a FixedSizeBinary, let's ask @wgtmac, since he's the expert on Arrow. Just to make sure that there is no historical reason to return Binary instead.

Ps. Keep in mind that INT96 is deprecated.

@wgtmac
Copy link
Member

wgtmac commented Dec 2, 2024

Thanks for pinging me @Fokko!

I agree that FixedSizeBinary is more appropriate than Binary. However, I would argue that it is invalid to use INT96 for non-timestamp type. So I think it is better to ignore convertInt96ToArrowTimestamp and directly return Timestamp.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants