Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different semantics of casting from int64 to timestamp between Comet and Spark #146

Open
viirya opened this issue Feb 29, 2024 · 4 comments
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@viirya
Copy link
Member

viirya commented Feb 29, 2024

What is the problem the feature request solves?

In Spark, casting from LongType to Timestamp is to casting seconds to microseconds. arrow-rs's cast kernel basically simply reinterprets Int64Array as TimestampMicrosecondArray. So the scale is different, i.e., seconds v.s. microseconds.

We need to add this special case into Comet's cast kernel.

Describe the potential solution

No response

Additional context

No response

@viirya viirya added the enhancement New feature or request label Feb 29, 2024
@andygrove andygrove added good first issue Good for newcomers help wanted Extra attention is needed labels Jun 6, 2024
@andygrove andygrove added this to the 0.2.0 milestone Jul 25, 2024
@andygrove
Copy link
Member

This cast is currently disabled by default, but it would be good to implement this. I have added this to the 0.2.0 milestone.

@andygrove andygrove removed this from the 0.2.0 milestone Aug 16, 2024
@justahuman1
Copy link

I can take this

@andygrove
Copy link
Member

Thanks @justahuman1!

A good place to start is org.apache.comet.expressions.CometCast#canCastFromLong. This currently returns false for casting to timestamp, so enabling it here is the first step.

Next, you can enable the following test in CometCastSuite:

  ignore("cast LongType to TimestampType") {
    // java.lang.ArithmeticException: long overflow
    castTest(generateLongs(), DataTypes.TimestampType)
  }

@parthchandra
Copy link
Contributor

Also check CometExpressionSuite.test("cast timestamp and timestamp_ntz"). This reads timestamps as longs from a parquet file (which may store the values as either millis or micros).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants