Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Date data types can cast to a time zone-specific timestamp #7141

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

friendlymatthew
Copy link

@friendlymatthew friendlymatthew commented Feb 16, 2025

Closes #7140. Related: apache/datafusion#14638

Currently, you can only cast Date32/Date64 into a Timestamp without a time zone. This commit updates the casting rules to support date to time-zone specific timestamps and remodels the casting logic to support this feature.

There are no user facing changes.

@friendlymatthew friendlymatthew force-pushed the friendlymatthew/cast-date-to-timestamp-tz branch from 717ed28 to 02a86ac Compare February 16, 2025 17:29
@friendlymatthew
Copy link
Author

friendlymatthew commented Feb 16, 2025

This is ready for review. The test cases comprehensively cover the entire Date × TimeUnit matrix. Also tested via the datafusion-cli. See apache/datafusion#14638 (comment)

cc/ @xudong963 @Omega359

Copy link
Contributor

@Omega359 Omega359 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look over this looks nice

#[test]
fn test_cast_date32_to_timestamp_with_timezone() {
let tz = "+0545"; // UTC + 0545 is Asia/Kathmandu
let a = Date32Array::from(vec![Some(18628), Some(18993), None]); // 2021-1-1, 2022-1-1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the default tz of date is UTC, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the default tz of date is UTC, right?

Yes, but shouldn't we ensure Dates cast to timestamps with a particular timezone?

I'm also a bit confused by this comment. Are you implying that a is improperly constructed? (i.e. the third element shouldn't be None?) Here's a similar test case that tests for time zone aware timestamps:

let array_with_tz =
TimestampMillisecondArray::from(vec![Some(864000003005), Some(1545696002001), None])
.with_timezone(tz.to_string());
let expected = vec![
Some("1997-05-19 05:45:03.005000"),
Some("2018-12-25 05:45:02.001000"),
None,
];
assert_cast_timestamp_to_string!(
array_with_tz,
DataType::Utf8View,
StringViewArray,
cast_options,
expected
);

Maybe I'm misunderstanding something. Would appreciate some clarification. Thank you!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is okay for me. Thanks!

I'm just curious about which particular timezone we should use. Should we depend on a time zone configuration or use the UTC directly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just curious about which particular timezone we should use. Should we depend on a time zone configuration or use the UTC directly?

Other test cases use random time zones like the +0545 mentioned above. I'd argue that for the sake of testing, we'd want to use a non-default time zone.

@xudong963 xudong963 requested a review from alamb February 18, 2025 05:03
@tustvold
Copy link
Contributor

tustvold commented Feb 19, 2025

Thank you for this PR, and especially for the tests which make it clear what this is doing.

I do have one question, is this the expected behaviour?

In particular when casting timestamps without a timezone to timestamps with a timezone, the conversion is done preserving the "local" time.

i.e.

2021-01-01 05:45:00.000000 -> 2021-01-01 05:45:00.000000 +0545

I therefore wonder if date conversions should behave similarly, i.e.

2021-01-01 -> 2021-01-01 00:00:00.000000 +0545

instead of what this PR implements which interprets the date as being in the UTC timezone

2021-01-01 -> 2021-01-01 05:45:00.000000 +0545

With the behaviour in this PR I think you would get different behaviour going Date32 -> Timestamp(_, None) -> Timestamp(_, Some(_))? (It might be worth testing this regardless)

@friendlymatthew
Copy link
Author

I do have one question, is this the expected behaviour?

In particular when casting timestamps without a timezone to timestamps with a timezone, the conversion is done preserving the "local" time.

i.e.

2021-01-01 05:45:00.000000 -> 2021-01-01 05:45:00.000000 +0545

I therefore wonder if date conversions should behave similarly, i.e.

2021-01-01 -> 2021-01-01 00:00:00.000000 +0545

Interesting! Yeah, I agree that we should preserve the local time when casting. c19dbc5 resolves this.

Now, when we cast a date to a time zone aware timestamp, we do a intermediary cast into a regular Timestamp, then attempt to recast with the time zone.

/cc @tustvold @Omega359

Comment on lines 1809 to 1815
(Date64, Timestamp(TimeUnit::Second, _)) => {
let array = array
.as_primitive::<Date64Type>()
.unary::<_, TimestampSecondType>(|x| x / MILLISECONDS)
.with_timezone_opt(to_tz.clone()),
)),
(Date64, Timestamp(TimeUnit::Millisecond, to_tz)) => Ok(Arc::new(
array
.unary::<_, TimestampSecondType>(|x| x / MILLISECONDS);

cast_with_options(&array, to_type, cast_options)
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can split into two arms if we don't like how we're eagerly recasting a Timestamp.

(Date64, Timestamp(TimeUnit::Second, None)) => {
     // single cast here
},
(Date64, Timestamp(TimeUnit::Second, Some(_))) => {
    // first cast to Timestamp, then recast with time zone
}

Comment on lines +5239 to +5260
#[test]
fn test_cast_date32_to_timestamp_and_timestamp_with_timezone() {
let tz = "+0545"; // UTC + 0545 is Asia/Kathmandu
let a = Date32Array::from(vec![Some(18628), None, None]); // 2021-1-1, 2022-1-1
let array = Arc::new(a) as ArrayRef;

let b = cast(
&array,
&DataType::Timestamp(TimeUnit::Second, Some(tz.into())),
)
.unwrap();
let c = b.as_primitive::<TimestampSecondType>();
let string_array = cast(&c, &DataType::Utf8).unwrap();
let result = string_array.as_string::<i32>();
assert_eq!("2021-01-01T00:00:00+05:45", result.value(0));

let b = cast(&array, &DataType::Timestamp(TimeUnit::Second, None)).unwrap();
let c = b.as_primitive::<TimestampSecondType>();
let string_array = cast(&c, &DataType::Utf8).unwrap();
let result = string_array.as_string::<i32>();
assert_eq!("2021-01-01T00:00:00", result.value(0));
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proof that casting a Date to a Timestamp with/without time zones result in the same string.

Not the biggest fan of this test case, would appreciate some comments if we would want to a test case that compares the cast results (with/without time zone).

@friendlymatthew
Copy link
Author

friendlymatthew commented Feb 21, 2025

Weird, clippy seems to complain about a completely different region of the codebase?

/cc @xudong963

@xudong963
Copy link
Member

Weird, clippy seems to complain about a completely different region of the codebase?

/cc @xudong963

FYI: #7167, maybe you can do an update.

@friendlymatthew friendlymatthew force-pushed the friendlymatthew/cast-date-to-timestamp-tz branch from c37b1a4 to 7828b6d Compare February 21, 2025 03:49
@friendlymatthew
Copy link
Author

Weird, clippy seems to complain about a completely different region of the codebase?
/cc @xudong963

FYI: #7167, maybe you can do an update.

Should be gtg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support casting Date32/Date64 to time zone aware Timestamps.
4 participants