-
Notifications
You must be signed in to change notification settings - Fork 164
Default to UTC for date/time functions across PPL and SQL #3854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Yuanchun Shen <[email protected]>
Signed-off-by: Yuanchun Shen <[email protected]>
Signed-off-by: Yuanchun Shen <[email protected]>
Signed-off-by: Yuanchun Shen <[email protected]>
@@ -1303,7 +1303,7 @@ NOW | |||
Description | |||
>>>>>>>>>>> | |||
|
|||
Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss' format. The value is expressed in the cluster time zone. | |||
Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss' format. The value is expressed in the UTC time zone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this changes should impact any SYS* datetime functions. SYSDATE() should be expressed in the same timezone as the server is on.
We want to change the current_timestamp/now to UTC since we don't have a session in OS. But would not change the SYS* functions. ref https://database-heartbeat.com/2021/09/28/sysdate-vs-current_date-in-oracle-database/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set sysdate
to use cluster time zone.
Originally, sysdate
only differs with now
in that it returns execution timestamp instead of query start timestamp. Now it differs with now
in one more way (time zone). I've made it clear in the doc. Is this change expected?
Two more functions that worry me are localtime
and localtimestamp
. They'll return UTC timestamp instead of local timestamp. This may be confusing for users. I think it's because we followed the function definition of MySql, where it says LOCALTIME and LOCALTIME() are synonyms for NOW(). However, since we have altered the definition of now
, the local*
functions no longer points to the local time zone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okey as long as keep the behaviours of localtime() as same as now(). maybe a better way is marking local* functions deprecated since now we don't have any session management.
@@ -1303,7 +1303,7 @@ NOW | |||
Description | |||
>>>>>>>>>>> | |||
|
|||
Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss' format. The value is expressed in the cluster time zone. | |||
Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss' format. The value is expressed in the UTC time zone. | |||
`NOW()` returns a constant time that indicates the time at which the statement began to execute. This differs from the behavior for `SYSDATE() <#sysdate>`_, which returns the exact time at which it executes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need update user doc to highlight the new behaviours for any impacted functions. for example, now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Explicitly mentioned UTC time-zone for all now-like functions.
Signed-off-by: Yuanchun Shen <[email protected]>
…(intead of with random length of milliseconds) Signed-off-by: Yuanchun Shen <[email protected]>
Could we avoid breaking change by providing timezone setting? e.g. default value is jvm timezone.
Timezone query paramater will overide timezone setting, right?
I agree, timestamp Intenal representation should always be epoch value. |
Yes
In this case, if a user feed in a timestamp string, should we regard it as a UTC timestamp or a timestamp at user's time zone? I feel it will be more intuitive to treat it as in user's time zone. Since a user will get what they stores when they read it back from the index.
The downside: It makes the user interface different from what's stored in the index. This may be a little weird for a query language.
I am considering adding such a setting. There may be three ways to achieve so:
They all have their merits and disadvantages. It may be too verbose or troublesome if a user has to specify a timezone per query. Enabling cluster-wise setting allows users to specify timezone once and use it forever, although clients from different time zones have to use the same time-zone set by the server. Using session-timezone / profile-timezone may be a great middle ground. However, introducing such an involved concept purely for timezone settings may be an overkill. Of course, we can use a combination of them -- e.g. enable cluster-wide timezone setting, but also support overriding it with query parameters. |
@@ -244,8 +244,9 @@ CURDATE | |||
Description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a Note section at Top of datatime to describe UTC behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a block with the node directive. Do you think it is detailed enough?
docs/user/ppl/functions/datetime.rst
Outdated
@@ -1594,7 +1595,7 @@ Description | |||
>>>>>>>>>>> | |||
|
|||
Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss[.nnnnnn]'. | |||
SYSDATE() returns the time at which it executes. This differs from the behavior for `NOW() <#now>`_, which returns a constant time that indicates the time at which the statement began to execute. | |||
SYSDATE() returns the time at which it executes in the cluster's time zone. This differs from the behavior for `NOW() <#now>`_, which returns a constant time that indicates the time at which the statement began to execute in the UTC time zone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SYSDATE() returns the time at which it executes in the cluster's time zone.
It should returns the current date in UTC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LantaoJin argued that SYSDATE
should return the current date at server's time zone, as how Oracle's SYSDATE does.
In MySql, SYSDATE does not differ with NOW w.r.t. timezone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the implementation.
Besides, I found a bug which may be annoying for SYSDATE
users -- multiple calls to SYSDATE
with the same parameter within a query will return the same value.
SYSDATE
is expected to return different values when it is called within the same query.
E.g. sd1
and sd2
are different in this query: source=dates | eval sd1 = sysdate(5) | sort date | eval sd2 = sysdate(6) | fields sd1, sd2
sd1 | sd2 |
---|---|
2025-07-22 06:16:33.13648 | 2025-07-22 06:16:33.136559 |
However, this query gives the same result: source=dates | eval sd1 = sysdate(6) | sort date | eval sd2 = sysdate(6) | fields sd1, sd2
sd1 | sd2 |
---|---|
2025-07-22 06:19:07.006268 | 2025-07-22 06:19:07.006268 |
The physical plan gives a hint to the problem:
EnumerableCalc(expr#0=[{inputs}], expr#1=[6], expr#2=[SYSDATE($t1)], sd1=[$t2], sd2=[$t2])
CalciteEnumerableIndexScan(table=[[OpenSearch, dates]], PushDownContext=[[PROJECT->[date], SORT->[{
"date" : {
"order" : "asc",
"missing" : "_first"
}
}]], OpenSearchRequestBuilder(sourceBuilder={"from":0,"timeout":"1m","_source":{"includes":["date"],"excludes":[]},"sort":[{"date":{"order":"asc","missing":"_first"}}]}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
Both sd1
and sd2
point to $t2
.
Will raise another issue to fix it.
timeConverted | ||
.atZone(ZoneId.of(TimeZone.getDefault().getID())) | ||
.withZoneSameInstant(ZoneId.of(timeZone)); | ||
timeConverted.atZone(ZoneOffset.UTC).withZoneSameInstant(ZoneId.of(timeZone)); | ||
FunctionExpression expr = DSL.datetime(DSL.literal(dt), DSL.literal(timeZone)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timezone "America/Los_Angeles" is only for testing right? By default DSL.datetime always use UTC timezone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, "America/Los_Angeles" is only for testing. DSL.datetime(datetime, timezone)
converts the datetime (assumed to be in UTC) to the specified timezone. If there isn't a timezone specified -- DSL.datetime(datetime)
, it uses UTC timezone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One proposal related to this test: I think ExprTimestampValue
should hold a LocalTimestamp
instead of an Instant
, since it just represents a timestamp without zone information.
For example, the function datetime(datetime, timezone)
converts a timestamp to another timezone. But if representing the input and output timestamp with an Instant
, it will remain the same. In the code, there are also many spoiler plates to add a timezone to a LocalDateTime in order to work with ExprTimestampValue.
…ations Signed-off-by: Yuanchun Shen <[email protected]>
…e zone behavior Signed-off-by: Yuanchun Shen <[email protected]>
…r timezone" This reverts commit b7be897. Signed-off-by: Yuanchun Shen <[email protected]>
Signed-off-by: Yuanchun Shen <[email protected]>
docs/user/ppl/functions/datetime.rst
Outdated
@@ -8,6 +8,12 @@ Date and Time Functions | |||
:local: | |||
:depth: 1 | |||
|
|||
.. note:: | |||
|
|||
All PPL date and time functions use the UTC time zone by default, unless otherwise specified. Both input and output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless otherwise specified?
In which case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminding. SYSDATE
was an exception. Now we changed it to UTC as well. I'll rephrase the note to "All PPL date and time functions use the UTC time zone"
- remove expressions indicating PPL supporting other time zones Signed-off-by: Yuanchun Shen <[email protected]>
…l for malformatted date Signed-off-by: Yuanchun Shen <[email protected]>
Signed-off-by: Yuanchun Shen <[email protected]>
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-3854-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 b3c01770a0b2cabadce6a3ce4a9134e70de72331
# Push it to GitHub
git push --set-upstream origin backport/backport-3854-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev Then, create a pull request where the |
…-project#3854) * Use UTC for all date/time related functions Signed-off-by: Yuanchun Shen <[email protected]> * Change zone ID to UTC in date/time tests Signed-off-by: Yuanchun Shen <[email protected]> * Update now's timezone to UTC in datetime documentation Signed-off-by: Yuanchun Shen <[email protected]> * Add a test to compare against UTC date Signed-off-by: Yuanchun Shen <[email protected]> * Revert changes to sysdate: make it return timestamp in cluster timezone Signed-off-by: Yuanchun Shen <[email protected]> * Correct utc_time and utc_timestamp to return time in HH:mm:ss format (intead of with random length of milliseconds) Signed-off-by: Yuanchun Shen <[email protected]> * Update datetime function documentations to highlight changes to UTC for now-like functions Signed-off-by: Yuanchun Shen <[email protected]> * Correct phrasing of sysdate and some other datetime function documentations Signed-off-by: Yuanchun Shen <[email protected]> * Add a note directive block at the top of datetime to describe the time zone behavior Signed-off-by: Yuanchun Shen <[email protected]> * Revert "Revert changes to sysdate: make it return timestamp in cluster timezone" This reverts commit b7be897. Signed-off-by: Yuanchun Shen <[email protected]> * Update sysdate's description Signed-off-by: Yuanchun Shen <[email protected]> * Update datetime.rst - remove expressions indicating PPL supporting other time zones Signed-off-by: Yuanchun Shen <[email protected]> * Add STRING to allowed first parameter of convert_tz as it returns null for malformatted date Signed-off-by: Yuanchun Shen <[email protected]> * Restrict the format of convert_tz's argument Signed-off-by: Yuanchun Shen <[email protected]> --------- Signed-off-by: Yuanchun Shen <[email protected]> (cherry picked from commit b3c0177)
…-project#3854) * Use UTC for all date/time related functions Signed-off-by: Yuanchun Shen <[email protected]> * Change zone ID to UTC in date/time tests Signed-off-by: Yuanchun Shen <[email protected]> * Update now's timezone to UTC in datetime documentation Signed-off-by: Yuanchun Shen <[email protected]> * Add a test to compare against UTC date Signed-off-by: Yuanchun Shen <[email protected]> * Revert changes to sysdate: make it return timestamp in cluster timezone Signed-off-by: Yuanchun Shen <[email protected]> * Correct utc_time and utc_timestamp to return time in HH:mm:ss format (intead of with random length of milliseconds) Signed-off-by: Yuanchun Shen <[email protected]> * Update datetime function documentations to highlight changes to UTC for now-like functions Signed-off-by: Yuanchun Shen <[email protected]> * Correct phrasing of sysdate and some other datetime function documentations Signed-off-by: Yuanchun Shen <[email protected]> * Add a note directive block at the top of datetime to describe the time zone behavior Signed-off-by: Yuanchun Shen <[email protected]> * Revert "Revert changes to sysdate: make it return timestamp in cluster timezone" This reverts commit b7be897. Signed-off-by: Yuanchun Shen <[email protected]> * Update sysdate's description Signed-off-by: Yuanchun Shen <[email protected]> * Update datetime.rst - remove expressions indicating PPL supporting other time zones Signed-off-by: Yuanchun Shen <[email protected]> * Add STRING to allowed first parameter of convert_tz as it returns null for malformatted date Signed-off-by: Yuanchun Shen <[email protected]> * Restrict the format of convert_tz's argument Signed-off-by: Yuanchun Shen <[email protected]> --------- Signed-off-by: Yuanchun Shen <[email protected]> (cherry picked from commit b3c0177)
) * Use UTC for all date/time related functions * Change zone ID to UTC in date/time tests * Update now's timezone to UTC in datetime documentation * Add a test to compare against UTC date * Revert changes to sysdate: make it return timestamp in cluster timezone * Correct utc_time and utc_timestamp to return time in HH:mm:ss format (intead of with random length of milliseconds) * Update datetime function documentations to highlight changes to UTC for now-like functions * Correct phrasing of sysdate and some other datetime function documentations * Add a note directive block at the top of datetime to describe the time zone behavior * Revert "Revert changes to sysdate: make it return timestamp in cluster timezone" This reverts commit b7be897. * Update sysdate's description * Update datetime.rst - remove expressions indicating PPL supporting other time zones * Add STRING to allowed first parameter of convert_tz as it returns null for malformatted date * Restrict the format of convert_tz's argument --------- (cherry picked from commit b3c0177) Signed-off-by: Yuanchun Shen <[email protected]>
Description
Datetime functions in PPL/SQL has been using cluster's JVM time zone as the default timezone. For example, if the cluster resides in PST and it's 13:00:00 there,
SELECT CURRENT_TIME()
will return the the current timestamp as13:00:00
.However, this creates a mismatch with the data stored in OpenSearch, as
[dates are internally stored in the UTC format](https://docs.opensearch.org/docs/latest/field-types/supported-field-types/date/)
. Therefore, when comparing dates generated with functions with those stored in OpenSearch, there will be a mismatch in their timezone. Examples are detailed in #3725.This PR unifies all date / time representations in PPL / SQL to UTC. As a result,
SELECT CURRENT_TIME()
will return the current time in UTC timezone; in query:... | eval t = timestamp('2025-07-07 13:00:00') | where date > t
,t
will be regarded as2025-07-07T13:00:00.000Z
in UTC.This will fix existing problems of comparing with now, where the current zoned timestamp in the cluster's time zone will be directly compared with an UTC timestamp from the index.
As the next step, we should allow users to specify timezone per query or cluster-wise.
Alternative solutions that I have considered
We can still use zoned time in PPL / SQL, but only make the conversion when interacting with the data from the index. This includes:
now()
or literal, pushing down the time zone as a parameter as well (or convert the now / literal to UTC before pushing down).I did not implement this solution since users will read a data that's different from what they stored, and they have no control over it.
Comparison with other databases / languages
time_zone
argument in some DSL queries like range query, which will convert the date/time in the query to UTC before comparing with dates in the index. If not specified, all dates are regarded as in UTC time zone. E.g.SELECT NOW()
will return a UTC time.Related Issues
Resolves #3725
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.