Skip to content

Default to UTC for date/time functions across PPL and SQL #3854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jul 23, 2025

Conversation

yuancu
Copy link
Contributor

@yuancu yuancu commented Jul 7, 2025

Description

Datetime functions in PPL/SQL has been using cluster's JVM time zone as the default timezone. For example, if the cluster resides in PST and it's 13:00:00 there, SELECT CURRENT_TIME() will return the the current timestamp as 13:00:00.

However, this creates a mismatch with the data stored in OpenSearch, as [dates are internally stored in the UTC format](https://docs.opensearch.org/docs/latest/field-types/supported-field-types/date/). Therefore, when comparing dates generated with functions with those stored in OpenSearch, there will be a mismatch in their timezone. Examples are detailed in #3725.

This PR unifies all date / time representations in PPL / SQL to UTC. As a result, SELECT CURRENT_TIME() will return the current time in UTC timezone; in query: ... | eval t = timestamp('2025-07-07 13:00:00') | where date > t, t will be regarded as 2025-07-07T13:00:00.000Z in UTC.

This will fix existing problems of comparing with now, where the current zoned timestamp in the cluster's time zone will be directly compared with an UTC timestamp from the index.

As the next step, we should allow users to specify timezone per query or cluster-wise.

Alternative solutions that I have considered

We can still use zoned time in PPL / SQL, but only make the conversion when interacting with the data from the index. This includes:

  • Convert the UTC date to cluster timezone when read from OpenSearch
  • When comparing with now() or literal, pushing down the time zone as a parameter as well (or convert the now / literal to UTC before pushing down).
  • ...

I did not implement this solution since users will read a data that's different from what they stored, and they have no control over it.

Comparison with other databases / languages

  • Most SQL databases have session controls. Date/time are returned in the session's time zone. If time zone not specified, dates will be returned in UTC time zone.
  • Elasticsearch supports time_zone argument in some DSL queries like range query, which will convert the date/time in the query to UTC before comparing with dates in the index. If not specified, all dates are regarded as in UTC time zone. E.g. SELECT NOW() will return a UTC time.
  • Splunk treats date/time literals that did not specify a time zone as in cluster's time zone. Date / time are converted to UNIX time for processing.

Related Issues

Resolves #3725

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -1303,7 +1303,7 @@ NOW
Description
>>>>>>>>>>>

Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss' format. The value is expressed in the cluster time zone.
Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss' format. The value is expressed in the UTC time zone.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this changes should impact any SYS* datetime functions. SYSDATE() should be expressed in the same timezone as the server is on.

We want to change the current_timestamp/now to UTC since we don't have a session in OS. But would not change the SYS* functions. ref https://database-heartbeat.com/2021/09/28/sysdate-vs-current_date-in-oracle-database/

Copy link
Contributor Author

@yuancu yuancu Jul 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set sysdate to use cluster time zone.

Originally, sysdate only differs with now in that it returns execution timestamp instead of query start timestamp. Now it differs with now in one more way (time zone). I've made it clear in the doc. Is this change expected?

Two more functions that worry me are localtime and localtimestamp. They'll return UTC timestamp instead of local timestamp. This may be confusing for users. I think it's because we followed the function definition of MySql, where it says LOCALTIME and LOCALTIME() are synonyms for NOW(). However, since we have altered the definition of now, the local* functions no longer points to the local time zone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's okey as long as keep the behaviours of localtime() as same as now(). maybe a better way is marking local* functions deprecated since now we don't have any session management.

@@ -1303,7 +1303,7 @@ NOW
Description
>>>>>>>>>>>

Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss' format. The value is expressed in the cluster time zone.
Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss' format. The value is expressed in the UTC time zone.
`NOW()` returns a constant time that indicates the time at which the statement began to execute. This differs from the behavior for `SYSDATE() <#sysdate>`_, which returns the exact time at which it executes.
Copy link
Member

@LantaoJin LantaoJin Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need update user doc to highlight the new behaviours for any impacted functions. for example, now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Explicitly mentioned UTC time-zone for all now-like functions.

LantaoJin
LantaoJin previously approved these changes Jul 10, 2025
@penghuo
Copy link
Collaborator

penghuo commented Jul 14, 2025

This will be a break change, and may be counter-intuitive for some since users will get a timestamp that's different from their local timestamp.

Could we avoid breaking change by providing timezone setting? e.g. default value is jvm timezone.

As the next step, we should allow users to specify timezone per query or per session.

Timezone query paramater will overide timezone setting, right?

Default to UTC for date/time functions across PPL and SQL

I agree, timestamp Intenal representation should always be epoch value.
my proposal is apply timezone for display and parsing timestamp string literal only

@yuancu
Copy link
Contributor Author

yuancu commented Jul 15, 2025

Timezone query paramater will overide timezone setting, right?

Yes

My proposal is apply timezone for display and parsing timestamp string literal only

In this case, if a user feed in a timestamp string, should we regard it as a UTC timestamp or a timestamp at user's time zone?

I feel it will be more intuitive to treat it as in user's time zone. Since a user will get what they stores when they read it back from the index.

             -------convert---> | ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄|
user (local)                    | index (UTC)  |
             <----convert back- |______________|

The downside: It makes the user interface different from what's stored in the index. This may be a little weird for a query language.

Could we avoid breaking change by providing timezone setting? e.g. default value is jvm timezone.

I am considering adding such a setting. There may be three ways to achieve so:

  • setting per query
  • setting per session
  • a cluster-wide setting

They all have their merits and disadvantages. It may be too verbose or troublesome if a user has to specify a timezone per query. Enabling cluster-wise setting allows users to specify timezone once and use it forever, although clients from different time zones have to use the same time-zone set by the server. Using session-timezone / profile-timezone may be a great middle ground. However, introducing such an involved concept purely for timezone settings may be an overkill.

Of course, we can use a combination of them -- e.g. enable cluster-wide timezone setting, but also support overriding it with query parameters.

@@ -244,8 +244,9 @@ CURDATE
Description
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a Note section at Top of datatime to describe UTC behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a block with the node directive. Do you think it is detailed enough?

@@ -1594,7 +1595,7 @@ Description
>>>>>>>>>>>

Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss[.nnnnnn]'.
SYSDATE() returns the time at which it executes. This differs from the behavior for `NOW() <#now>`_, which returns a constant time that indicates the time at which the statement began to execute.
SYSDATE() returns the time at which it executes in the cluster's time zone. This differs from the behavior for `NOW() <#now>`_, which returns a constant time that indicates the time at which the statement began to execute in the UTC time zone.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SYSDATE() returns the time at which it executes in the cluster's time zone.

It should returns the current date in UTC.

Copy link
Contributor Author

@yuancu yuancu Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LantaoJin argued that SYSDATE should return the current date at server's time zone, as how Oracle's SYSDATE does.

In MySql, SYSDATE does not differ with NOW w.r.t. timezone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked the behaviours in Oracle Autonomous Database. The default time zone is UTC and by default calls to SYSDATE and SYSTIMESTAMP return the date and time in UTC. ref. let's update it to UTC. @yuancu

Copy link
Contributor Author

@yuancu yuancu Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the implementation.

Besides, I found a bug which may be annoying for SYSDATE users -- multiple calls to SYSDATE with the same parameter within a query will return the same value.

SYSDATE is expected to return different values when it is called within the same query.
E.g. sd1 and sd2 are different in this query: source=dates | eval sd1 = sysdate(5) | sort date | eval sd2 = sysdate(6) | fields sd1, sd2

sd1 sd2
2025-07-22 06:16:33.13648 2025-07-22 06:16:33.136559

However, this query gives the same result: source=dates | eval sd1 = sysdate(6) | sort date | eval sd2 = sysdate(6) | fields sd1, sd2

sd1 sd2
2025-07-22 06:19:07.006268 2025-07-22 06:19:07.006268

The physical plan gives a hint to the problem:

EnumerableCalc(expr#0=[{inputs}], expr#1=[6], expr#2=[SYSDATE($t1)], sd1=[$t2], sd2=[$t2])
  CalciteEnumerableIndexScan(table=[[OpenSearch, dates]], PushDownContext=[[PROJECT->[date], SORT->[{
  "date" : {
    "order" : "asc",
    "missing" : "_first"
  }
}]], OpenSearchRequestBuilder(sourceBuilder={"from":0,"timeout":"1m","_source":{"includes":["date"],"excludes":[]},"sort":[{"date":{"order":"asc","missing":"_first"}}]}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])

Both sd1 and sd2 point to $t2.

Will raise another issue to fix it.

timeConverted
.atZone(ZoneId.of(TimeZone.getDefault().getID()))
.withZoneSameInstant(ZoneId.of(timeZone));
timeConverted.atZone(ZoneOffset.UTC).withZoneSameInstant(ZoneId.of(timeZone));
FunctionExpression expr = DSL.datetime(DSL.literal(dt), DSL.literal(timeZone));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timezone "America/Los_Angeles" is only for testing right? By default DSL.datetime always use UTC timezone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, "America/Los_Angeles" is only for testing. DSL.datetime(datetime, timezone) converts the datetime (assumed to be in UTC) to the specified timezone. If there isn't a timezone specified -- DSL.datetime(datetime), it uses UTC timezone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One proposal related to this test: I think ExprTimestampValue should hold a LocalTimestamp instead of an Instant, since it just represents a timestamp without zone information.

For example, the function datetime(datetime, timezone) converts a timestamp to another timezone. But if representing the input and output timestamp with an Instant, it will remain the same. In the code, there are also many spoiler plates to add a timezone to a LocalDateTime in order to work with ExprTimestampValue.

yuancu added 2 commits July 22, 2025 13:47
…r timezone"

This reverts commit b7be897.

Signed-off-by: Yuanchun Shen <[email protected]>
Signed-off-by: Yuanchun Shen <[email protected]>
@@ -8,6 +8,12 @@ Date and Time Functions
:local:
:depth: 1

.. note::

All PPL date and time functions use the UTC time zone by default, unless otherwise specified. Both input and output
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless otherwise specified?

In which case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reminding. SYSDATE was an exception. Now we changed it to UTC as well. I'll rephrase the note to "All PPL date and time functions use the UTC time zone"

penghuo
penghuo previously approved these changes Jul 23, 2025
- remove expressions indicating PPL supporting other time zones

Signed-off-by: Yuanchun Shen <[email protected]>
@qianheng-aws qianheng-aws merged commit b3c0177 into opensearch-project:main Jul 23, 2025
25 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-3854-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 b3c01770a0b2cabadce6a3ce4a9134e70de72331
# Push it to GitHub
git push --set-upstream origin backport/backport-3854-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-3854-to-2.19-dev.

yuancu added a commit to yuancu/sql-plugin that referenced this pull request Jul 24, 2025
…-project#3854)

* Use UTC for all date/time related functions

Signed-off-by: Yuanchun Shen <[email protected]>

* Change zone ID to UTC in date/time tests

Signed-off-by: Yuanchun Shen <[email protected]>

* Update now's timezone to UTC in datetime documentation

Signed-off-by: Yuanchun Shen <[email protected]>

* Add a test to compare against UTC date

Signed-off-by: Yuanchun Shen <[email protected]>

* Revert changes to sysdate: make it return timestamp in cluster timezone

Signed-off-by: Yuanchun Shen <[email protected]>

* Correct utc_time and utc_timestamp to return time in HH:mm:ss format (intead of with random length of milliseconds)

Signed-off-by: Yuanchun Shen <[email protected]>

* Update datetime function documentations to highlight changes to UTC for now-like functions

Signed-off-by: Yuanchun Shen <[email protected]>

* Correct phrasing of sysdate and some other datetime function documentations

Signed-off-by: Yuanchun Shen <[email protected]>

* Add a note directive block at the top of datetime to describe the time zone behavior

Signed-off-by: Yuanchun Shen <[email protected]>

* Revert "Revert changes to sysdate: make it return timestamp in cluster timezone"

This reverts commit b7be897.

Signed-off-by: Yuanchun Shen <[email protected]>

* Update sysdate's description

Signed-off-by: Yuanchun Shen <[email protected]>

* Update datetime.rst

- remove expressions indicating PPL supporting other time zones

Signed-off-by: Yuanchun Shen <[email protected]>

* Add STRING to allowed first parameter of convert_tz as it returns null for malformatted date

Signed-off-by: Yuanchun Shen <[email protected]>

* Restrict the format of convert_tz's argument

Signed-off-by: Yuanchun Shen <[email protected]>

---------

Signed-off-by: Yuanchun Shen <[email protected]>
(cherry picked from commit b3c0177)
yuancu added a commit to yuancu/sql-plugin that referenced this pull request Jul 24, 2025
…-project#3854)

* Use UTC for all date/time related functions

Signed-off-by: Yuanchun Shen <[email protected]>

* Change zone ID to UTC in date/time tests

Signed-off-by: Yuanchun Shen <[email protected]>

* Update now's timezone to UTC in datetime documentation

Signed-off-by: Yuanchun Shen <[email protected]>

* Add a test to compare against UTC date

Signed-off-by: Yuanchun Shen <[email protected]>

* Revert changes to sysdate: make it return timestamp in cluster timezone

Signed-off-by: Yuanchun Shen <[email protected]>

* Correct utc_time and utc_timestamp to return time in HH:mm:ss format (intead of with random length of milliseconds)

Signed-off-by: Yuanchun Shen <[email protected]>

* Update datetime function documentations to highlight changes to UTC for now-like functions

Signed-off-by: Yuanchun Shen <[email protected]>

* Correct phrasing of sysdate and some other datetime function documentations

Signed-off-by: Yuanchun Shen <[email protected]>

* Add a note directive block at the top of datetime to describe the time zone behavior

Signed-off-by: Yuanchun Shen <[email protected]>

* Revert "Revert changes to sysdate: make it return timestamp in cluster timezone"

This reverts commit b7be897.

Signed-off-by: Yuanchun Shen <[email protected]>

* Update sysdate's description

Signed-off-by: Yuanchun Shen <[email protected]>

* Update datetime.rst

- remove expressions indicating PPL supporting other time zones

Signed-off-by: Yuanchun Shen <[email protected]>

* Add STRING to allowed first parameter of convert_tz as it returns null for malformatted date

Signed-off-by: Yuanchun Shen <[email protected]>

* Restrict the format of convert_tz's argument

Signed-off-by: Yuanchun Shen <[email protected]>

---------

Signed-off-by: Yuanchun Shen <[email protected]>
(cherry picked from commit b3c0177)
@yuancu yuancu deleted the utc-date branch July 25, 2025 07:41
penghuo pushed a commit that referenced this pull request Jul 25, 2025
)

* Use UTC for all date/time related functions



* Change zone ID to UTC in date/time tests



* Update now's timezone to UTC in datetime documentation



* Add a test to compare against UTC date



* Revert changes to sysdate: make it return timestamp in cluster timezone



* Correct utc_time and utc_timestamp to return time in HH:mm:ss format (intead of with random length of milliseconds)



* Update datetime function documentations to highlight changes to UTC for now-like functions



* Correct phrasing of sysdate and some other datetime function documentations



* Add a note directive block at the top of datetime to describe the time zone behavior



* Revert "Revert changes to sysdate: make it return timestamp in cluster timezone"

This reverts commit b7be897.



* Update sysdate's description



* Update datetime.rst

- remove expressions indicating PPL supporting other time zones



* Add STRING to allowed first parameter of convert_tz as it returns null for malformatted date



* Restrict the format of convert_tz's argument



---------


(cherry picked from commit b3c0177)

Signed-off-by: Yuanchun Shen <[email protected]>
@penghuo penghuo removed the breaking label Jul 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] PPL date modifier function does not work as expected
4 participants