Support casting date literal to timestamp #3831

Merged
merged 9 commits into from
Jul 22, 2025

Contributor

@yuancu yuancu commented Jun 30, 2025

Description

Before this PR, only the following casts were supported (rows are the source string, columns are the target type):

from \ to        date   time   timestamp
date str         yes    no     no
time str         no     yes    no
timestamp str    no     no     yes

With this PR:

from \ to        date   time   timestamp
date str         yes    no     yes
time str         no     yes    no
timestamp str    yes    yes    yes

The casts that remain unsupported are rejected on purpose, so that malformed input raises an error instead of being swallowed silently.

An example case it solves
... | where date_time > '1950-10-11' | fields date_time

  • Before this PR
    • calcite
      EnumerableCalc(expr#0=[{inputs}], expr#1=['1950-10-11':VARCHAR], expr#2=[TIMESTAMP($t1)], expr#3=[>($t0, $t2)], date_time=[$t0], $condition=[$t3])
        CalciteEnumerableIndexScan(table=[[OpenSearch, dates]], PushDownContext=[[PROJECT->[date_time]], OpenSearchRequestBuilder(sourceBuilder={"from":0,"timeout":"1m","_source":{"includes":["date_time"],"excludes":[]}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
      
      Execution fails with the following exception:
      {
        "error": {
          "reason": "There was internal problem at backend",
          "details": "java.sql.SQLException: exception while executing query: timestamp:1950-10-11 in unsupported format, please use 'yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]'",
          "type": "RuntimeException"
        },
        "status": 500
      }
    • v2
      {
       "error": {
         "reason": "Invalid Query",
         "details": "timestamp:1950-10-11 in unsupported format, please use 'yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]'",
         "type": "SemanticCheckException"
       },
       "status": 400
      }
      
  • After this PR
    • calcite
      CalciteEnumerableIndexScan(table=[[OpenSearch, dates]], PushDownContext=[[PROJECT->[date_time], FILTER->>($0, '1950-10-11 00:00:00')], OpenSearchRequestBuilder(sourceBuilder={"from":0,"timeout":"1m","query":{"range":{"date_time":{"from":"1950-10-11T00:00:00.000Z","to":null,"include_lower":false,"include_upper":true,"boost":1.0}}},"_source":{"includes":["date_time"],"excludes":[]},"sort":[{"_doc":{"order":"asc"}}]}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
      
    • v2
      {
          "root": {
              "name": "ProjectOperator",
              "description": {
                  "fields": "[date_time]"
              },
              "children": [
                  {
                      "name": "OpenSearchIndexScan",
                      "description": {
                          "request": "OpenSearchQueryRequest(indexName=dates, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"query\":{\"range\":{\"date_time\":{\"from\":\"1950-10-11T00:00:00.000Z\",\"to\":null,\"include_lower\":false,\"include_upper\":true,\"boost\":1.0}}},\"_source\":{\"includes\":[\"date_time\"],\"excludes\":[]},\"sort\":[{\"_doc\":{\"order\":\"asc\"}}]}, needClean=true, searchDone=false, pitId=*, cursorKeepAlive=1m, searchAfter=null, searchResponse=null)"
                      },
                      "children": []
                  }
              ]
          }
      }
      

Callout: The exception thrown when a date/time string fails to parse is now ExpressionEvaluationException (previously SemanticCheckException). Users who rely on the exception type will be affected.

Implementation Notes

Concerns

  • Calling UDFs for casting may make optimizations such as pushdown impossible
  • Calling UDFs for casting may make execution slower

Related Issues

Resolves #3728

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

yuancu added 3 commits June 30, 2025 17:59
date str -> timestamp
timestamp str -> date
timestamp str -> time

Signed-off-by: Yuanchun Shen <[email protected]>
@yuancu yuancu changed the title Support casting date string to timestamp [BUG] Support casting date string to timestamp Jul 1, 2025
@yuancu yuancu changed the title [BUG] Support casting date string to timestamp Support casting date string to timestamp Jul 1, 2025
@LantaoJin LantaoJin added bug Something isn't working calcite calcite migration releated labels Jul 1, 2025
rows("New York"),
rows("Ontario"),
rows("Quebec"),
rows((Object) null),
Member

Good job supporting null in verifyDataRows.

"source=%s | eval a = cast('09:07:42' as DATE) | fields a",
TEST_INDEX_DATE_FORMATS)));

verifyErrorMessageContains(t, "date:09:07:42 in unsupported format, please use 'yyyy-MM-dd'");
Member

It seems we are not aligned with Spark here. Is this the behavior of v2 or MySQL?

Contributor Author

This was the behavior of v2.

In MySQL, casting a time string to a date returns NULL; in some other databases, such as Postgres, it raises an error.

@yuancu yuancu marked this pull request as ready for review July 1, 2025 09:34
Collaborator

@dai-chen dai-chen left a comment

Calling UDFs for casting may make optimizations such as pushdown impossible

Is this still a concern if expression pushdown is supported?

Meanwhile in V2, even though expression pushdown is available, I recall there is performance degradation when CAST is involved—queries may fall back from DSL search to expression script evaluation. Probably we can double check and see if there’s a possible workaround in V3.

@yuancu yuancu changed the title Support casting date string to timestamp Support casting date literal to timestamp Jul 2, 2025
Collaborator

dai-chen commented Jul 3, 2025

A new, related issue happens to have been opened: #3842.

Contributor Author

yuancu commented Jul 4, 2025

Is this still a concern if expression pushdown is supported?

It should not be an issue once script pushdown is supported. Besides, if the value being cast is a literal, it can be pushed down as a range query with #3798. I added an example with a physical plan to the PR description to demonstrate this.

Probably we can double check and see if there’s a possible workaround in V3.

#3798 circumvents script push-down by converting the string literal to another DSL-recognizable literal then pushing down.
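
As a rough illustration of that idea (not the code from #3798), the sketch below widens a date literal to midnight and renders it in the two forms that appear in the plan in the PR description. The class and method names are hypothetical, and treating the value as UTC is an assumption taken from the `Z` suffix in the plan.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: turns a 'yyyy-MM-dd' literal into timestamp literals
// that a range query can use directly, so no cast script is needed.
final class DateLiteralNormalizer {
  private static final DateTimeFormatter TIMESTAMP_LITERAL =
      DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");
  private static final DateTimeFormatter RANGE_BOUND =
      DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSS'Z'");

  private DateLiteralNormalizer() {}

  // '1950-10-11' becomes '1950-10-11 00:00:00' (midnight of that day).
  static String toTimestampLiteral(String dateLiteral) {
    return LocalDate.parse(dateLiteral).atStartOfDay().format(TIMESTAMP_LITERAL);
  }

  // Same instant, formatted as a UTC bound with millisecond precision.
  static String toRangeBound(String dateLiteral) {
    return LocalDate.parse(dateLiteral)
        .atStartOfDay()
        .atOffset(ZoneOffset.UTC)
        .format(RANGE_BOUND);
  }
}
```

Because the conversion happens once on the literal, the filter can be compared against the field directly instead of evaluating a cast per row.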

.toInstant();
} catch (DateTimeParseException e) {
DateTimeParser.parseDateOrTimestamp(timestamp).atZone(ZoneOffset.UTC).toInstant();
} catch (SemanticCheckException e) {
Collaborator

I think SemanticCheckException is inappropriate here. This issue isn't related to semantics at all. It should be something like IllegalArgumentException.

Contributor Author

Changed to ExpressionEvaluationException when parsing datetime literals/strings fails.
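
A minimal sketch of the pattern, assuming a simplified cast that accepts only `yyyy-MM-dd`. The nested `ExpressionEvaluationException` is a stand-in so the example compiles on its own, and `StrictDateCast` is a hypothetical name; the message format is the one asserted in the test quoted earlier in this review.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

final class StrictDateCast {
  // Stand-in for the plugin's exception class (assumption, for self-containment).
  static class ExpressionEvaluationException extends RuntimeException {
    ExpressionEvaluationException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  private StrictDateCast() {}

  // Simplified: parses 'yyyy-MM-dd' only and fails loudly on anything else,
  // rather than returning null.
  static LocalDate castToDate(String text) {
    try {
      return LocalDate.parse(text);
    } catch (DateTimeParseException e) {
      throw new ExpressionEvaluationException(
          String.format("date:%s in unsupported format, please use 'yyyy-MM-dd'", text), e);
    }
  }
}
```

Keeping the original `DateTimeParseException` as the cause preserves the parse position for debugging while callers see an evaluation-time error type.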

.atZone(ZoneOffset.UTC)
.toInstant();
} catch (DateTimeParseException e) {
DateTimeParser.parseDateOrTimestamp(timestamp).atZone(ZoneOffset.UTC).toInstant();
Collaborator

Shall we support casting time str -> timestamp in the future? If so, maybe replace DATE_TIME_FORMATTER_VARIABLE_NANOS with DATE_TIME_FORMATTER_VARIABLE_NANOS_OPTIONAL directly.

I also wonder whether a DateTimeFormatter with multiple patterns would be more efficient than DateTimeParser.parseDateOrTimestamp, which tries parseDate and falls back to parseTimeStamp when an exception is thrown.

Contributor Author

Fixed by replacing parse*Or* calls with new DateTimeFormatters
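
For readers unfamiliar with the technique, here is a sketch of a single formatter with optional sections, which avoids the parse-then-fallback round trip. It is not the formatter added in this PR; `OptionalTimeParser` and `DATE_WITH_OPTIONAL_TIME` are hypothetical names, and the pattern is an assumption based on the `yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]` format in the error messages.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

final class OptionalTimeParser {
  // The date part is mandatory; the time of day and the fraction are optional.
  // Fields missing from the input default to zero.
  static final DateTimeFormatter DATE_WITH_OPTIONAL_TIME =
      new DateTimeFormatterBuilder()
          .appendPattern("uuuu-MM-dd")
          .optionalStart()
          .appendPattern(" HH:mm:ss")
          .optionalStart()
          .appendFraction(ChronoField.NANO_OF_SECOND, 1, 9, true)
          .optionalEnd()
          .optionalEnd()
          .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
          .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
          .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
          .toFormatter();

  private OptionalTimeParser() {}

  // Parses a date or a timestamp in one pass; throws DateTimeParseException
  // for anything else, including a bare time such as "09:07:42".
  static LocalDateTime parse(String text) {
    return LocalDateTime.parse(text, DATE_WITH_OPTIONAL_TIME);
  }
}
```

With this shape, a date-only input is not an exceptional path, so no exception is constructed and discarded for the common case.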

@qianheng-aws
Collaborator

Not related to this issue: I found some inefficient implementations among the date-related functions, such as runtime branch selection for TIMESTAMP, which should be determined at compile time based on the argument types, and redundant conversion between String and ExprStringValue for DATE. I haven't checked all of them.

@LantaoJin LantaoJin mentioned this pull request Jul 10, 2025
7 tasks
Contributor Author

yuancu commented Jul 10, 2025

One problem I discovered during testing:

Errors thrown while executing PPL with Calcite often result in a 500 response.

For example, the query source=t0001 | where @timestamp > '12:00' returns the following response with Calcite:

{
  "error": {
    "reason": "There was internal problem at backend",
    "details": "java.sql.SQLException: exception while executing query: timestamp:12:00 in unsupported format, please use 'yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]'",
    "type": "RuntimeException"
  },
  "status": 500
}

But in v2, it's:

{
  "error": {
    "reason": "Invalid Query",
    "details": "timestamp:12:00 in unsupported format, please use 'yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]'",
    "type": "ExpressionEvaluationException"
  },
  "status": 400
}

This is because exceptions thrown during statement.executeQuery() are wrapped in a SQLException (see AvaticaConnection.java#L579), which we then catch and wrap in a RuntimeException:

  try (PreparedStatement statement = OpenSearchRelRunners.run(context, rel)) {
    ResultSet result = statement.executeQuery();
    buildResultSet(result, rel.getRowType(), context.querySizeLimit, listener);
  } catch (SQLException e) {
    throw new RuntimeException(e);
  }

A simple fix: retrieve the wrapped exception from the SQLException and rethrow it. Should I implement this?
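
One possible shape for that fix, sketched under assumptions: `SqlExceptionUnwrapper` and `unwrap` are hypothetical names, and the nested exception class is a stand-in for the plugin's own type. The helper walks the cause chain and returns the first RuntimeException it finds, falling back to the current wrapping when there is none.

```java
import java.sql.SQLException;

final class SqlExceptionUnwrapper {
  // Stand-in for the plugin's exception class (assumption, for self-containment).
  static class ExpressionEvaluationException extends RuntimeException {
    ExpressionEvaluationException(String message) {
      super(message);
    }
  }

  private SqlExceptionUnwrapper() {}

  // Returns the first RuntimeException in the cause chain so its original type
  // survives; wraps the SQLException itself if no such cause exists.
  static RuntimeException unwrap(SQLException e) {
    for (Throwable cause = e.getCause(); cause != null; cause = cause.getCause()) {
      if (cause instanceof RuntimeException) {
        return (RuntimeException) cause;
      }
    }
    return new RuntimeException(e);
  }
}
```

In the catch block above, `throw new RuntimeException(e)` would then become `throw SqlExceptionUnwrapper.unwrap(e)`, which lets the REST layer see the ExpressionEvaluationException rather than a generic RuntimeException.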

example stacktrace
[2025-07-10T14:28:48,782][ERROR][o.o.s.p.r.RestPPLQueryAction] [7cf34de73b85] Error happened during query handling
java.lang.RuntimeException: java.sql.SQLException: exception while executing query: timestamp:12:00 in unsupported format, please use 'yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]'
	at org.opensearch.sql.opensearch.executor.OpenSearchExecutionEngine.lambda$execute$6(OpenSearchExecutionEngine.java:203) ~[?:?]
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:319) ~[?:?]
	at org.opensearch.sql.opensearch.executor.OpenSearchExecutionEngine.lambda$execute$7(OpenSearchExecutionEngine.java:196) ~[?:?]
	at org.opensearch.sql.opensearch.client.OpenSearchNodeClient.schedule(OpenSearchNodeClient.java:193) ~[?:?]
	at org.opensearch.sql.opensearch.executor.OpenSearchExecutionEngine.execute(OpenSearchExecutionEngine.java:194) ~[?:?]
	at org.opensearch.sql.executor.QueryService.lambda$executeWithCalcite$0(QueryService.java:107) ~[?:?]
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:319) ~[?:?]
	at org.opensearch.sql.executor.QueryService.executeWithCalcite(QueryService.java:96) ~[?:?]
	at org.opensearch.sql.executor.QueryService.execute(QueryService.java:72) ~[?:?]
	at org.opensearch.sql.executor.execution.QueryPlan.execute(QueryPlan.java:69) ~[?:?]
	at org.opensearch.sql.opensearch.executor.OpenSearchQueryManager.lambda$submit$0(OpenSearchQueryManager.java:31) ~[?:?]
	at org.opensearch.sql.opensearch.executor.OpenSearchQueryManager.lambda$withCurrentContext$1(OpenSearchQueryManager.java:45) ~[?:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916) ~[opensearch-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.base/java.lang.Thread.run(Thread.java:1575) [?:?]
Caused by: java.sql.SQLException: exception while executing query: timestamp:12:00 in unsupported format, please use 'yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]'
	at org.apache.calcite.avatica.Helper.createException(Helper.java:56) ~[?:?]
	at org.apache.calcite.avatica.Helper.createException(Helper.java:41) ~[?:?]
	at org.apache.calcite.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:579) ~[?:?]
	at org.apache.calcite.avatica.AvaticaPreparedStatement.executeQuery(AvaticaPreparedStatement.java:137) ~[?:?]
	at org.opensearch.sql.opensearch.executor.OpenSearchExecutionEngine.lambda$execute$6(OpenSearchExecutionEngine.java:200) ~[?:?]
	... 15 more
Caused by: org.opensearch.sql.exception.ExpressionEvaluationException: timestamp:12:00 in unsupported format, please use 'yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]'
	at org.opensearch.sql.data.model.ExprTimestampValue.<init>(ExprTimestampValue.java:43) ~[?:?]
	at org.opensearch.sql.calcite.utils.datetime.DateTimeConversionUtils.convertToTimestampValue(DateTimeConversionUtils.java:62) ~[?:?]
	at org.opensearch.sql.expression.function.udf.datetime.TimestampFunction.timestamp(TimestampFunction.java:75) ~[?:?]
	at Baz$1$1.moveNext(Unknown Source) ~[?:?]
	at org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.<init>(Linq4j.java:666) ~[?:?]
	at org.apache.calcite.linq4j.Linq4j.enumeratorIterator(Linq4j.java:99) ~[?:?]
	at org.apache.calcite.linq4j.AbstractEnumerable.iterator(AbstractEnumerable.java:33) ~[?:?]
	at org.apache.calcite.avatica.MetaImpl.createCursor(MetaImpl.java:83) ~[?:?]
	at org.apache.calcite.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:186) ~[?:?]
	at org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:64) ~[?:?]
	at org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:43) ~[?:?]
	at org.apache.calcite.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:575) ~[?:?]
	at org.apache.calcite.avatica.AvaticaPreparedStatement.executeQuery(AvaticaPreparedStatement.java:137) ~[?:?]
	at org.opensearch.sql.opensearch.executor.OpenSearchExecutionEngine.lambda$execute$6(OpenSearchExecutionEngine.java:200) ~[?:?]
	... 15 more

Contributor Author

yuancu commented Jul 10, 2025

Not related to this issue: I found some inefficient implementations among the date-related functions, such as runtime branch selection for TIMESTAMP, which should be determined at compile time based on the argument types, and redundant conversion between String and ExprStringValue for DATE. I haven't checked all of them.

I spotted a few in the functions under java/org/opensearch/sql/expression/function/udf/datetime. I can fix them in a separate PR.

Besides, almost all datetime functions convert their inputs to ExprValues to reuse v2's implementation. I can eliminate these conversions if the expected efficiency gain is significant.

@yuancu yuancu requested review from LantaoJin and qianheng-aws July 10, 2025 08:36
@LantaoJin LantaoJin merged commit 30aba65 into opensearch-project:main Jul 22, 2025
24 checks passed
Labels
bug Something isn't working calcite calcite migration releated
Development

Successfully merging this pull request may close these issues.

[BUG] CAST does not support casting date/time string to timestamp
4 participants