cgivre
Contributor

@cgivre cgivre commented Oct 1, 2025

DRILL-8537: Bump Calcite to Version 1.38

Description

I am attempting a new approach: instead of bumping Calcite from 1.34 directly to 1.40, I am going to upgrade one version at a time and see how far we get.

Current Status:

  • Update to 1.35
  • Update to 1.36
  • Update to 1.37
  • Update to 1.38

Significant Changes in Calcite 1.35

  • Calcite 1.35 adds a literal_agg function which allows literals in aggregate queries.
  • Fix bugs in the ElasticSearch plugin which prevented certain configuration parameters from being passed to ElasticSearch.
  • Fix bugs introduced in Calcite 1.34 relating to certain date arithmetic functions
  • Fix issues in Drill relating to Calcite's new VARDECIMAL handling.

Significant Changes in Calcite 1.36

There are no significant changes in Calcite 1.36.

Significant Changes in Calcite 1.37

  • SqlRandFunction Removed: The dedicated SqlRandFunction class was removed in Calcite 1.37. The RAND function is now implemented as a SqlBasicFunction in SqlStdOperatorTable.
  • INTERVAL Type Precision Calculation Changed: Calcite 1.37 changed how it calculates INTERVAL_PRECISION and COLUMN_SIZE for interval types, reporting values based on the full ISO-8601 representation rather than just the specified field precision.
  • Expression Simplification Performance Regression: Calcite 1.37 exhibits exponential planning complexity when processing large IN clauses that contain expressions (not just literals), particularly during RexNode simplification and OR expression expansion.
  • Lambda Expressions: Calcite 1.37 adds lambda expression support. Drill does not support lambdas yet, but the planner-side support is in place.

Documentation

No user-facing changes.

Testing

Ran existing unit tests

@cgivre cgivre self-assigned this Oct 1, 2025
@cgivre cgivre marked this pull request as draft October 1, 2025 13:45
@cgivre cgivre added the code-cleanup, dependencies, calcite-update (Changes required for updating the Calcite version), and backport-to-stable (This bug fix is applicable to the latest stable release and should be considered for inclusion there) labels Oct 1, 2025
@cgivre
Contributor Author

cgivre commented Oct 6, 2025

Major Changes

1. Function Type Inference

EXTRACT Function

Problem: EXTRACT(SECOND) was returning BIGINT instead of DOUBLE, losing fractional seconds
Solution:

  • Created DrillCalciteSqlExtractWrapper with custom type inference
  • Updated DrillConvertletTable.extractConvertlet() to use TypeInferenceUtils.getSqlTypeNameForTimeUnit()
  • Returns DOUBLE for SECOND, BIGINT for other time units

Files Modified:

  • DrillCalciteSqlExtractWrapper.java (new)
  • DrillConvertletTable.java
  • DrillOperatorTable.java
  • TestFunctionsWithTypeExpoQueries.java
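
The return-type rule described above can be sketched in plain Java. This is only an illustration: `ExtractTypeSketch`, `ResultType`, and `returnTypeFor` are hypothetical names, and the real logic lives in `TypeInferenceUtils.getSqlTypeNameForTimeUnit()` and works on Calcite types rather than strings.

```java
import java.util.Locale;

/** Hypothetical stand-in for the EXTRACT type rule; not Drill's actual code. */
final class ExtractTypeSketch {
    /** Simplified result types (assumption: the real code uses Calcite's SqlTypeName). */
    enum ResultType { BIGINT, DOUBLE }

    /** SECOND keeps its fractional part, so it maps to DOUBLE; every other unit maps to BIGINT. */
    static ResultType returnTypeFor(String timeUnit) {
        String unit = timeUnit.trim().toUpperCase(Locale.ROOT);
        return unit.equals("SECOND") ? ResultType.DOUBLE : ResultType.BIGINT;
    }
}
```

Using the same rule in both validation and conversion is what keeps the two phases from disagreeing about the type.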

TIMESTAMPDIFF Function

Problem: Type mismatch between validation (BIGINT) and conversion (INTEGER)
Solution:

  • Created DrillCalciteSqlTimestampDiffWrapper for consistent BIGINT return type
  • Updated DrillConvertletTable.timestampDiffConvertlet() to use BIGINT
  • Both validation and conversion now consistently return BIGINT

Files Modified:

  • DrillCalciteSqlTimestampDiffWrapper.java (new)
  • DrillConvertletTable.java
  • DrillOperatorTable.java

TIMESTAMPADD Function

Problem: Calcite 1.35 was adding precision to DATE types, causing assertion errors
Solution:

  • Created DrillCalciteSqlTimestampAddWrapper with proper type logic
  • Only adds precision to TIMESTAMP and TIME types, not DATE
  • Updated DrillConvertletTable.timestampAddConvertlet() to skip precision for DATE

Files Modified:

  • DrillCalciteSqlTimestampAddWrapper.java (new)
  • DrillConvertletTable.java
  • DrillOperatorTable.java
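
A rough sketch of the "precision only for TIMESTAMP and TIME" rule, using type labels instead of Calcite types. The class, enum, and method names are hypothetical and only illustrate the branching; the actual wrapper builds `RelDataType` instances.

```java
/** Hypothetical illustration of the TIMESTAMPADD result-type rule; not Drill's actual code. */
final class TimestampAddTypeSketch {
    enum Kind { DATE, TIME, TIMESTAMP }

    /** Returns a type label such as "TIMESTAMP(3)" or "DATE". */
    static String resultType(Kind operand, int precision) {
        switch (operand) {
            case TIME:
            case TIMESTAMP:
                // Only types with a fractional-second component carry a precision.
                return operand.name() + "(" + precision + ")";
            default:
                // DATE has no fractional seconds, so no precision is attached.
                return operand.name();
        }
    }
}
```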

2. Function Registration & Resolution

Vararg Functions (CONCAT, COALESCE, etc.)

Problem: Function resolution failures for functions with variable arguments
Solution: Enhanced LocalFunctionRegistry with sophisticated vararg matching logic

Files Modified:

  • LocalFunctionRegistry.java (+57 lines)
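
The matching logic is not spelled out here, so the following is only a guess at the general shape of vararg matching, not the code added to `LocalFunctionRegistry`. It assumes a signature is a list of declared type names whose last entry may repeat zero or more times when the function is marked as vararg; all names are hypothetical.

```java
import java.util.List;

/** Hypothetical vararg signature matcher; an illustration, not LocalFunctionRegistry's logic. */
final class VarargMatchSketch {
    /**
     * Returns true if argTypes fit the declared parameter types.
     * When varArg is set, the last declared type may repeat zero or more times.
     */
    static boolean matches(List<String> declared, boolean varArg, List<String> argTypes) {
        if (!varArg) {
            return declared.equals(argTypes);
        }
        if (declared.isEmpty()) {
            return false; // a vararg signature needs a type to repeat
        }
        int fixed = declared.size() - 1; // parameters before the repeating one
        if (argTypes.size() < fixed) {
            return false;
        }
        for (int i = 0; i < argTypes.size(); i++) {
            // Arguments past the fixed prefix are all checked against the last declared type.
            String expected = declared.get(Math.min(i, fixed));
            if (!expected.equals(argTypes.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```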

Niladic Special Functions (CURRENT_DATE, SESSION_USER, etc.)

Problem: Special functions not properly recognized in Calcite 1.35
Solution:

  • Created SpecialFunctionRewriter to handle niladic function transformations
  • Added explicit registration in DrillOperatorTable

Files Modified:

  • SpecialFunctionRewriter.java (+84 lines, new)
  • CountFunctionRewriter.java (+52 lines, new)
  • CharToVarcharRewriter.java (+61 lines, new)
  • DrillOperatorTable.java
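
To illustrate the idea of a niladic rewrite, here is a simplified sketch that works on identifier strings rather than on Calcite `SqlNode` trees. `NiladicRewriteSketch` is a hypothetical name, and the set below is an assumption: only `CURRENT_DATE` and `SESSION_USER` are named in this PR; the other entries are standard SQL niladic functions added for illustration.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical string-level illustration of niladic rewriting; not SpecialFunctionRewriter itself. */
final class NiladicRewriteSketch {
    /** Assumed set of niladic function names (only the first two appear in the PR text). */
    static final Set<String> NILADIC =
        Set.of("CURRENT_DATE", "SESSION_USER", "CURRENT_TIME", "CURRENT_TIMESTAMP");

    /** Turns a bare niladic identifier into a zero-argument call; other identifiers pass through. */
    static String rewriteIdentifier(String identifier) {
        String upper = identifier.toUpperCase(Locale.ROOT);
        return NILADIC.contains(upper) ? upper + "()" : identifier;
    }
}
```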

3. COUNT(*) Handling

Problem: COUNT(*) type inference changed in Calcite 1.35
Solution: Created LiteralAggFunction for proper literal aggregate handling

Files Modified:

  • LiteralAggFunction.java (+192 lines, new)
  • DrillAggregateRel.java
  • AggPrelBase.java
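
For context, a literal aggregate returns the same constant for every group regardless of the rows in it, which is how Calcite documents its `LITERAL_AGG` operator. The sketch below shows only that semantics; `LiteralAggSketch` is a hypothetical name and is not the `LiteralAggFunction` added in this PR.

```java
import java.util.List;

/** Hypothetical model of a literal aggregate: the result never depends on the input rows. */
final class LiteralAggSketch<T> {
    private final T literal;

    LiteralAggSketch(T literal) {
        this.literal = literal;
    }

    /** Ignores the group's rows entirely and returns the literal, even for an empty group. */
    T aggregate(List<?> groupRows) {
        return literal;
    }
}
```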

4. Aggregate Cost Estimation

Problem: Deprecated RelOptCost constructor removed
Solution: Updated to use withAggCallCount() builder pattern

Files Modified:

  • DrillAggregateRel.java
  • AggPrelBase.java
  • DrillReduceAggregatesRule.java

5. TIMESTAMPADD Implementation

Problem: Complete signature change in Calcite 1.35
Solution:

  • Reimplemented TIMESTAMPADD function with new interval arithmetic
  • Added template-based code generation

Files Modified:

  • DateIntervalFunc.tdd
  • TimestampAddFunction.java (+203 lines, new)
  • DrillConvertletTable.java

6. Complex Writer Functions

Problem: FLATTEN, CONVERT_FROM, CONVERT_TO require ProjectRecordBatch context
Solution:

  • Modified DrillConstExecutor to skip constant folding for these functions
  • Updated dummy function implementations for Calcite 1.35 compatibility

Files Modified:

  • DrillConstExecutor.java
  • DummyConvertFrom.java
  • DummyConvertTo.java
  • DummyFlatten.java
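
The guard can be pictured as a simple name check before folding. This is a sketch under the assumption that the decision is made per function name; `ConstFoldGuardSketch` and `canFold` are hypothetical, and the real change in `DrillConstExecutor` operates on `RexNode` expressions.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration of skipping constant folding for complex-writer functions. */
final class ConstFoldGuardSketch {
    /** Functions named in this PR as requiring ProjectRecordBatch context at execution time. */
    static final Set<String> NEEDS_BATCH_CONTEXT = Set.of("FLATTEN", "CONVERT_FROM", "CONVERT_TO");

    /** Returns false for functions that cannot be evaluated at planning time. */
    static boolean canFold(String functionName) {
        return !NEEDS_BATCH_CONTEXT.contains(functionName.toUpperCase(Locale.ROOT));
    }
}
```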

7. FLATTEN in Aggregates Validation

Problem: FLATTEN was rejected only inside COUNT; other aggregate functions incorrectly allowed it
Solution: Changed validation from SqlCountAggFunction to SqlAggFunction to catch ALL aggregate types

Files Modified:

  • UnsupportedOperatorsVisitor.java
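
The effect of widening the check from the COUNT class to the aggregate base class can be shown with a toy hierarchy. The classes below are stand-ins (assumptions) for Calcite's `SqlAggFunction` and `SqlCountAggFunction`; the method names are hypothetical.

```java
/** Toy hierarchy illustrating the narrower and wider instanceof checks; not Drill's visitor code. */
final class FlattenValidationSketch {
    static class AggFunction {}                            // stands in for SqlAggFunction
    static class CountAggFunction extends AggFunction {}   // stands in for SqlCountAggFunction
    static class SumAggFunction extends AggFunction {}     // any other aggregate

    /** Old behaviour: only COUNT(FLATTEN(...)) is rejected. */
    static boolean oldCheckRejects(Object operator, boolean containsFlatten) {
        return containsFlatten && operator instanceof CountAggFunction;
    }

    /** New behaviour: FLATTEN inside any aggregate is rejected. */
    static boolean newCheckRejects(Object operator, boolean containsFlatten) {
        return containsFlatten && operator instanceof AggFunction;
    }
}
```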

8. Error Handling & Validation

Prepared Statement Errors

Problem: Parse errors wrapped differently in RPC layer, appearing as SYSTEM instead of VALIDATION
Solution:

  • Added exception unwrapping in SqlConverter.parse()
  • Updated test expectations to match Calcite 1.35 behavior

Files Modified:

  • SqlConverter.java
  • TestPreparedStatementProvider.java
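
Exception unwrapping of this kind usually means walking the cause chain until the interesting exception type is found. The helper below is a generic sketch of that technique using only JDK classes; `UnwrapSketch.findCause` is a hypothetical name and does not reproduce the code in `SqlConverter.parse()`.

```java
/** Generic cause-chain walker; an illustration of unwrapping, not SqlConverter's code. */
final class UnwrapSketch {
    /** Returns the first exception of the wanted type in the cause chain, or null if none exists. */
    static <T extends Throwable> T findCause(Throwable thrown, Class<T> wanted) {
        for (Throwable current = thrown; current != null; current = current.getCause()) {
            if (wanted.isInstance(current)) {
                return wanted.cast(current);
            }
        }
        return null;
    }
}
```

A caller can then rethrow the unwrapped parse error so that it is reported under its original category rather than as a generic wrapper.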

Invalid CAST Operations

Problem: Calcite 1.35 correctly rejects semantically invalid CAST(DATE as TIME)
Solution: Removed invalid test case with explanatory comment

Files Modified:

  • TestParquetFilterPushDownForDateTimeCasts.java

Test Updates

Core Module Tests

  • TestFunctionsQuery.java - Updated for DOUBLE return from EXTRACT(SECOND)
  • TestFunctionsWithTypeExpoQueries.java - Fixed type expectations (FLOAT8 for SECOND)
  • TestTimestampAddDiffFunctions.java - Updated for new type inference
  • TestCountStar.java - Fixed test infrastructure (changed to PlanTestBase)
  • TestParquetFilterPushDownForDateTimeCasts.java - Removed invalid cast test
  • TestAggregateFunctions.java - Updated aggregate type expectations
  • TestLiteralAggFunction.java (+241 lines, new) - Comprehensive literal aggregate testing

JDBC Storage Plugin Tests

  • TestJdbcPluginWithMySQLIT.java - Updated SQRT to return DOUBLE instead of BigDecimal
  • TestJdbcPluginWithMSSQL.java - Fixed schema expectations (BIGINT for COUNT, REQUIRED types)
  • TestJdbcPluginWithPostgres.java - Fixed schema expectations (BIGINT for COUNT, REQUIRED types)

Key Behavioral Changes

Type Inference

  1. EXTRACT(SECOND) now returns DOUBLE (was BIGINT) - supports fractional seconds
  2. TIMESTAMPDIFF returns BIGINT consistently
  3. COUNT(*) returns BIGINT (was INT in some contexts)
  4. SQRT and math functions consistently return DOUBLE
  5. Literal expressions and aggregates are REQUIRED (not OPTIONAL/nullable)

Function Resolution

  1. Vararg functions (CONCAT, COALESCE) now match multiple signatures
  2. Niladic functions (CURRENT_DATE, etc.) properly transformed with parentheses
  3. Special functions (COUNT(*), FLATTEN, etc.) have dedicated handling

Validation

  1. Stricter type checking between validation and conversion phases
  2. Invalid casts (DATE→TIME) now properly rejected
  3. FLATTEN properly rejected in ALL aggregate functions (not just COUNT)

Files Created

Core Engine

  1. DrillCalciteSqlExtractWrapper.java - Custom EXTRACT type inference
  2. DrillCalciteSqlTimestampAddWrapper.java - Custom TIMESTAMPADD type inference
  3. DrillCalciteSqlTimestampDiffWrapper.java - Custom TIMESTAMPDIFF type inference
  4. SpecialFunctionRewriter.java - Niladic function rewriting
  5. CountFunctionRewriter.java - COUNT(*) rewriting
  6. CharToVarcharRewriter.java - CHAR to VARCHAR conversion
  7. LiteralAggFunction.java - Literal aggregate handling
  8. TimestampAddFunction.java - New TIMESTAMPADD implementation

Tests

  1. TestLiteralAggFunction.java - Literal aggregate testing
  2. TestCountStar.java - COUNT(*) functionality testing

Migration Notes for Developers

If you use EXTRACT(SECOND):

  • Before: Returned BIGINT (lost fractional seconds)
  • After: Returns DOUBLE (preserves fractional seconds like 45.123)
  • Action: Update code expecting integer values to handle doubles
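
One way for client code to tolerate both the old and the new return type is to read the column as a `Number`. This is a sketch only: `ExtractSecondClientSketch` is a hypothetical helper, and it assumes the value was obtained generically, for example via JDBC's `ResultSet.getObject`.

```java
/** Hypothetical client-side helper that accepts either a BIGINT (Long) or a DOUBLE (Double) value. */
final class ExtractSecondClientSketch {
    /** Converts the column value to a double, preserving any fractional seconds. */
    static double secondsAsDouble(Object columnValue) {
        if (columnValue instanceof Number) {
            return ((Number) columnValue).doubleValue();
        }
        throw new IllegalArgumentException("Expected a numeric value but got: " + columnValue);
    }
}
```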

If you use COUNT(*):

  • Before: Could return INT in some contexts
  • After: Consistently returns BIGINT
  • Action: Update code expecting INT to handle BIGINT

If you use SQRT or math functions:

  • Before: Might return DECIMAL/BigDecimal in some contexts
  • After: Consistently returns DOUBLE
  • Action: Update type expectations in tests/code

If you cast DATE to TIME:

  • Before: Allowed (but semantically meaningless)
  • After: Properly rejected with validation error
  • Action: Remove invalid casts, use proper conversions

Compatibility

Backward Compatibility

  • Query Results: Mostly compatible, but type changes may affect downstream applications
  • API: Fully compatible
  • Storage Format: Fully compatible

Breaking Changes

  1. EXTRACT(SECOND) return type changed from BIGINT → DOUBLE
  2. COUNT(*) return type changed from INT → BIGINT in some contexts
  3. Math function return types more strictly DOUBLE (not DECIMAL)
  4. Invalid DATE→TIME casts now rejected

@cgivre cgivre changed the title from "DRILL-XXXX: Bump Calcite to Version 1.35" to "DRILL-8537: Bump Calcite to Version 1.35" Oct 9, 2025
@cgivre cgivre marked this pull request as ready for review October 9, 2025 02:38
@cgivre cgivre linked an issue Oct 9, 2025 that may be closed by this pull request
@cgivre cgivre changed the title from "DRILL-8537: Bump Calcite to Version 1.35" to "DRILL-8537: Bump Calcite to Version 1.36" Oct 13, 2025
@cgivre cgivre changed the title from "DRILL-8537: Bump Calcite to Version 1.36" to "DRILL-8537: Bump Calcite to Version 1.37" Oct 13, 2025
…1.37

This commit fixes the cartesian join error that occurs with INTERSECT/UNION queries
containing scalar subqueries like 'SELECT 1' in Calcite 1.37.0.

Changes to JoinUtils.java:

1. Enhanced isScalarSubquery() method to detect scalar subqueries represented as Values nodes:
   - Added support for org.apache.calcite.rel.logical.LogicalValues
   - Added support for org.apache.drill.exec.planner.common.DrillValuesRelBase
   - Both check if tuples.size() <= 1 to identify scalar subqueries

2. Modified checkCartesianJoin() method to allow cartesian joins with scalar subqueries:
   - Added hasScalarSubqueryInput() checks for both INNER and non-INNER joins
   - Returns false (not a problematic cartesian join) when a scalar subquery is detected
   - Allows nested loop joins for scalar subqueries instead of throwing errors
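
A simplified model of the two changes above, for illustration only. `ValuesNode` is a hypothetical stand-in for `LogicalValues` / `DrillValuesRelBase`, and `isProblematicCartesianJoin` reduces `checkCartesianJoin()` to a boolean join-condition flag; the real methods inspect Calcite `RelNode` trees.

```java
import java.util.List;

/** Hypothetical model of the scalar-subquery exemption; not the actual JoinUtils code. */
final class CartesianJoinSketch {
    /** Minimal stand-in for a Values relational node: just its literal rows. */
    static final class ValuesNode {
        final List<?> tuples;

        ValuesNode(List<?> tuples) {
            this.tuples = tuples;
        }
    }

    /** A Values node with at most one row is treated as a scalar subquery. */
    static boolean isScalarSubquery(Object node) {
        return node instanceof ValuesNode && ((ValuesNode) node).tuples.size() <= 1;
    }

    /** Returns true only for a cartesian join that should be rejected. */
    static boolean isProblematicCartesianJoin(boolean hasJoinCondition, Object left, Object right) {
        if (hasJoinCondition) {
            return false; // not a cartesian join at all
        }
        // A single-row input cannot blow up the result size, so a nested loop join is acceptable.
        return !(isScalarSubquery(left) || isScalarSubquery(right));
    }
}
```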

Reverted problematic changes:
- DrillRexBuilder.java: Removed ensureType() override that added casts for nullability
- DrillRelFactories.java: Removed nullability normalization in FilterFactory
- DefaultSqlHandler.java: Removed extra logging

Test results:
- TestSetOp tests (testIntersectCancellation, testUnionFilterPushDownOverOr): PASSING
- TestJoinNullable tests: PASSING
- No regression in other tests
@cgivre cgivre force-pushed the update_calcite_ai3 branch from 5decc96 to 7abb95f October 16, 2025 00:40
@cgivre cgivre force-pushed the update_calcite_ai3 branch from b1599a1 to de8803d October 16, 2025 03:05
@cgivre cgivre marked this pull request as draft October 17, 2025 04:05
@cgivre cgivre changed the title DRILL-8537: Bump Calcite to Version 1.37 DRILL-8537: Bump Calcite to Version 1.38 Oct 17, 2025


Development

Successfully merging this pull request may close these issues.

Elasticsearch storage plugin error on connect to elastic cloud

1 participant