Description
In this issue, we will list and track all tasks for ANSI mode support.
There are two Spark configurations directly related to ANSI usage.
spark.sql.ansi.enabled (default is true since Spark 4.0)
spark.sql.storeAssignmentPolicy (default is ANSI since Spark 3.0)
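For reference, here is a minimal PySpark session sketch that sets both configurations explicitly (this assumes a local PySpark installation; the app name is arbitrary):

```python
# Sketch only: sets the two ANSI-related configs when building a session.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ansi-demo")
    # Fail at runtime on overflow / invalid casts instead of returning NULL.
    .config("spark.sql.ansi.enabled", "true")
    # ANSI store assignment: reject unsafe implicit casts on table writes.
    .config("spark.sql.storeAssignmentPolicy", "ANSI")
    .getOrCreate()
)
```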
Tasks
Basic
1. Type Casting Functions (ANSI Strict)
2. Arithmetic Functions (ANSI Overflow Check)
3. Date/Time Functions (ANSI Validation)
Datetime expressions: ToUnixTimestamp, UnixTimestamp, GetTimestamp, TryToTimestampExpressionBuilder, NextDay, DateAddInterval, ParseToDate, TryToDateExpressionBuilder, ParseToTimestamp, MakeDate, TryMakeTimestampLTZExpressionBuilder, MakeTimestamp
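As an illustration of what ANSI validation means for these expressions, here is a small Python sketch emulating MakeDate-style behavior (an assumption-level emulation, not Spark's actual code): an invalid date is an error under ANSI and NULL otherwise.

```python
from datetime import date

def make_date(year, month, day, ansi_enabled=True):
    # Sketch of MAKE_DATE under ANSI mode: invalid dates raise an error
    # when ANSI is enabled, and yield NULL (None here) otherwise.
    try:
        return date(year, month, day)
    except ValueError:
        if ansi_enabled:
            raise ValueError(f"invalid date: {year}-{month:02d}-{day:02d}")
        return None

print(make_date(2024, 2, 29))                      # valid leap day
print(make_date(2023, 2, 29, ansi_enabled=False))  # NULL (None) in legacy mode
```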
4. String to Numeric Conversion (ANSI Strict)
5. Aggregation Functions (ANSI Overflow)
SUM, AVG, VAR_POP, VAR_SAMP, STDDEV_POP, STDDEV_SAMP
In ANSI mode, overflow is checked during accumulation.
TRY_SUM (Spark 3.4+)
Returns NULL on overflow instead of raising an error.
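The contrast between the two behaviors can be sketched in Python (an emulation of the described semantics, not Spark's implementation; Python integers are unbounded, so the 64-bit range check is explicit, like Java's Math.addExact):

```python
I64_MIN, I64_MAX = -(2 ** 63), 2 ** 63 - 1

def ansi_sum(values):
    # SUM over BIGINT with ANSI overflow checks: every step of the
    # accumulation is range-checked against the signed 64-bit range.
    acc = 0
    for v in values:
        acc += v
        if not I64_MIN <= acc <= I64_MAX:
            raise OverflowError("long overflow")
    return acc

def try_sum(values):
    # TRY_SUM: same accumulation, but NULL (None) on overflow.
    try:
        return ansi_sum(values)
    except OverflowError:
        return None
```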
6. Window Functions (ANSI Overflow)
Same overflow checks apply in window operations:
SUM(...) OVER(...)
AVG(...) OVER(...)
7. ANSI SQL Compliant String Functions
SUBSTRING / SUBSTR
ANSI SQL standard argument order: SUBSTRING(str FROM start [FOR len])
Also supports classic form: SUBSTRING(str, start, len)
TRIM
ANSI syntax: TRIM(LEADING '0' FROM col)
Also TRIM(BOTH ...), TRIM(TRAILING ...)
OVERLAY
ANSI SQL string replacement:
OVERLAY(string PLACING replacement FROM start [FOR length])
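A short Python sketch of the OVERLAY semantics above (my own emulation for illustration): the position is 1-based, and the replaced span defaults to the replacement's length unless FOR is given.

```python
def overlay(s, replacement, pos, length=None):
    # OVERLAY(string PLACING replacement FROM pos [FOR length]):
    # keep everything before pos, insert the replacement, then resume
    # after skipping `length` characters of the original.
    if length is None:
        length = len(replacement)
    return s[:pos - 1] + replacement + s[pos - 1 + length:]

print(overlay("Spark SQL", "_", 6))         # Spark_SQL
print(overlay("Spark SQL", "ANSI ", 7, 0))  # Spark ANSI SQL (FOR 0 inserts)
```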
See Spark ANSI compliance: https://github.com/apache/spark/blob/v4.0.0/docs/sql-ref-ansi-compliance.md
Related discussion: #4740.
facebookincubator/velox#3869
Task details
1. Type Casting Functions (ANSI Strict)
cast string to boolean (@malinjawi)
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L701
Cast decimal to string (@Mariamalmesfer)
From the Spark source comment: in ANSI mode, Spark always uses the plain string representation when casting Decimal values to strings. Otherwise, the cast uses BigDecimal.toString, which may use scientific notation if an exponent is needed.
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L678
cast string to timestamp (@infvg)
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L733
cast string to timestampNTZ
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L775
cast float/double to timestamp (@infvg)
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L758
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L765
cast string to date (@malinjawi)
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L811
cast string to time (@malinjawi)
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L826
The codegen implementation, assumed equivalent to the interpreted path linked above:
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1493
cast string to long/int/short/byte (@malinjawi)
As one example, here is the related code for long type:
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L883
cast NumericType to long/int/short/byte (@minni31)
As one example, here is the related code for long type:
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L896
cast timestamp to int/short/byte
As one example, here is the related code for int type:
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L933
cast time to short/byte (requires TimeType support)
As one example, here is the related code for short type:
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L980
cast several types to decimal
The ANSI config controls the overflow behavior in changePrecision:
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1103
cast string to double/float
The ANSI config controls how an incorrectly formatted number string is handled.
As one example, here is the related code for double type:
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1159
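To summarize the strict-cast behavior this section tracks, here is a Python sketch of CAST(string AS INT) semantics (a simplified emulation: Spark's parser accepts more input forms than Python's int(), so treat the accepted/rejected set here as an assumption):

```python
I32_MIN, I32_MAX = -(2 ** 31), 2 ** 31 - 1

def cast_string_to_int(s, ansi_enabled=True):
    # Malformed input or 32-bit overflow is an error under ANSI,
    # and NULL (None) otherwise. Spark trims surrounding whitespace.
    try:
        v = int(s.strip())
        if I32_MIN <= v <= I32_MAX:
            return v
    except ValueError:
        pass
    if ansi_enabled:
        raise ValueError(f"CAST_INVALID_INPUT: {s!r} cannot be cast to INT")
    return None
```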
2. Arithmetic Functions (ANSI Overflow Check)
A base type: AnsiIntervalType (@malinjawi)
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/api/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala#L168
Unary expressions like Abs, UnaryMinus (@malinjawi)
The ANSI config controls failOnError.
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L152C35-L152C46
Binary arithmetic expressions built on the BinaryArithmetic base class, such as Add, Divide, and Multiply (@malinjawi)
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L209
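The failOnError switch can be sketched for 64-bit addition (an emulation of the described semantics; the wrap-around branch mimics Java's two's-complement arithmetic, which is what legacy Spark falls back to):

```python
I64_MIN, I64_MAX = -(2 ** 63), 2 ** 63 - 1

def long_add(a, b, fail_on_error=True):
    # Add on BIGINT: failOnError (driven by spark.sql.ansi.enabled)
    # picks between an overflow error and silent wrap-around.
    r = a + b
    if I64_MIN <= r <= I64_MAX:
        return r
    if fail_on_error:
        raise OverflowError("long overflow")
    # Legacy behavior: wrap into the signed 64-bit range.
    return (r - I64_MIN) % (2 ** 64) + I64_MIN
```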
3. Date/Time Functions (ANSI Validation)
4. String to Numeric Conversion (ANSI Strict)
String expressions: Elt
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L286
Collection expression Size
Its legacySizeOfNull flag is affected by the ANSI config.
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L118
Collection expression ElementAt
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L2622
conv
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L451
round functions: Round, BRound, RoundCeil, RoundFloor
As one example, see how to round to ByteType with ANSI enabled:
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L1579
STORE_ASSIGNMENT_POLICY defaults to ANSI
https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L4487
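As one concrete example from this section, ElementAt's ANSI behavior can be sketched as follows (my own emulation; note the exact behavior also varies across Spark versions, so treat the error types as assumptions):

```python
def element_at(arr, index, ansi_enabled=True):
    # ELEMENT_AT on arrays: SQL indices are 1-based, and negative indices
    # count from the end. An out-of-bounds index is an error in ANSI mode
    # and NULL (None) otherwise; index 0 is always an error.
    if index == 0:
        raise ValueError("SQL array indices start at 1")
    if abs(index) > len(arr):
        if ansi_enabled:
            raise IndexError(f"index {index} out of bounds for length {len(arr)}")
        return None
    return arr[index - 1] if index > 0 else arr[index]
```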