Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48353][SQL] Introduction of Exception Handling mechanism in SQL Scripting #49427

Conversation

miland-db
Copy link
Contributor

@miland-db miland-db commented Jan 9, 2025

What changes were proposed in this pull request?

This pull request introduces the logic of error handling inside SQL Scripting language. Now, it is possible to:

  • declare conditions for specific SQL States (currently only valid in scope where they are defined)
  • declare handlers to catch and process errors that are risen during statement execution

Rules for selecting the most appropriate handler:

  • Named condition handlers are most specific.
  • SQLSTATE handlers are next in specificity.
  • Generic NOT FOUND and SQLEXCEPTION handlers are least specific.

Note: Handlers defined in the innermost compound statement where the exception was raised are considered.

Why are the changes needed?

The intent is to add the possibility for user to handle SQL errors in a custom defined way.

Limitations

  • Currently, only EXIT handler is supported. Support for CONTINUE handlers will come in the future.
  • It is only possible to declare condition in its full form by specifying SQLSTATE. Short form with default SQLSTATE is not yet supported.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

There are already existing test suites for SQL scripting that have been improved to test new functionalities:

  • SqlScriptingParserSuite
  • SqlScriptingExecutionSuite
  • SqlScriptingE2eSuite

Was this patch authored or co-authored using generative AI tooling?

No.

@miland-db miland-db marked this pull request as draft January 9, 2025 18:43
# Conflicts:
#	common/utils/src/main/resources/error/error-conditions.json
#	sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecution.scala
#	sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala
@miland-db miland-db changed the title [SPARK-48353][SQL] Introduction of Error Handling mechanism GRAMMAR in SQL Scripting [SPARK-48353][SQL] Introduction of Error Handling mechanism in SQL Scripting Jan 13, 2025
@miland-db miland-db marked this pull request as ready for review January 14, 2025 13:20
@miland-db
Copy link
Contributor Author

Option(ctx.multipartIdentifier())
.foreach { conditionContext =>
val conditionNameParts = visitMultipartIdentifier(conditionContext)
val conditionNameString = conditionNameParts.mkString(".").toUpperCase(Locale.ROOT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we only allow one name part, so this can be moved after the length check and do conditionNameParts.head.toUpperCase...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we also allow multipart names because of builtin conditions. I agree that we can move toUpperCase after the length check.

Copy link
Contributor

@cloud-fan cloud-fan Feb 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see. Shall we check the user-defined error condition name to not contain special chars (only allow A-Z, digits and underscore)? In case it conflicts with the qualified builtin error condition name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add assertConditionName check in visitDeclareConditionStatementImpl to only allow A-B, digits and underscore.

Anyway, we check if the condition name in handler declaration is either user defined or builtin condition name, so for user defined stuff it is ok to just check it during condition declaration.

*/
case class ErrorCondition(
conditionName: String,
sqlState: String) extends CompoundPlanStatement {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a CompoundPlanStatement? It seems like an intermediate data structure we use in the parser, and will end up as a Map[String, String] in CompoundBody

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is no need for this to be CompoundPlanStatement. I will update this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I think about this, ErrorCondition is in a way a CompoundPlanStatement because we use it to declare condition and store the details there. In visitCompoundStatementImpl we are returning CompoundPlanStatement and ErrorCondition is one of them (also in the parser rule).

Looking at this, ErrorCondition (or DeclareConditionStatement) can either extend single statement, or we can leave it as it is now. What's your opinion @cloud-fan?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine to keep it as it is

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for some nits

@miland-db
Copy link
Contributor Author

@cloud-fan Thank you for review(s)!

@davidm-db
Copy link
Contributor

davidm-db commented Feb 3, 2025

@cloud-fan Please don't merge until I take one last look, I was doing something locally last night and I just need to check first if the comments I had are applicable to this PR. I'll let you know once I'm done.

@davidm-db
Copy link
Contributor

Minor comments, otherwise LGTM.
I have one concern regarding all the case insensitive checks in the code - for example, in SqlScriptingLabelContext.checkLabels we are doing conversion to lowercase in the place of a check, rather than relying on propagating uppercase/lowercase strings and then having them in the correct case at the time of a check (which we are doing here). The approach we have with labels seems more robust and localized.
Here (in visitCompoundBodyImpl, for example, instead of propagating uppercase conditions, we would propagate just the input values, and then when adding to conditions map we can do a lowercase conversion and then each time we check if the condition is in the map, we do the conversion as well. This way, it's localized to the place where we do the checks and if something changes in the future we don't need to rely that someone will take care of this, it would simply work.
Thoughts?

@miland-db
Copy link
Contributor Author

  • Basically, we are doing uniqueness check for condition names in ExceptionHandlerTriggers.addUnique* methods where case sensitivity is centralized and strings are converted to the same casing .
  • When creating ErrorCondition object, we pass upper-case arguments.
  • We use upper-case when we check if condition is builtin Spark condition.

I agree more centralized approach should be used, but with current tests, I think we covered all possible checks in code to be case-insensitive. As a follow up, I will try to think of some more clever way of ensuring case-insensitivity because this can be a problem if somebody change or add new code without case-sensitivity in mind. Thanks for the comment!

@miland-db miland-db changed the title [SPARK-48353][SQL] Introduction of Error Handling mechanism in SQL Scripting [SPARK-48353][SQL] Introduction of Exception Handling mechanism in SQL Scripting Feb 3, 2025
@cloud-fan
Copy link
Contributor

thanks, merging to master/4.0!

@cloud-fan cloud-fan closed this in 0f163a5 Feb 4, 2025
cloud-fan pushed a commit that referenced this pull request Feb 4, 2025
…L Scripting

### What changes were proposed in this pull request?
This pull request introduces the logic of error handling inside SQL Scripting language. Now, it is possible to:
- declare conditions for specific SQL States (currently only valid in scope where they are defined)
- declare handlers to catch and process errors that are risen during statement execution

Rules for selecting the most appropriate handler:
- Named condition handlers are most specific.
- SQLSTATE handlers are next in specificity.
- Generic NOT FOUND and SQLEXCEPTION handlers are least specific.

Note: Handlers defined in the innermost compound statement where the exception was raised are considered.

### Why are the changes needed?
The intent is to add the possibility for user to handle SQL errors in a custom defined way.

### Limitations

- Currently, only `EXIT` handler is supported. Support for `CONTINUE` handlers will come in the future.
- It is only possible to declare condition in its full form by specifying SQLSTATE. Short form with default SQLSTATE is not yet supported.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
There are already existing test suites for SQL scripting that have been improved to test new functionalities:

- `SqlScriptingParserSuite`
- `SqlScriptingExecutionSuite`
- `SqlScriptingE2eSuite`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49427 from miland-db/milan-dankovic_data/refactor-execution-4-condition-handlers.

Authored-by: Milan Dankovic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 0f163a5)
Signed-off-by: Wenchen Fan <[email protected]>
parameters = Map("sqlState" -> "12345"))
}

test("Specific condition takes precedence over sqlState") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 13 new tests in SqlScriptingExecutionSuiteandSqlScriptingE2eSuite` will fail to execute in non-ANSI mode and the NON-ANSI daily test failed:

image

we can reproduce it using the following command: SPARK_ANSI_SQL_MODE=false build/sbt "sql/testOnly org.apache.spark.sql.scripting.SqlScriptingE2eSuite org.apache.spark.sql.scripting.SqlScriptingExecutionSuite"

Do you have time to fix this issue? @miland-db

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix this. The issue is that zero division is not throwing an error when SPARK_ANSI_SQL_MODE=false. The solution can be to override this config for this suite, or to find some other error that has the same behavior regardless of the ANSI. Any thoughts @LuciferYang?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just enable ansi for the entire suite, by overriding def sparkConf. example:

class AvroV1Suite extends AvroSuite {
  override protected def sparkConf: SparkConf =
    super
      .sparkConf
      .set(SQLConf.USE_V1_SOURCE_LIST, "avro")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just enable ansi for the entire suite, by overriding def sparkConf. example:

class AvroV1Suite extends AvroSuite {
  override protected def sparkConf: SparkConf =
    super
      .sparkConf
      .set(SQLConf.USE_V1_SOURCE_LIST, "avro")

If the 13 failed cases are all due to the same reason, I think it's ok

MaxGekk pushed a commit that referenced this pull request Feb 7, 2025
…suites

### What changes were proposed in this pull request?
Keep ANSI enabled for SQL Scripting execution suites to enable proper testing of exception handling.

### Why are the changes needed?
13 new tests in `SqlScriptingExecutionSuite` and `SqlScriptingE2eSuite` introduced in #49427 will fail to execute in non-ANSI mode and the NON-ANSI daily test failed:
![image](https://github.com/user-attachments/assets/78a0b6fd-1b98-4b85-80e5-688e4b447cae)

If ANSI is off, code that should trigger exception fails silently (zero division does not throw). We want to keep ANSI behavior so that we can test exception handling properly.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49852 from miland-db/milan-dankovic_data/exception-handlers-followup-fix-test.

Authored-by: Milan Dankovic <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
MaxGekk pushed a commit that referenced this pull request Feb 7, 2025
…suites

### What changes were proposed in this pull request?
Keep ANSI enabled for SQL Scripting execution suites to enable proper testing of exception handling.

### Why are the changes needed?
13 new tests in `SqlScriptingExecutionSuite` and `SqlScriptingE2eSuite` introduced in #49427 will fail to execute in non-ANSI mode and the NON-ANSI daily test failed:
![image](https://github.com/user-attachments/assets/78a0b6fd-1b98-4b85-80e5-688e4b447cae)

If ANSI is off, code that should trigger exception fails silently (zero division does not throw). We want to keep ANSI behavior so that we can test exception handling properly.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49852 from miland-db/milan-dankovic_data/exception-handlers-followup-fix-test.

Authored-by: Milan Dankovic <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit ad13a88)
Signed-off-by: Max Gekk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants