[SPARK-48530][SQL] Support for local variables in SQL Scripting
### What changes were proposed in this pull request?

This pull request introduces support for local variables in SQL scripting.

#### Behavior:

Local variables are declared in the headers of compound bodies and are bound to that body's scope. Variables of the same name are allowed in nested scopes, where the innermost variable is resolved. Optionally, a local variable can be qualified with the label of the compound body in which it was declared, which allows accessing variables that are not the innermost in the current scope.

Local variables have resolution priority over session variables; session variable resolution is attempted only after local variable resolution. The exception is fully qualified session variables, in the format `system.session.<varName>` or `session.<varName>`. `system` and `session` cannot be used as compound body labels.

Local variables must not be qualified on declaration, can be set using `SET VAR`, and cannot be dropped with `DROP`. They also should not be allowed to be declared with `DECLARE OR REPLACE`; however, this is not enforced in this PR because the `FOR` statement relies on this behavior. The `FOR` statement must be updated in a separate PR to use proper local variables, as the current implementation simulates them using session variables.

#### Implementation notes:

Because core depends on catalyst, it's impossible to import code from core (where most of the SQL scripting implementation is located) into catalyst. To solve this, a trait `VariableManager` is introduced, which is implemented in core and injected into catalyst. This `VariableManager` is essentially a wrapper around `SqlScriptingExecutionContext` and provides methods for getting, setting, and creating variables.

This injection is tricky because we want to have one `ScriptingVariableManager` **per script**. Options considered to achieve this are:

- Pass the manager/context to the analyzer using function calls. If possible, this solution would be ideal because it would allow every run of the analyzer to have its own scripting context, which is automatically cleaned up (AnalysisContext). It would also allow more control over variable resolution; e.g., for `EXECUTE IMMEDIATE` we could simply not pass in the script context, and it would behave as if outside of a script. This is the intended behavior for `EXECUTE IMMEDIATE`. The problem with this approach is that it seems hard to implement. The call stack would be: `Analyzer.executeAndCheck` -> `HybridAnalyzer.apply` -> `RuleExecutor.executeAndTrack` -> `Analyzer.execute` (**overridden** from RuleExecutor) -> `Analyzer.withNewAnalysisContext`. Implementing this context propagation would require changing the signatures of all of these methods, including superclass methods like `execute` and `executeAndTrack`.
- Store the context in `CatalogManager`. `CatalogManager`'s lifetime is tied to the session, so to allow multiple scripts to execute at the same time we would need, e.g., a map `scriptUUID -> VariableManager`, with the `scriptUUID` held in a `ThreadLocal` variable in the `CatalogManager`. The drawbacks of this approach are that the script has to clean up its resources after execution, and that it's more complicated to, e.g., forbid `EXECUTE IMMEDIATE` from accessing local variables.

Currently the second option seems better to me; however, I am open to suggestions on how to approach this.

EDIT: An option similar to the second one was chosen, except that a ThreadLocal singleton instance of the context is used instead of storing it in `CatalogManager`.

EDIT: `EXECUTE IMMEDIATE` needs to be reworked in order to work properly with local variables. The generated query should not be able to access local variables, which means `EXECUTE IMMEDIATE` needs to sandbox that query. This is done by analyzing its entire subtree in `SubstituteExecuteImmediate`, with context indicating that analysis is happening inside `EXECUTE IMMEDIATE`. PR for this refactor: #49993.

### Why are the changes needed?

Currently, local variables in SQL scripting are simulated using session variables, which is a temporary workaround with many drawbacks.

### Does this PR introduce _any_ user-facing change?

Yes, this change introduces multiple new types of errors.

### How was this patch tested?

Tests were added to SqlScriptingExecutionSuite and SqlScriptingParserSuite.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49445 from dusantism-db/scripting-local-variables.

Authored-by: Dušan Tišma <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit fb17856)
Signed-off-by: Wenchen Fan <[email protected]>
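The resolution rules described under **Behavior** can be modeled in a few lines of Scala. This is a hypothetical sketch, not the PR's implementation: `Scope`, `LocalScopes`, and `resolve` are invented names, values are plain `Int`s, name matching is case-sensitive (except for the reserved-label check), and a two-part name that matches no label is assumed to resolve to nothing rather than falling back to a session variable.

```scala
import scala.collection.mutable

// Hypothetical model of local-variable scoping; not Spark's classes or APIs.
final class Scope(val label: Option[String]) {
  val vars: mutable.Map[String, Int] = mutable.Map.empty[String, Int]
}

final class LocalScopes {
  // Head of the list is the innermost compound body.
  private var scopes: List[Scope] = Nil

  def enter(label: Option[String]): Unit = {
    // `system` and `session` are reserved and cannot label a compound body.
    require(!label.exists(l => Set("system", "session").contains(l.toLowerCase)),
      s"reserved label: ${label.get}")
    scopes = new Scope(label) :: scopes
  }

  def exit(): Unit = {
    require(scopes.nonEmpty, "no open compound body")
    scopes = scopes.tail
  }

  // Declarations are unqualified and go into the current (innermost) body.
  def declare(name: String, value: Int): Unit = {
    require(scopes.nonEmpty, "DECLARE outside a compound body")
    require(!name.contains('.'), s"local variable must not be qualified: $name")
    scopes.head.vars(name) = value
  }

  // Unqualified lookup: the innermost declaration wins.
  def lookup(name: String): Option[Int] =
    scopes.find(_.vars.contains(name)).map(_.vars(name))

  // Label-qualified lookup reaches a variable shadowed by an inner scope.
  def lookup(label: String, name: String): Option[Int] =
    scopes.find(_.label.contains(label)).flatMap(_.vars.get(name))
}

// Explicit session qualifiers bypass locals; otherwise locals first, then session.
def resolve(parts: Seq[String], locals: LocalScopes, session: Map[String, Int]): Option[Int] =
  parts match {
    case Seq("system", "session", name) => session.get(name)
    case Seq("session", name)           => session.get(name)
    case Seq(label, name)               => locals.lookup(label, name)
    case Seq(name)                      => locals.lookup(name).orElse(session.get(name))
    case _                              => None
  }

// Usage: an `outer` body that contains an `inner` body, both declaring `x`.
val session = Map("x" -> 100, "y" -> 200)
val locals = new LocalScopes
locals.enter(Some("outer")); locals.declare("x", 1)
locals.enter(Some("inner")); locals.declare("x", 2)
assert(resolve(Seq("x"), locals, session) == Some(2))          // innermost wins
assert(resolve(Seq("outer", "x"), locals, session) == Some(1)) // label-qualified
assert(resolve(Seq("y"), locals, session) == Some(200))        // session fallback
assert(resolve(Seq("session", "x"), locals, session) == Some(100))
```

Keeping scopes as an innermost-first list makes shadowing fall out of a single `find`, and checking the session qualifiers before anything else is what lets `session.x` reach a session variable even when a local `x` is in scope.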
1 parent: 60e1d4a, commit e140dbb

Showing 30 changed files with 1,788 additions and 327 deletions.
`common/utils/src/main/scala/org/apache/spark/util/LexicalThreadLocal.scala` (new file: 66 additions, 0 deletions)
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.util

/**
 * Helper trait for defining thread locals with lexical scoping. With this helper, the thread local
 * is private and can only be set by the [[Handle]]. The [[Handle]] only exposes the thread local
 * value to functions passed into its runWith method. This pattern allows for
 * the lifetime of the thread local value to be strictly controlled.
 *
 * Rather than calling `tl.set(...)` and `tl.remove()` you would get a handle and execute your code
 * in `handle.runWith { ... }`.
 *
 * Example:
 * {{{
 *   // The type parameter must match the stored value; the Handle constructor is
 *   // private to the trait, so subclasses go through createHandle.
 *   object Credentials extends LexicalThreadLocal[Map[String, String]] {
 *     def create(creds: Map[String, String]) = createHandle(Some(creds))
 *   }
 *   ...
 *   val handle = Credentials.create(Map("key" -> "value"))
 *   assert(Credentials.get() == None)
 *   handle.runWith {
 *     assert(Credentials.get() == Some(Map("key" -> "value")))
 *   }
 * }}}
 */
trait LexicalThreadLocal[T] {
  private val tl = new ThreadLocal[T]

  private def set(opt: Option[T]): Unit = {
    opt match {
      case Some(x) => tl.set(x)
      case None => tl.remove()
    }
  }

  protected def createHandle(opt: Option[T]): Handle = new Handle(opt)

  def get(): Option[T] = Option(tl.get)

  /** Final class representing a handle to a thread local value. */
  final class Handle private[LexicalThreadLocal] (private val opt: Option[T]) {
    def runWith[R](f: => R): R = {
      val old = get()
      set(opt)
      try f finally {
        set(old)
      }
    }
  }
}
```
`...alyst/src/main/scala/org/apache/spark/sql/catalyst/SqlScriptingLocalVariableManager.scala` (new file: 25 additions, 0 deletions)
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.catalyst

import org.apache.spark.sql.catalyst.catalog.VariableManager
import org.apache.spark.util.LexicalThreadLocal

object SqlScriptingLocalVariableManager extends LexicalThreadLocal[VariableManager] {
  def create(variableManager: VariableManager): Handle = createHandle(Option(variableManager))
}
```
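The following standalone sketch shows how the two new files fit together. It contains a copy of `LexicalThreadLocal` (so it runs without Spark on the classpath), a hypothetical `VariableManager` stand-in with a single lookup method (the real trait's methods are not part of this diff), a hypothetical `ScriptVariables` implementation, and an object mirroring `SqlScriptingLocalVariableManager`. `lookupLocal` is an invented helper standing in for analyzer-side resolution. The sketch shows that a manager is visible only inside `runWith` and that a nested handle restores the outer binding when it exits.

```scala
// Standalone copy of the LexicalThreadLocal trait from the diff above.
trait LexicalThreadLocal[T] {
  private val tl = new ThreadLocal[T]

  private def set(opt: Option[T]): Unit = opt match {
    case Some(x) => tl.set(x)
    case None    => tl.remove()
  }

  protected def createHandle(opt: Option[T]): Handle = new Handle(opt)

  def get(): Option[T] = Option(tl.get)

  final class Handle private[LexicalThreadLocal] (private val opt: Option[T]) {
    def runWith[R](f: => R): R = {
      val old = get()
      set(opt)
      try f finally set(old) // restore the previous binding even if f throws
    }
  }
}

// Hypothetical stand-in for catalyst's VariableManager (lookup only).
trait VariableManager {
  def get(name: String): Option[Int]
}

// Hypothetical per-script implementation backed by an immutable map.
final class ScriptVariables(vars: Map[String, Int]) extends VariableManager {
  def get(name: String): Option[Int] = vars.get(name)
}

// Mirrors the object added in this commit.
object SqlScriptingLocalVariableManager extends LexicalThreadLocal[VariableManager] {
  def create(variableManager: VariableManager): Handle = createHandle(Option(variableManager))
}

// Hypothetical analyzer-side lookup: consult the manager bound to this thread, if any.
def lookupLocal(name: String): Option[Int] =
  SqlScriptingLocalVariableManager.get().flatMap(_.get(name))

// Usage: each script run binds its own manager for the duration of runWith.
val outerScript = SqlScriptingLocalVariableManager.create(new ScriptVariables(Map("x" -> 1)))
val innerScript = SqlScriptingLocalVariableManager.create(new ScriptVariables(Map("x" -> 2)))

outerScript.runWith {
  assert(lookupLocal("x") == Some(1))
  innerScript.runWith(assert(lookupLocal("x") == Some(2)))
  assert(lookupLocal("x") == Some(1)) // outer binding restored
}
assert(lookupLocal("x").isEmpty) // nothing is bound outside runWith
```

Because the value lives in a plain `ThreadLocal`, a script running on another thread never sees this thread's manager. This is consistent with the EDIT note's goal of allowing concurrent scripts without a `scriptUUID -> VariableManager` map, and the `finally` in `runWith` handles the cleanup that the `CatalogManager` option would have required. Since `create` wraps its argument in `Option(...)`, passing `null` yields a handle that hides any bound manager for the duration of the call.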