[SPARK-48530][SQL] Support for local variables in SQL Scripting #49445

Open: wants to merge 47 commits into master

47 commits
73cb01b
first commit
dusantism-db Dec 24, 2024
813d282
POC works
dusantism-db Dec 24, 2024
1c08f57
make column res helper more functional
dusantism-db Dec 25, 2024
18da02f
move variables map to SqlScriptingScope
dusantism-db Dec 25, 2024
cee5f1a
implement proper namespace (scope label name) for local variables
dusantism-db Dec 27, 2024
47934ab
qualified names
dusantism-db Dec 30, 2024
399d4e8
update todos
dusantism-db Jan 3, 2025
6efe764
resolve catalogs + check for duplicates
dusantism-db Jan 3, 2025
769607d
set variable and normalized identifiers
dusantism-db Jan 3, 2025
6225956
resolve fully qualified session vars in tempvarManager only and updat…
dusantism-db Jan 6, 2025
241fc05
tests first batch
dusantism-db Jan 6, 2025
068e1ec
add more tests
dusantism-db Jan 8, 2025
60335db
add error messages, more tests and some comments
dusantism-db Jan 8, 2025
65b69d3
rename TempVariableManager.scala and add more tests
dusantism-db Jan 8, 2025
fe5dc7b
remove old logic for dropping variables, update tests and add more tests
dusantism-db Jan 8, 2025
4f8d2c1
add cleanup for scripting execution, separate drop and create variabl…
dusantism-db Jan 9, 2025
ba5b8d2
fix resolvecatalogs and add more tests
dusantism-db Jan 9, 2025
33f0aac
refactor to support properly setting variables
dusantism-db Jan 9, 2025
be6052f
add error message for system and session label names
dusantism-db Jan 9, 2025
4b1e8e1
small fixes and cleanup
dusantism-db Jan 10, 2025
90b106b
Fix duplicate detection for set variablwe
dusantism-db Jan 10, 2025
7ba0923
Add test for DECLARE OR REPLACE but ignore it until FOR is fixed
dusantism-db Jan 10, 2025
cd4e932
execute immediate don't resolve vars from scripts. Problem remains wi…
dusantism-db Jan 10, 2025
fdf3c5a
cleanup
dusantism-db Jan 10, 2025
52cbd17
Merge remote-tracking branch 'upstream/master' into scripting-local-v…
dusantism-db Jan 13, 2025
c134fd4
fix merge mistake
dusantism-db Jan 13, 2025
3ea762d
fix merge mistake 2
dusantism-db Jan 13, 2025
8e9352a
fix comments
dusantism-db Jan 15, 2025
78042e3
Update CreateVar, SetVar and lookupVariable to work with Execute Imme…
dusantism-db Jan 16, 2025
40ffa83
add enum for lookup variable mode
dusantism-db Jan 17, 2025
4a546a4
convert scripting variable manager to threadlocal
dusantism-db Jan 17, 2025
15d5554
fix e2e test
dusantism-db Jan 17, 2025
a2b20c5
add comment
dusantism-db Jan 17, 2025
e3077a4
add comment and regenerate golden files
dusantism-db Jan 21, 2025
6ce8f9c
fix failing test
dusantism-db Jan 21, 2025
ccab52c
refactor SqlScriptingVariableManager to be LexicalThreadLocal singlet…
dusantism-db Jan 22, 2025
370bf65
renames
dusantism-db Jan 23, 2025
0cea838
tagging approach
dusantism-db Jan 24, 2025
9895c69
Revert "tagging approach"
dusantism-db Jan 24, 2025
cd888dd
analysiscontext withExecuteImmediate
dusantism-db Jan 24, 2025
4fe7ab5
remove into clause flag
dusantism-db Jan 25, 2025
8a6b536
address comments
dusantism-db Jan 27, 2025
db573c1
remove parameter from lookupVariable
dusantism-db Jan 27, 2025
dadd517
Merge remote-tracking branch 'upstream/master' into scripting-local-v…
dusantism-db Jan 27, 2025
680e5d7
resolve comments 1
dusantism-db Feb 6, 2025
7d3008e
Merge remote-tracking branch 'upstream/master' into scripting-local-v…
dusantism-db Feb 6, 2025
901aa6c
improve logic to work with exception handlers
dusantism-db Feb 7, 2025
16 changes: 16 additions & 0 deletions common/utils/src/main/resources/error/error-conditions.json
@@ -3592,6 +3592,16 @@
"message" : [
"Variable <varName> can only be declared at the beginning of the compound."
]
},
"QUALIFIED_LOCAL_VARIABLE" : {
"message" : [
"The variable <varName> must be declared without a qualifier, as qualifiers are not allowed for local variable declarations."
]
},
"REPLACE_LOCAL_VARIABLE" : {
"message" : [
"The variable <varName> does not support DECLARE OR REPLACE, as local variables cannot be replaced."
]
}
},
"sqlState" : "42K0M"
@@ -3738,6 +3748,12 @@
],
"sqlState" : "42K0L"
},
"LABEL_NAME_FORBIDDEN" : {
"message" : [
"The label name <label> is forbidden."
],
"sqlState" : "42K0L"
},
"LOAD_DATA_PATH_NOT_EXISTS" : {
"message" : [
"LOAD DATA input path does not exist: <path>."
@@ -0,0 +1,66 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.util

/**
* Helper trait for defining thread locals with lexical scoping. With this helper, the thread local
* is private and can only be set by the [[Handle]]. The [[Handle]] only exposes the thread local
* value to functions passed into its [[runWith]] method. This pattern allows for the lifetime of
* the thread local value to be strictly controlled.
*
* Rather than calling `tl.set(...)` and `tl.remove()` you would get a handle and execute your code
* in `handle.runWith { ... }`.
*
* Example:
* {{{
* object Credentials extends LexicalThreadLocal[Map[String, String]] {
* def create(creds: Map[String, String]) = createHandle(Some(creds))
* }
* ...
* val handle = Credentials.create(Map("key" -> "value"))
* assert(Credentials.get() == None)
* handle.runWith {
* assert(Credentials.get() == Some(Map("key" -> "value")))
* }
* }}}
*/
trait LexicalThreadLocal[T] {
private val tl = new ThreadLocal[T]

private def set(opt: Option[T]): Unit = {
opt match {
case Some(x) => tl.set(x)
case None => tl.remove()
}
}

protected def createHandle(opt: Option[T]): Handle = new Handle(opt)

def get(): Option[T] = Option(tl.get)

/** Final class representing a handle to a thread local value. */
final class Handle private[LexicalThreadLocal] (private val opt: Option[T]) {
def runWith[R](f: => R): R = {
val old = get()
set(opt)
try f finally {
set(old)
}
}
}
}
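A small usage sketch of the trait's scoping behavior, assuming the trait is pasted into a standalone script (in the diff it lives in `org.apache.spark.util`). `ScopeLabel`, `observe` and `afterFailure` are hypothetical names used only for illustration. It shows that a nested handle, including one built from `None` (the idea behind the `create(None).runWith(...)` suggestion further down the review), temporarily shadows the outer value, and that the outer value comes back on exit even when the body throws.

```scala
// Standalone copy of the LexicalThreadLocal trait from the diff, so the sketch runs on its own.
trait LexicalThreadLocal[T] {
  private val tl = new ThreadLocal[T]

  private def set(opt: Option[T]): Unit = opt match {
    case Some(x) => tl.set(x)
    case None    => tl.remove()
  }

  protected def createHandle(opt: Option[T]): Handle = new Handle(opt)

  def get(): Option[T] = Option(tl.get)

  final class Handle private[LexicalThreadLocal] (private val opt: Option[T]) {
    def runWith[R](f: => R): R = {
      val old = get()
      set(opt)
      try f finally set(old)
    }
  }
}

// Hypothetical singleton, shaped like SqlScriptingLocalVariableManager.
object ScopeLabel extends LexicalThreadLocal[String] {
  def create(label: String): Handle = createHandle(Some(label))
  def hidden(): Handle = createHandle(None) // hides any outer value while active
}

// Records what get() returns at each nesting level.
def observe(): List[Option[String]] = {
  val seen = scala.collection.mutable.ListBuffer.empty[Option[String]]
  seen += ScopeLabel.get()             // no handle active
  ScopeLabel.create("outer").runWith {
    seen += ScopeLabel.get()           // outer handle active
    ScopeLabel.hidden().runWith {
      seen += ScopeLabel.get()         // value hidden by a None handle
    }
    seen += ScopeLabel.get()           // outer value restored
  }
  seen += ScopeLabel.get()             // cleared again after the outer block
  seen.toList
}

// The previous value is restored even when the body throws.
def afterFailure(): Option[String] = {
  try ScopeLabel.create("doomed").runWith[Unit] { throw new IllegalStateException("boom") }
  catch { case _: IllegalStateException => () }
  ScopeLabel.get()
}
```

Calling `observe()` should yield `None`, `Some("outer")`, `None`, `Some("outer")`, `None` in that order; the restore happens in `finally`, which is why `afterFailure()` also sees no leftover value.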
@@ -0,0 +1,25 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.sql.catalyst

import org.apache.spark.sql.catalyst.catalog.VariableManager
import org.apache.spark.util.LexicalThreadLocal

object SqlScriptingLocalVariableManager extends LexicalThreadLocal[VariableManager] {
def create(variableManager: VariableManager): Handle = createHandle(Option(variableManager))
}
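Because the value sits in a `ThreadLocal`, a handle installed on one thread is invisible to every other thread, so one script's local variables cannot leak into a concurrently running query. A minimal sketch, assuming a compact copy of the trait above; `FakeVariableManager` and `LocalVars` are hypothetical stand-ins for Spark's `VariableManager` and `SqlScriptingLocalVariableManager`.

```scala
import java.util.concurrent.atomic.AtomicReference

// Compact copy of LexicalThreadLocal from the diff (same behavior, fewer lines).
trait LexicalThreadLocal[T] {
  private val tl = new ThreadLocal[T]
  private def set(opt: Option[T]): Unit = opt match {
    case Some(x) => tl.set(x)
    case None    => tl.remove()
  }
  protected def createHandle(opt: Option[T]): Handle = new Handle(opt)
  def get(): Option[T] = Option(tl.get)
  final class Handle private[LexicalThreadLocal] (private val opt: Option[T]) {
    def runWith[R](f: => R): R = { val old = get(); set(opt); try f finally set(old) }
  }
}

// Hypothetical stand-in for Spark's VariableManager: just a map of variable values.
final case class FakeVariableManager(vars: Map[String, Int])

// Same shape as SqlScriptingLocalVariableManager in the diff.
object LocalVars extends LexicalThreadLocal[FakeVariableManager] {
  def create(vm: FakeVariableManager): Handle = createHandle(Option(vm))
}

// Returns what the script thread and a second thread (started inside runWith) observe.
def visibility(): (Option[FakeVariableManager], Option[FakeVariableManager]) =
  LocalVars.create(FakeVariableManager(Map("x" -> 1))).runWith {
    val seenByOther = new AtomicReference[Option[FakeVariableManager]]()
    val other = new Thread(() => seenByOther.set(LocalVars.get()))
    other.start()
    other.join() // join() also makes the other thread's write visible here
    (LocalVars.get(), seenByOther.get())
  }
```

The script thread sees its manager while the other thread sees `None`; anything that hands work to another thread would have to capture and re-install the handle explicitly.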
@@ -139,6 +139,9 @@ object FakeV2SessionCatalog extends TableCatalog with FunctionCatalog with Suppo
* even if a temp view `t` has been created.
* @param outerPlan The query plan from the outer query that can be used to resolve star
* expressions in a subquery.
* @param isExecuteImmediate Whether the current plan is created by EXECUTE IMMEDIATE. Used when
* resolving variables, as SQL Scripting local variables should not be
* visible from EXECUTE IMMEDIATE.

Contributor:
I don't quite understand this EXECUTE IMMEDIATE hack. Can you explain it in detail with examples?

Contributor Author (dusantism-db), Jan 23, 2025:
EXECUTE IMMEDIATE has 3 parts: the SQL string, the INTO clause and the USING clause. INTO (sets variables) and USING (captures variables) should be able to access local variables; however, the query generated by the SQL string should not. It should run as if it were not in a script.

We add isExecuteImmediate to AnalysisContext to know if we are in a plan generated by EXECUTE IMMEDIATE. If we are, we cannot access local variables.

The USING clause is resolved before the SQL string; at that point isExecuteImmediate is not set, so local variables are accessed normally.

The INTO clause is not resolved before the SQL string, so we have to make an exception for it. Since it internally uses SetVariable, we add a flag isExecuteImmediateIntoClause to SetVariable, which allows access to local variables if set to true.

Contributor Author:
Examples:

Should work:

BEGIN
  DECLARE testVar = 5;
  EXECUTE IMMEDIATE 'SELECT ?' USING testVar;
END

Should work:

BEGIN
  DECLARE localVar = 1;
  EXECUTE IMMEDIATE 'SELECT 5' INTO localVar;
END

Should not work:

BEGIN
  DECLARE localVar = 5;
  EXECUTE IMMEDIATE 'SELECT localVar';
END
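The three examples can be replayed against a toy resolver. This is a hypothetical sketch, not Spark's analyzer: `ScriptVars`, `Part`, `localsVisible` and `resolve` are invented names, and the per-part visibility rule stands in for the `AnalysisContext.isExecuteImmediate` flag plus the INTO-clause exception described above.

```scala
// Hypothetical toy model of which parts of EXECUTE IMMEDIATE may see script locals.
final case class ScriptVars(locals: Map[String, Int], session: Map[String, Int])

sealed trait Part
case object UsingArg extends Part   // resolved before the SQL string, flag not yet set
case object QueryBody extends Part  // the query generated from the SQL string
case object IntoTarget extends Part // SET target, explicitly allowed to see locals

def localsVisible(part: Part): Boolean = part match {
  case UsingArg   => true
  case IntoTarget => true
  case QueryBody  => false
}

// Locals first (when visible to this part), then session variables.
def resolve(vars: ScriptVars, part: Part, name: String): Option[Int] =
  (if (localsVisible(part)) vars.locals.get(name) else None)
    .orElse(vars.session.get(name))
```

With no session variables declared, `testVar` resolves in the USING clause, `localVar` resolves as the INTO target, and `localVar` inside the SQL string resolves to nothing, matching the "should not work" case.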

Contributor:
Thanks for the explanation! Instead of hacking the AnalysisContext, how about we move the hack into the scripting code itself? For example, in SingleStatementExec#buildDataFrame, if the parsed plan is ExecuteImmediateQuery, we could do:

SqlScriptingVariableManager.create(None).runWith(Dataset.ofRows(session, preparedPlan))

Contributor:
oh nvm, the INTO clause can access local variables.

Contributor Author (dusantism-db), Jan 23, 2025:

> oh nvm, the INTO clause can access local variables.

Yeah, that's why I did it this way: we need a way to make an exception for the INTO clause.

Contributor (cloud-fan), Jan 23, 2025:
AnalysisContext is a singleton instance and it's very hacky and fragile to update a global bool flag to implement this fix. I think the key problem here is still passing around state, and I have a new idea:

In SubstituteExecuteImmediate, after we parse the query body, we tag all UnresolvedAttributes within the query to indicate that they are inside EXECUTE IMMEDIATE and they should not be resolved to local variables. Something like this

val EXEC_IMMEDIATE_TAG = new TreeNodeTag[Unit]("inside_execute_immediate")

def tagVariables(plan: LogicalPlan): Unit = {
  plan.expressions.foreach(_.foreach {
    case u: UnresolvedAttribute =>
      u.setTagValue(EXEC_IMMEDIATE_TAG, ())
    case _ => // Leave all other expressions untagged (avoids a MatchError).
  })
  plan.subqueries.foreach(tagVariables)
  plan.children.foreach(tagVariables)
}

val executeImmediateQuery = ...
tagVariables(executeImmediateQuery)

In the places that match UnresolvedAttribute and try to look up variables, we skip local variables if the tag is present.
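A toy model of the tagging idea, with hypothetical `Expr`, `Attr`, `Add` and `TagKey` types standing in for Catalyst's `TreeNode`, `UnresolvedAttribute` and `TreeNodeTag` (this is not Spark's API). Tagged attributes skip the local-variable lookup; the last assertion in the usage shows the fragility raised in the review: a case-class `.copy` creates a fresh node whose tag map is empty.

```scala
import scala.collection.mutable

// Hypothetical miniature of Catalyst's tree tags; not Spark's TreeNode API.
final case class TagKey[T](name: String)

sealed trait Expr {
  private val tags = mutable.Map.empty[TagKey[_], Any] // per-instance, not part of equals/copy
  def setTag[T](key: TagKey[T], value: T): Unit = tags.update(key, value)
  def hasTag(key: TagKey[_]): Boolean = tags.contains(key)
  def children: Seq[Expr]
  def foreachNode(f: Expr => Unit): Unit = { f(this); children.foreach(_.foreachNode(f)) }
}
final case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
final case class Add(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}

val ExecImmediateTag = TagKey[Unit]("inside_execute_immediate")

// Tag every attribute parsed from the EXECUTE IMMEDIATE body.
def tagVariables(e: Expr): Unit = e.foreachNode {
  case a: Attr => a.setTag(ExecImmediateTag, ())
  case _       => () // other nodes are left untouched
}

// Variable lookup skips script locals when the attribute carries the tag.
def lookup(a: Attr, locals: Map[String, Int], session: Map[String, Int]): Option[Int] =
  (if (a.hasTag(ExecImmediateTag)) None else locals.get(a.name)).orElse(session.get(a.name))
```

Usage: after `tagVariables(Add(v, Attr("y")))`, looking up `v` returns the session value, while a fresh, untagged `Attr` with the same name still sees the local one.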

In the new single-pass analyzer, we can have a better way to pass around states: when we top-down traverse the plan tree and see ExecuteImmediateQuery, we set a flag in the scope to indicate it's under ExecuteImmediateQuery and keep traversing.
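For comparison, the single-pass idea carries the "under EXECUTE IMMEDIATE" state as an explicit parameter of a top-down traversal rather than in a global context or per-node tags. This is a hypothetical sketch over an invented `Plan` tree, not the real single-pass `Resolver` API.

```scala
// Hypothetical toy plan tree; not Spark's LogicalPlan or the single-pass Resolver API.
sealed trait Plan
final case class Project(names: Seq[String], child: Plan) extends Plan
final case class ExecuteImmediateQuery(body: Plan) extends Plan
case object Leaf extends Plan

// Top-down traversal that passes "under EXECUTE IMMEDIATE" down as a parameter,
// so no global flag or per-node tag is needed.
def resolveNames(
    plan: Plan,
    locals: Map[String, Int],
    session: Map[String, Int],
    underExecImmediate: Boolean = false): Seq[Option[Int]] = plan match {
  case Project(names, child) =>
    val here = names.map { n =>
      (if (underExecImmediate) None else locals.get(n)).orElse(session.get(n))
    }
    here ++ resolveNames(child, locals, session, underExecImmediate)
  case ExecuteImmediateQuery(body) =>
    resolveNames(body, locals, session, underExecImmediate = true)
  case Leaf => Nil
}
```

The flag only affects the subtree below `ExecuteImmediateQuery`, and it disappears automatically when the recursion returns, so there is nothing to reset.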

cc @vladimirg-db

Contributor:
Tags are fragile as well, because you lose them after .copy, but I guess it's better than AnalysisContext.

Contributor:
Yes, tags are fragile, but it's less of an issue with UnresolvedAttribute, as I can't think of any rule that copies UnresolvedAttribute.

Anyway, I don't have a better idea now unless we completely move to the new single-pass analyzer in the future.

*/
case class AnalysisContext(
catalogAndNamespace: Seq[String] = Nil,
@@ -154,6 +157,7 @@ case class AnalysisContext(
referredTempFunctionNames: mutable.Set[String] = mutable.Set.empty,
referredTempVariableNames: Seq[Seq[String]] = Seq.empty,
outerPlan: Option[LogicalPlan] = None,
isExecuteImmediate: Boolean = false,

/**
* This is a bridge state between this fixed-point [[Analyzer]] and a single-pass [[Resolver]].
@@ -208,7 +212,16 @@ object AnalysisContext {
originContext.relationCache,
viewDesc.viewReferredTempViewNames,
mutable.Set(viewDesc.viewReferredTempFunctionNames: _*),
viewDesc.viewReferredTempVariableNames)
viewDesc.viewReferredTempVariableNames,
isExecuteImmediate = originContext.isExecuteImmediate)
set(context)
try f finally { set(originContext) }
}

def withExecuteImmediateContext[A](f: => A): A = {
val originContext = value.get()
val context = originContext.copy(isExecuteImmediate = true)

set(context)
try f finally { set(originContext) }
}
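The context swap itself is a copy-and-restore around a block. A minimal sketch with a hypothetical `MiniContext` (not Spark's `AnalysisContext`) shows both `withExecuteImmediateContext` and why the view branch has to copy the flag explicitly: a freshly built context would silently reset it to `false`. The `keepFlag` parameter is invented purely to contrast the two behaviors.

```scala
// Hypothetical miniature of the AnalysisContext pattern: an immutable case class kept in a
// ThreadLocal and swapped for the duration of a block. Names are illustrative only.
final case class MiniContext(viewNames: Seq[String] = Nil, isExecuteImmediate: Boolean = false)

object MiniContext {
  private val value = ThreadLocal.withInitial[MiniContext](() => MiniContext())

  def get: MiniContext = value.get()

  private def withContext[A](ctx: MiniContext)(f: => A): A = {
    val origin = value.get()
    value.set(ctx)
    try f finally value.set(origin) // always restore the caller's context
  }

  // Same idea as withExecuteImmediateContext: copy the current context with the flag set.
  def withExecuteImmediateContext[A](f: => A): A =
    withContext(get.copy(isExecuteImmediate = true))(f)

  // A view builds a fresh context, so the flag survives only if it is carried over.
  def withViewContext[A](view: String, keepFlag: Boolean)(f: => A): A =
    withContext(MiniContext(Seq(view), isExecuteImmediate = keepFlag && get.isExecuteImmediate))(f)
}
```

Inside `withExecuteImmediateContext`, a view entered with `keepFlag = true` still reports the flag, while `keepFlag = false` loses it; outside both blocks the flag is back to `false`.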
@@ -325,7 +338,10 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor

override def batches: Seq[Batch] = Seq(
Batch("Substitution", fixedPoint,
new SubstituteExecuteImmediate(catalogManager),
new SubstituteExecuteImmediate(
catalogManager,
resolveChild = executeSameContext,
checkAnalysis = checkAnalysis),
// This rule optimizes `UpdateFields` expression chains so looks more like optimization rule.
// However, when manipulating deeply nested schema, `UpdateFields` expression tree could be
// very complex and make analysis impossible. Thus we need to optimize `UpdateFields` early
@@ -23,6 +23,7 @@ import scala.collection.mutable

import org.apache.spark.internal.Logging
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.SqlScriptingLocalVariableManager
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.SubExprUtils.wrapOuterReference
import org.apache.spark.sql.catalyst.plans.logical._
@@ -251,6 +252,14 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase {
}
}

/**
* Look up variable by nameParts.
* If in a SQL script, first check local variables, unless in EXECUTE IMMEDIATE
* (the query generated by EXECUTE IMMEDIATE cannot access local variables).
* If not found, fall back to session variables.
* @param nameParts NameParts of the variable.
* @return Reference to the variable.
*/
def lookupVariable(nameParts: Seq[String]): Option[VariableReference] = {
// The temp variables live in `SYSTEM.SESSION`, and the name can be qualified or not.
def maybeTempVariableName(nameParts: Seq[String]): Boolean = {
@@ -266,22 +275,48 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase {
}
}

if (maybeTempVariableName(nameParts)) {
val variableName = if (conf.caseSensitiveAnalysis) {
nameParts.last
} else {
nameParts.last.toLowerCase(Locale.ROOT)
}
catalogManager.tempVariableManager.get(variableName).map { varDef =>
val namePartsCaseAdjusted = if (conf.caseSensitiveAnalysis) {
nameParts
} else {
nameParts.map(_.toLowerCase(Locale.ROOT))
}

SqlScriptingLocalVariableManager.get()
// Inside EXECUTE IMMEDIATE, look up only session variables.
.filterNot(_ => AnalysisContext.get.isExecuteImmediate)
// If variable name is qualified with system.session.<varName> treat it as a session variable.

Contributor:
nit: session can't be a label name, so we can directly look up local variables if nameParts.length <= 2 and then fall back to session variable lookup.

Contributor Author (dusantism-db), Feb 6, 2025:
I think it is fine to leave the explicit checks here, because it's more performant: local variable lookup iterates through all frames and scopes, and there's no reason to do that if the name is qualified with session or system.session. Keeping the checks explicit is also safer if we make changes in the future.
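To make the cost argument concrete, here is a hypothetical model of script-local storage as frames of labeled scopes searched innermost-first (not Spark's actual classes; `Scope`, `Frame`, `LocalStore` and `lookupLocal` are invented). The `scopesVisited` counter shows that the explicit `session`/`system.session` check skips the walk entirely.

```scala
import java.util.Locale

// Hypothetical model of script-local storage (not Spark's actual classes).
final case class Scope(label: String, vars: Map[String, Int])
final case class Frame(scopes: List[Scope]) // head = innermost scope

final class LocalStore(frames: List[Frame]) { // head = innermost frame
  var scopesVisited = 0 // counts scopes inspected, to make the lookup cost visible

  // nameParts is Seq(name) or Seq(label, name); locals allow at most one qualifier.
  def get(nameParts: Seq[String]): Option[Int] = nameParts match {
    case Seq(name)        => search(None, name)
    case Seq(label, name) => search(Some(label), name)
    case _                => None
  }

  private def search(label: Option[String], name: String): Option[Int] = {
    val it = frames.iterator.flatMap(_.scopes)
    while (it.hasNext) {
      val scope = it.next()
      scopesVisited += 1
      if (label.forall(_ == scope.label) && scope.vars.contains(name)) return scope.vars.get(name)
    }
    None
  }
}

// The explicit session / system.session checks avoid walking frames and scopes at all.
def lookupLocal(store: LocalStore, nameParts: Seq[String]): Option[Int] = {
  val parts = nameParts.map(_.toLowerCase(Locale.ROOT))
  val sessionQualified =
    (parts.length == 2 && parts.head == "session") ||
      (parts.length == 3 && parts.take(2) == Seq("system", "session"))
  if (sessionQualified) None else store.get(parts)
}
```

An unqualified name stops at the innermost scope that declares it (shadowing), a label-qualified name skips scopes with other labels, and a session-qualified name leaves `scopesVisited` unchanged.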

Contributor (cloud-fan), Feb 7, 2025:
Good point. How about this

...
.flatMap { localVarManager =>
  if (nameParts.length <= 2 && nameParts.init != Seq("session")) {
    localVarManager.get(...)
  } else {
    None
  }
}.orElse ...
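A quick way to check that the condensed condition matches the explicit filter chain in the diff is to compare both predicates over sample name parts. This is a sketch, not code from the PR: `tryLocalsCurrent` transcribes the three filters, and `tryLocalsSuggested` is the condensed form with the added assumption that parts are lower-cased first, so `SESSION.x` is treated like `session.x`, and with `nonEmpty` checked before `init` (which throws on an empty sequence).

```scala
import java.util.Locale

// The explicit filter chain from the diff, as a single predicate.
def tryLocalsCurrent(nameParts: Seq[String]): Boolean =
  !(nameParts.length == 3 &&
    nameParts.take(2).map(_.toLowerCase(Locale.ROOT)) == Seq("system", "session")) &&
  !(nameParts.length == 2 && nameParts.head.toLowerCase(Locale.ROOT) == "session") &&
  nameParts.nonEmpty && nameParts.length <= 2

// The condensed suggestion, lower-casing first (an assumption added in this sketch).
def tryLocalsSuggested(nameParts: Seq[String]): Boolean = {
  val lower = nameParts.map(_.toLowerCase(Locale.ROOT))
  lower.nonEmpty && lower.length <= 2 && lower.init != Seq("session")
}
```

Without the lower-casing, `SESSION.x` would pass the condensed check and trigger a local lookup that can only fail, which is harmless for correctness but costs the scope walk discussed above.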

Contributor (cloud-fan), Feb 7, 2025:
local variable can only be qualified by one label, right? see https://github.com/apache/spark/pull/49445/files#r1913307900

.filterNot(_ =>
nameParts.length == 3
&& nameParts.take(2).map(_.toLowerCase(Locale.ROOT)) == Seq("system", "session"))
// If variable name is qualified with session.<varName> treat it as a session variable.
.filterNot(_ =>
nameParts.length == 2
&& nameParts.head.toLowerCase(Locale.ROOT) == "session")
// Local variable must be in format <varName> or <label>.<varName>
.filter(_ => namePartsCaseAdjusted.nonEmpty && namePartsCaseAdjusted.length <= 2)
dusantism-db marked this conversation as resolved.
.flatMap(_.get(namePartsCaseAdjusted))
.map { varDef =>
VariableReference(
nameParts,
FakeSystemCatalog,
Identifier.of(Array(CatalogManager.SESSION_NAMESPACE), variableName),
Identifier.of(Array(varDef.identifier.namespace().last), namePartsCaseAdjusted.last),
varDef)
}
} else {
None
}
.orElse(
if (maybeTempVariableName(nameParts)) {
catalogManager.tempVariableManager
.get(namePartsCaseAdjusted)
.map { varDef =>
VariableReference(
nameParts,
FakeSystemCatalog,
Identifier.of(Array(CatalogManager.SESSION_NAMESPACE), namePartsCaseAdjusted.last),
varDef
)}
} else {
None
}
)
}

// Resolves `UnresolvedAttribute` to its value.
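The resulting lookup order of `lookupVariable` can be summarized in a simplified, hypothetical model. `Local` and `lookup` are invented names, case-insensitive analysis is assumed, and the body of `maybeTempVariableName` is collapsed in the diff, so the sketch assumes it accepts `x`, `session.x` and `system.session.x`.

```scala
import java.util.Locale

// Hypothetical script-local variable: declared in the scope with the given label.
final case class Local(label: String, name: String, value: Int)

def lookup(
    nameParts: Seq[String],
    locals: Option[Seq[Local]], // None = not running inside a script; innermost first
    session: Map[String, Int],
    insideExecuteImmediate: Boolean): Option[(Seq[String], Int)] = {
  val parts = nameParts.map(_.toLowerCase(Locale.ROOT)) // case-insensitive analysis assumed
  // Assumed shape of maybeTempVariableName: x, session.x or system.session.x.
  val maybeSession = parts match {
    case Seq(_)                      => true
    case Seq("session", _)           => true
    case Seq("system", "session", _) => true
    case _                           => false
  }
  val localHit = locals
    .filterNot(_ => insideExecuteImmediate)
    .filter(_ => parts.nonEmpty && parts.length <= 2 && parts.init != Seq("session"))
    .flatMap(_.find(l => l.name == parts.last && (parts.length == 1 || parts.head == l.label)))
    .map(l => (Seq(l.label, l.name), l.value))
  localHit.orElse {
    if (maybeSession) session.get(parts.last).map(v => (Seq("system", "session", parts.last), v))
    else None
  }
}
```

A local wins over a same-named session variable, `session.x` bypasses locals, and inside EXECUTE IMMEDIATE only session variables are reachable; a label-qualified name outside a script resolves to nothing.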
@@ -19,9 +19,12 @@ package org.apache.spark.sql.catalyst.analysis

import scala.jdk.CollectionConverters._

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.SqlScriptingLocalVariableManager
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.connector.catalog.{CatalogManager, CatalogPlugin, Identifier, LookupCatalog, SupportsNamespaces}
import org.apache.spark.sql.errors.DataTypeErrors.toSQLId
import org.apache.spark.sql.errors.QueryCompilationErrors
import org.apache.spark.util.ArrayImplicits._

@@ -34,11 +37,30 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsDown {
// We only support temp variables for now and the system catalog is not properly implemented
// yet. We need to resolve `UnresolvedIdentifier` for variable commands specially.
case c @ CreateVariable(UnresolvedIdentifier(nameParts, _), _, _) =>
val resolved = resolveVariableName(nameParts)
c.copy(name = resolved)
case c @ CreateVariable(UnresolvedIdentifier(nameParts, _), _, _, _) =>

Contributor:
What Spark does here is to determine where to create the variable (catalog and namespace), and then turn UnresolvedIdentifier into qualified ResolvedIdentifier. I think we don't need an extra sessionVariablesOnly flag in CreateVariable, the qualified ResolvedIdentifier can determine everything.

  • If the variable name is already qualified (session.var or system.session.var), always fully qualify it to system.session.var or fail if the qualifier is not system.session. This is because users can create session variables explicitly (via qualified names) anywhere.
  • If the variable name is unqualified: if we are not in a script, or we are inside EXECUTE IMMEDIATE, qualify it to system.session.var; otherwise, qualify it to local.current_scope_label_name.var.

We can create a FakeLocalCatalog following FakeSystemCatalog. In CreateVariableExec, if the catalog is FakeLocalCatalog, the script local variable manager must be present and we create local variables.
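The proposed qualification rules can be sketched as a pure function from name parts to a fully qualified name. This is a hypothetical illustration of the proposal, not code from the PR: `qualifyDeclare` is invented, `local` is the proposal's placeholder namespace rather than a real catalog, and names are lower-cased on the assumption of case-insensitive analysis.

```scala
import java.util.Locale

// Hypothetical sketch of the proposal: DECLARE fully qualifies the name to decide
// whether a session variable or a script-local variable is created.
def qualifyDeclare(
    nameParts: Seq[String],
    inScript: Boolean,
    insideExecuteImmediate: Boolean,
    currentLabel: String): Either[String, Seq[String]] = {
  val lower = nameParts.map(_.toLowerCase(Locale.ROOT))
  lower match {
    // Explicitly qualified session variables can be created anywhere.
    case Seq("session", v)           => Right(Seq("system", "session", v))
    case Seq("system", "session", v) => Right(Seq("system", "session", v))
    // Unqualified outside a script, or inside EXECUTE IMMEDIATE: session variable.
    case Seq(v) if !inScript || insideExecuteImmediate => Right(Seq("system", "session", v))
    // Unqualified inside a script: local variable in the current scope.
    case Seq(v)                      => Right(Seq("local", currentLabel, v))
    case _                           => Left(s"cannot qualify ${nameParts.mkString(".")}")
  }
}
```

Note that, per the proposal, `session.x` inside a script creates a session variable, whereas the code in this revision rejects any qualified name in a script with QUALIFIED_LOCAL_VARIABLE.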

Contributor:
We can use the same idea for DropVariable and SetVariable

// From scripts we can only create local variables, which must be unqualified,
// and must not be DECLARE OR REPLACE.
if (SqlScriptingLocalVariableManager.get().isDefined &&
!AnalysisContext.get.isExecuteImmediate) {
// TODO [SPARK-50785]: Uncomment this when For Statement starts properly using local vars.
// if (c.replace) {
// throw new AnalysisException(
// "INVALID_VARIABLE_DECLARATION.REPLACE_LOCAL_VARIABLE",
// Map("varName" -> toSQLId(nameParts))
// )
// }

if (nameParts.length != 1) {

Contributor:
what if this CreateVariable is inside EXECUTE IMMEDIATE?

Contributor Author:
You're right; I added a check that we're not in EXECUTE IMMEDIATE before throwing this error.

throw new AnalysisException(
"INVALID_VARIABLE_DECLARATION.QUALIFIED_LOCAL_VARIABLE",
Map("varName" -> toSQLId(nameParts)))
}
}

val resolved = resolveCreateVariableName(nameParts)
c.copy(name = resolved, sessionVariablesOnly = AnalysisContext.get.isExecuteImmediate)

Contributor:
ResolvedIdentifier needs a catalog and it's system catalog for creating session variables. What catalog should we put for creating local variables? also the system catalog?

case d @ DropVariable(UnresolvedIdentifier(nameParts, _), _) =>
val resolved = resolveVariableName(nameParts)
val resolved = resolveDropVariableName(nameParts)
d.copy(name = resolved)

case UnresolvedIdentifier(nameParts, allowTemp) =>
@@ -73,28 +95,40 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
}
}

private def resolveVariableName(nameParts: Seq[String]): ResolvedIdentifier = {
def ident: Identifier = Identifier.of(Array(CatalogManager.SESSION_NAMESPACE), nameParts.last)
if (nameParts.length == 1) {
private def resolveCreateVariableName(nameParts: Seq[String]): ResolvedIdentifier = {
val ident = SqlScriptingLocalVariableManager.get()
.filterNot(_ => AnalysisContext.get.isExecuteImmediate)
.getOrElse(catalogManager.tempVariableManager)
.createIdentifier(nameParts.last)

Contributor:
An Identifier is already created, we can directly return ResolvedIdentifier(FakeSystemCatalog, ident) here.

Contributor Author:
Sorry, what do you mean by "already created"? Here we create the identifier, which depends on the scripting context in the case of local variables, and then return it as ResolvedIdentifier(FakeSystemCatalog, ident) after checking for errors.

resolveVariableName(nameParts, ident)
}

private def resolveDropVariableName(nameParts: Seq[String]): ResolvedIdentifier = {
// Only session variables can be dropped, so SqlScriptingLocalVariableManager
// is not checked in the case of DropVariable.

Contributor:
So even if there is a name conflict between session and local variables, DROP VARIABLE always drops the session variable?

Contributor Author (dusantism-db), Jan 23, 2025:
Yes, DROP will never consider local variables. It only works on session variables, per spec.

Contributor:
cc @srielau can you chime in here?

My current understanding is that we don't want to allow drop of local variables - and that makes total sense. However, the current agreement/implementation completely ignores local variables when resolving variable references in the DROP statement, which I don't think is right - simply because of the inconsistency in variable resolution logic between different types of statements. Simple examples:

input:

DECLARE x INT = 1;
BEGIN
  DECLARE x INT = 2;
  SET x = 3;
  SELECT x;
END

output: 3

input:

DECLARE x INT = 1;
BEGIN
  DECLARE x INT = 2;
  DROP TEMPORARY VARIABLE x;
END

result: session var will be dropped (x = 1)

Here, we can see that the meaning of x in the script context isn't the same in different statements, which kinda doesn't look good/right in my opinion.
I agree that DROP TEMPORARY VARIABLE indicates a bit that it is related to session vars, but also, the local vars are more temporary than the session ones, so even more strange.

Anyways, I think that the second example should throw an exception stating that the local variables cannot be dropped (because x resolves to a local variable). Exception messaging can be improved/extended in cases when there is a session var with the same name, to annotate that if the intent is to drop session variable, you need to use qualified name (system.session.x).
If there's no local variable defined with the same name, then everything is fine, session variable would get dropped because that's what x would resolve to anyways.

A simple example of why this might be important: a customer may try to drop a local variable (unaware that it's not allowed, or by mistake) and, instead of getting an exception, the session variable would be silently dropped.
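The behavior proposed in this comment can be sketched as a small resolution function. This is a hypothetical model, not the PR's implementation: `resolveDrop` is an invented name, the error texts are placeholders rather than real Spark error conditions, and names are assumed to be already lower-cased.

```scala
// Hypothetical sketch of the proposed DROP TEMPORARY VARIABLE resolution: an unqualified
// name that resolves to a script local is rejected instead of silently dropping a
// same-named session variable. Error texts are placeholders, not Spark error classes.
def resolveDrop(
    nameParts: Seq[String],
    localNames: Set[String],  // unqualified local variable names visible at this point
    sessionNames: Set[String]): Either[String, String] = nameParts match {
  case Seq(name) if localNames.contains(name) =>
    Left(s"Local variable $name cannot be dropped; use system.session.$name for the session variable.")
  case Seq(name) if sessionNames.contains(name)                      => Right(name)
  case Seq("session", name) if sessionNames.contains(name)           => Right(name)
  case Seq("system", "session", name) if sessionNames.contains(name) => Right(name)
  case _ => Left(s"Variable ${nameParts.mkString(".")} not found.")
}
```

In the second SQL example above, `DROP TEMPORARY VARIABLE x` would fail because `x` resolves to the local variable, while `system.session.x` would still drop the session variable.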

Contributor:
+1 to fail explicitly.

Contributor Author:
@srielau Could you provide your input here?

val ident = catalogManager.tempVariableManager.createIdentifier(nameParts.last)
resolveVariableName(nameParts, ident)
}

private def resolveVariableName(
nameParts: Seq[String],
ident: Identifier): ResolvedIdentifier = nameParts.length match {
case 1 => ResolvedIdentifier(FakeSystemCatalog, ident)

// On declare variable, local variables support only unqualified names.
// On drop variable, local variables are not supported at all.
case 2 if nameParts.head.equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE) =>
ResolvedIdentifier(FakeSystemCatalog, ident)
} else if (nameParts.length == 2) {
if (nameParts.head.equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)) {
ResolvedIdentifier(FakeSystemCatalog, ident)
} else {
throw QueryCompilationErrors.unresolvedVariableError(
nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE))
}
} else if (nameParts.length == 3) {
if (nameParts(0).equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME) &&
nameParts(1).equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)) {
ResolvedIdentifier(FakeSystemCatalog, ident)
} else {
throw QueryCompilationErrors.unresolvedVariableError(
nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE))
}
} else {

// When there are 3 nameParts the variable must be a fully qualified session variable
// i.e. "system.session.<varName>"
case 3 if nameParts(0).equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME) &&
nameParts(1).equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE) =>
ResolvedIdentifier(FakeSystemCatalog, ident)

case _ =>
throw QueryCompilationErrors.unresolvedVariableError(
nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE))
}
nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, ident.namespace().head))

Contributor:
Let's check if it makes sense to throw error like this (with SYSTEM_CATALOG_NAME) in the case of local vars or should we create a similar error, but specific to scripting?
This might imply that you can access the local var in system.<label>.<varName> format which is not correct.

Contributor:
Local variables can only be qualified by a label. Labels themselves are unqualified.
Parameters can be qualified by the procedure name. Procedure names can be qualified.

}
}
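The refactored `resolveVariableName` is a plain match on the number of name parts. A standalone sketch with a hypothetical `Ident` type shows which qualifiers it accepts; the `Left` message is a placeholder for `QueryCompilationErrors.unresolvedVariableError`, which is not reproduced here.

```scala
// Hypothetical stand-in for Spark's Identifier: namespace plus name.
final case class Ident(namespace: Seq[String], name: String)

// Mirrors the match in resolveVariableName: the identifier is chosen by the caller,
// and the name parts only decide whether the qualifier is acceptable.
def resolveVariableName(nameParts: Seq[String], ident: Ident): Either[String, Ident] =
  nameParts.length match {
    case 1 => Right(ident)
    case 2 if nameParts.head.equalsIgnoreCase("session") => Right(ident)
    case 3 if nameParts(0).equalsIgnoreCase("system") &&
        nameParts(1).equalsIgnoreCase("session") => Right(ident)
    case _ => Left(s"unresolved variable ${nameParts.mkString(".")}") // placeholder error
  }
```

A label-qualified name such as `lbl.x` falls through to the error case, which is consistent with local variables only being declarable without a qualifier.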
@@ -53,14 +53,17 @@ class ResolveSetVariable(val catalogManager: CatalogManager) extends Rule[Logica
// Names are normalized when the variables are created.
// No need for case insensitive comparison here.
// TODO: we need to group by the qualified variable name once other catalogs support it.
val dups = resolvedVars.groupBy(_.identifier.name).filter(kv => kv._2.length > 1)
val dups = resolvedVars.groupBy(_.identifier).filter(kv => kv._2.length > 1)
if (dups.nonEmpty) {
throw new AnalysisException(
errorClass = "DUPLICATE_ASSIGNMENTS",
messageParameters = Map("nameList" -> dups.keys.map(toSQLId).mkString(", ")))
messageParameters = Map("nameList" ->
dups.keys.map(key => toSQLId(key.name())).mkString(", ")))
}

setVariable.copy(targetVariables = resolvedVars)
setVariable.copy(
targetVariables = resolvedVars,
sessionVariablesOnly = AnalysisContext.get.isExecuteImmediate)

case setVariable: SetVariable
if setVariable.targetVariables.forall(_.isInstanceOf[VariableReference]) &&
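Grouping duplicates by the whole identifier instead of by name matters once local and session variables can share a name. A minimal sketch with a hypothetical `Ident` type (not Spark's `Identifier`): a local `x` in scope `lbl` and the session `x` are distinct targets, so only a genuine repeat of the same identifier is reported as DUPLICATE_ASSIGNMENTS.

```scala
// Hypothetical stand-in for Spark's Identifier: namespace plus name.
final case class Ident(namespace: Seq[String], name: String)

// Old behavior: group by bare name.
def duplicatesByName(targets: Seq[Ident]): Set[String] =
  targets.groupBy(_.name).filter(kv => kv._2.length > 1).keySet

// New behavior: group by the full identifier (namespace + name).
def duplicatesByIdentifier(targets: Seq[Ident]): Set[Ident] =
  targets.groupBy(identity).filter(kv => kv._2.length > 1).keySet
```

With the old grouping, `SET (lbl.x, session.x) = ...` would be rejected as a duplicate even though it assigns two different variables; the error message then only needs the bare name of each duplicated key.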