Skip to content

Fix!: mark vars referenced in metadata macros as metadata #4936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

georgesittas
Copy link
Contributor

Macro variable references are always treated as non-metadata today. This means that if, for example, a variable is referenced within a metadata-only macro, changing its value will result in a breaking change, which is inconsistent.

This PR alters this behavior, similar to the macro metadata-only status propagation:

  • Variables referenced within metadata-only macro definitions can be treated as metadata-obly
  • Variables referenced in metadata-only macro calls can be treated as metadata-only
  • Variables referenced within metadata properties can be treated as metadata-only

I intentionally say "can" instead of "will" above, because we need to factor in all references of a variable to decide whether it's a metadata-only reference. The rules implemented here are similar to those we apply for macros: a non-metadata occurrence overrules all metadata occurrences.

Additionally, this PR introduces trimming for blueprint variables. Certain blueprint variables, e.g. used in model names, aren't required after loading, while others are because they may be referenced in the model's statements or in "runtime-rendered" properties (e.g., merge_filter).

The former category can be omitted from the model's python_env, thus reducing its snapshot's size, as long as a variable is only referenced in the meta block and in fields that are static after loading the model.

Both of these changes are quite breaking, so I'm planning to implement a migration script to at least warn about this. I'm also planning to increase the testing coverage.

@georgesittas georgesittas marked this pull request as draft July 8, 2025 15:32
@georgesittas georgesittas force-pushed the jo/metadata_vars branch 4 times, most recently from b27b6ca to 58c70db Compare July 17, 2025 14:17
@georgesittas georgesittas marked this pull request as ready for review July 18, 2025 12:28
@georgesittas georgesittas requested a review from a team July 18, 2025 13:45
# they will be handled in a separate call of _extract_macro_func_variable_references.
def _prune_nested_macro_func(expression: exp.Expression) -> bool:
return (
type(n) is d.MacroFunc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't here these variables all be expression instead of n in _prune_nested_macro_func ?

k: SqlValue(sql=v.sql(dialect=dialect)) if isinstance(v, exp.Expression) else v
for k, v in blueprint_variables.items()
if k in metadata_used_variables
}
blueprint_variables = {
k: SqlValue(sql=v.sql(dialect=dialect)) if isinstance(v, exp.Expression) else v
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine in a subsequent pr we'll turn these keys to lower()

metadata_used_variables = set()
for used_var, macro_names in (macro_funcs_by_used_var or {}).items():
if used_variable_referenced_in_metadata_expression.get(used_var) or all(
name in python_env and python_env[name].is_metadata for name in macro_names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is maybe this needed for is_metadata: name in python_env and getattr(python_env.get(name), 'is_metadata', False) to handle none or all handles nones just fine?

and bool(is_metadata)
)
else:
for var_ref in _extract_macro_func_variable_references(macro_func_or_var):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the complexity because of this nested walk with _extract_macro_func_variable_references inside the loop which uses find_all is too much and can be reduced

)

metadata_used_variables = set()
for used_var, macro_names in (macro_funcs_by_used_var or {}).items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might need a small comment over it because it took a while to figure out what was happening, also a question because I'm still not entirely certain if I read this correctly: this used_variable_referenced_in_metadata_expression.get(used_var) is False when it's non-metadata and it won't be added to metadata_used_variables unless all macros using it are metadata-only?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants