-
Notifications
You must be signed in to change notification settings - Fork 238
Fix!: mark vars referenced in metadata macros as metadata #4936
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
eccc76e
to
8d8fc06
Compare
b27b6ca
to
58c70db
Compare
a7d62e1
to
06c31f9
Compare
# they will be handled in a separate call of _extract_macro_func_variable_references. | ||
def _prune_nested_macro_func(expression: exp.Expression) -> bool: | ||
return ( | ||
type(n) is d.MacroFunc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't here these variables all be expression
instead of n
in _prune_nested_macro_func
?
k: SqlValue(sql=v.sql(dialect=dialect)) if isinstance(v, exp.Expression) else v | ||
for k, v in blueprint_variables.items() | ||
if k in metadata_used_variables | ||
} | ||
blueprint_variables = { | ||
k: SqlValue(sql=v.sql(dialect=dialect)) if isinstance(v, exp.Expression) else v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine in a subsequent pr we'll turn these keys to lower()
metadata_used_variables = set() | ||
for used_var, macro_names in (macro_funcs_by_used_var or {}).items(): | ||
if used_variable_referenced_in_metadata_expression.get(used_var) or all( | ||
name in python_env and python_env[name].is_metadata for name in macro_names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is maybe this needed for is_metadata: name in python_env and getattr(python_env.get(name), 'is_metadata', False)
to handle none or all
handles nones just fine?
and bool(is_metadata) | ||
) | ||
else: | ||
for var_ref in _extract_macro_func_variable_references(macro_func_or_var): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the complexity because of this nested walk with _extract_macro_func_variable_references
inside the loop which uses find_all
is too much and can be reduced
) | ||
|
||
metadata_used_variables = set() | ||
for used_var, macro_names in (macro_funcs_by_used_var or {}).items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might need a small comment over it because it took a while to figure out what was happening, also a question because I'm still not entirely certain if I read this correctly: this used_variable_referenced_in_metadata_expression.get(used_var)
is False when it's non-metadata and it won't be added to metadata_used_variables
unless all macros using it are metadata-only?
Macro variable references are always treated as non-metadata today. This means that if, for example, a variable is referenced within a metadata-only macro, changing its value will result in a breaking change, which is inconsistent.
This PR alters this behavior, similar to the macro metadata-only status propagation:
I intentionally say "can" instead of "will" above, because we need to factor in all references of a variable to decide whether it's a metadata-only reference. The rules implemented here are similar to those we apply for macros: a non-metadata occurrence overrules all metadata occurrences.
Additionally, this PR introduces trimming for blueprint variables. Certain blueprint variables, e.g. used in model names, aren't required after loading, while others are because they may be referenced in the model's statements or in "runtime-rendered" properties (e.g.,
merge_filter
).The former category can be omitted from the model's
python_env
, thus reducing its snapshot's size, as long as a variable is only referenced in the meta block and in fields that are static after loading the model.Both of these changes are quite breaking, so I'm planning to implement a migration script to at least warn about this. I'm also planning to increase the testing coverage.