Fix!: mark vars referenced in metadata macros as metadata #4936

georgesittas · 2025-07-08T15:32:40Z

Macro variable references are always treated as non-metadata today. This means that if, for example, a variable is referenced within a metadata-only macro, changing its value will result in a breaking change, which is inconsistent.

This PR alters this behavior, similar to the macro metadata-only status propagation:

Variables referenced within metadata-only macro definitions can be treated as metadata-only
Variables referenced in metadata-only macro calls can be treated as metadata-only
Variables referenced within metadata properties can be treated as metadata-only

I intentionally say "can" instead of "will" above, because we need to factor in all references of a variable to decide whether it's a metadata-only reference. The rules implemented here are similar to those we apply for macros: a non-metadata occurrence overrules all metadata occurrences.
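A minimal sketch of the "non-metadata overrules metadata" rule described above. The function name and the `(name, is_metadata)` tuple shape are hypothetical, chosen for illustration only; they are not taken from this PR's implementation.

```python
from typing import Dict, Iterable, Tuple


def resolve_metadata_status(occurrences: Iterable[Tuple[str, bool]]) -> Dict[str, bool]:
    """Map each variable to True only if every one of its occurrences is metadata-only.

    Hypothetical helper: `occurrences` yields (variable_name, is_metadata_occurrence).
    """
    status: Dict[str, bool] = {}
    for name, is_metadata in occurrences:
        # A single non-metadata occurrence overrules all metadata occurrences,
        # regardless of the order in which the occurrences are seen.
        status[name] = status.get(name, True) and is_metadata
    return status


occurrences = [
    ("owner", True),        # referenced in a metadata property
    ("start_date", True),   # referenced in a metadata-only macro call
    ("start_date", False),  # also referenced in the model's query
    ("owner", True),
]
result = resolve_metadata_status(occurrences)
```

Here `owner` stays metadata-only, while `start_date` does not, because one of its references is non-metadata.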

Additionally, this PR introduces trimming for blueprint variables. Certain blueprint variables, e.g. those used in model names, aren't required after loading, while others are, because they may be referenced in the model's statements or in "runtime-rendered" properties (e.g., merge_filter).

The former category can be omitted from the model's python_env, thus reducing its snapshot's size, as long as a variable is only referenced in the meta block and in fields that are static after loading the model.
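To illustrate the trimming idea, here is a rough sketch under stated assumptions: `trim_blueprint_variables` and `runtime_referenced` are hypothetical names, and computing which variables are referenced at runtime is assumed to happen elsewhere. It is not the code in this PR.

```python
from typing import Any, Dict, Set


def trim_blueprint_variables(
    blueprint_variables: Dict[str, Any],
    runtime_referenced: Set[str],
) -> Dict[str, Any]:
    """Keep only the blueprint variables that may still be needed after loading.

    Hypothetical helper: `runtime_referenced` holds the names referenced in the
    model's statements or in runtime-rendered properties.
    """
    return {k: v for k, v in blueprint_variables.items() if k in runtime_referenced}


blueprint_variables = {
    "schema_name": "sales",   # only used in the model name, static after loading
    "cutoff": "2025-01-01",   # referenced in the model's query
    "owner": "data-team",     # only used in the meta block
}
trimmed = trim_blueprint_variables(blueprint_variables, runtime_referenced={"cutoff"})
```

Only `cutoff` would end up in the serialized environment in this sketch, which is where the reduction in snapshot size would come from.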

Both of these changes are quite breaking, so I'm planning to implement a migration script to at least warn about this. I'm also planning to increase the testing coverage.

themisvaltinos · 2025-07-21T14:33:15Z

sqlmesh/core/model/common.py

+    # they will be handled in a separate call of _extract_macro_func_variable_references.
+    def _prune_nested_macro_func(expression: exp.Expression) -> bool:
+        return (
+            type(n) is d.MacroFunc


Shouldn't these variables all be expression instead of n in _prune_nested_macro_func?

themisvaltinos · 2025-07-21T14:33:53Z

sqlmesh/core/model/common.py

+            k: SqlValue(sql=v.sql(dialect=dialect)) if isinstance(v, exp.Expression) else v
+            for k, v in blueprint_variables.items()
+            if k in metadata_used_variables
+        }
        blueprint_variables = {
            k: SqlValue(sql=v.sql(dialect=dialect)) if isinstance(v, exp.Expression) else v


I imagine in a subsequent PR we'll turn these keys to lower()

themisvaltinos · 2025-07-21T14:39:39Z

sqlmesh/core/model/common.py

+    metadata_used_variables = set()
+    for used_var, macro_names in (macro_funcs_by_used_var or {}).items():
+        if used_variable_referenced_in_metadata_expression.get(used_var) or all(
+            name in python_env and python_env[name].is_metadata for name in macro_names


Is something like this maybe needed for is_metadata: name in python_env and getattr(python_env.get(name), 'is_metadata', False), to handle None? Or does all handle Nones just fine?
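For reference, a small sketch of how the quoted condition behaves in plain Python. `StubExecutable` is a hypothetical stand-in with an optional `is_metadata` attribute; it is an assumption for illustration and not the class used in the PR.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class StubExecutable:
    # Hypothetical stand-in: is_metadata may be True, False, or None (unset).
    is_metadata: Optional[bool] = None


python_env: Dict[str, StubExecutable] = {
    "meta_macro": StubExecutable(is_metadata=True),
    "unset_macro": StubExecutable(is_metadata=None),
}


def all_metadata(macro_names: Iterable[str]) -> bool:
    # Same shape as the condition in the diff: `and` short-circuits for names
    # missing from python_env, and all() treats a None result as falsy.
    return all(name in python_env and python_env[name].is_metadata for name in macro_names)


only_meta = all_metadata(["meta_macro"])
with_unset = all_metadata(["meta_macro", "unset_macro"])
with_missing = all_metadata(["missing_macro"])
no_macros = all_metadata([])
```

In this sketch a `None` value and a missing name both make the check fail without raising, so `getattr` would only matter if an entry lacked the attribute entirely. One thing to keep in mind is that `all()` over an empty iterable is `True`.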

themisvaltinos · 2025-07-21T14:51:11Z

sqlmesh/core/model/common.py

+                        and bool(is_metadata)
+                    )
+                else:
+                    for var_ref in _extract_macro_func_variable_references(macro_func_or_var):


I wonder if the complexity of this nested walk, with _extract_macro_func_variable_references (which uses find_all) called inside the loop, is too much and can be reduced

themisvaltinos · 2025-07-21T15:09:45Z

sqlmesh/core/model/common.py

+        )
+
+    metadata_used_variables = set()
+    for used_var, macro_names in (macro_funcs_by_used_var or {}).items():


This might need a small comment above it, because it took me a while to figure out what was happening. Also a question, because I'm still not entirely certain I read this correctly: is used_variable_referenced_in_metadata_expression.get(used_var) False when the variable is non-metadata, so that it won't be added to metadata_used_variables unless all macros using it are metadata-only?
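One way to check this reading is to run the condition on toy data. The sketch below mirrors only the visible `flag or all(...)` shape from the diff; the function name, the plain-dict `macro_is_metadata` lookup, and the assumption that a passing variable is added to the result set are all hypothetical.

```python
from typing import Dict, Optional, Set


def select_metadata_variables(
    macro_funcs_by_used_var: Dict[str, Set[str]],
    referenced_in_metadata_expression: Dict[str, bool],
    macro_is_metadata: Dict[str, Optional[bool]],
) -> Set[str]:
    """Hypothetical re-creation of the loop under review, using plain dicts."""
    selected: Set[str] = set()
    for used_var, macro_names in macro_funcs_by_used_var.items():
        # Selected if the flag is truthy, OR if every macro using the variable
        # is metadata-only. A False or missing flag falls through to all().
        if referenced_in_metadata_expression.get(used_var) or all(
            macro_is_metadata.get(name) for name in macro_names
        ):
            selected.add(used_var)
    return selected


selected = select_metadata_variables(
    macro_funcs_by_used_var={
        "a": {"m_meta"},             # flag False, only metadata macros
        "b": {"m_meta", "m_plain"},  # no flag, one non-metadata macro
        "c": {"m_plain"},            # flag True, non-metadata macro
    },
    referenced_in_metadata_expression={"a": False, "c": True},
    macro_is_metadata={"m_meta": True, "m_plain": False},
)
```

Under these assumptions `a` and `c` are selected and `b` is not, which matches the reading that a falsy flag defers the decision to the macros using the variable.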

georgesittas marked this pull request as draft July 8, 2025 15:32

georgesittas force-pushed the jo/metadata_vars branch from eccc76e to 8d8fc06 Compare July 8, 2025 18:30

georgesittas force-pushed the jo/metadata_vars branch 4 times, most recently from b27b6ca to 58c70db Compare July 17, 2025 14:17

georgesittas added 3 commits July 18, 2025 14:30

Fix!: mark vars referenced in metadata macros as metadata

c04d23c

Fix bugs

11400fb

Fix macro func variable extraction & add tests

06c31f9

georgesittas force-pushed the jo/metadata_vars branch from a7d62e1 to 06c31f9 Compare July 18, 2025 12:26

georgesittas marked this pull request as ready for review July 18, 2025 12:28

Add migration script to warn about diffs

658991f

georgesittas requested a review from a team July 18, 2025 13:45

themisvaltinos reviewed Jul 21, 2025
