Support deploying syndicated views to stage #8532

scholtzan · 2025-12-01T23:56:25Z

Description

If any artifacts reference a syndicated view, stage deploys currently fail with:

Unrecognized name: submission_timestamp

Stage deploys try to determine the schemas of referenced tables via dry runs if they don't have a locally defined schema.
The syndicated views aren't partitioned on submission_timestamp (unlike stable tables), so we need to make sure that the partitioning on submission_timestamp is only used when the dependency is a stable table.
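A minimal sketch of the idea described above, not the actual `stage.py` code: the helper names `partition_column` and `schema_probe_query`, the literal date, and the query shape are all assumptions made for illustration. It only adds a `submission_timestamp` filter when the dependency lives in a `_stable` dataset, so a dry run against a syndicated view never mentions a column the view does not have.

```python
from typing import Optional


def partition_column(dataset: str) -> Optional[str]:
    """Return the column to filter on, or None (hypothetical helper)."""
    # Assumption: only datasets ending in "_stable" are partitioned
    # on submission_timestamp.
    return "submission_timestamp" if dataset.endswith("_stable") else None


def schema_probe_query(project: str, dataset: str, table: str) -> str:
    """Build a query that could be dry-run to discover a schema (hypothetical)."""
    query = f"SELECT * FROM `{project}.{dataset}.{table}`"
    column = partition_column(dataset)
    if column is not None:
        # Only reference the partition column when the table is expected
        # to have it; the date value here is an arbitrary placeholder.
        query += f" WHERE DATE({column}) = '2020-01-01'"
    return query
```

With this shape, a dependency such as `app_store_syndicate.app_store_territory_source_type_report` would be probed without any reference to `submission_timestamp`.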

whd

As a general comment, https://mozilla-hub.atlassian.net/browse/SVCSE-2995 exists to move syndicated view management entirely to DE-managed infra, but AFAIK it isn't a priority at this time. I believe there was also some discussion of replacing the bespoke syndication mechanism with e.g. https://docs.cloud.google.com/bigquery/docs/analytics-hub-introduction#linked_datasets, but that is similarly not a short-term priority.

sean-rose · 2025-12-02T01:02:34Z

Could we instead avoid trying to deploy syndicated views to stage at all (including not rewriting references to syndicated views)?

scholtzan · 2025-12-02T17:40:29Z

> Could we instead avoid trying to deploy syndicated views to stage at all (including not rewriting references to syndicated views)?

Yeah. I made the change and tested it on sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report which references a syndicated view. https://app.circleci.com/pipelines/github/mozilla/bigquery-etl/55684/workflows/ebf8426f-5fec-4dff-adb1-330223e6e719/jobs/713840

sean-rose

Thanks for being willing to change the approach! Though given that change, the PR title should probably be updated.

bigquery_etl/cli/stage.py

sean-rose · 2025-12-02T19:57:17Z

bigquery_etl/cli/stage.py

+                        pass
+
+                # Only create stubs if not syndicated OR if a local file already exists
+                if not is_syndicated or file_exists_for_dependency:


question (blocking): Are you certain this logic is correct? The only thing that comes to mind is maybe you were trying to account for non-syndicate content in syndicate datasets, but even in that scenario it doesn't make sense to me that the logic in this block should run when files already exist for the dependency.

One additional complication is it looks like the logic that sets file_exists_for_dependency assumes the dependency's project is the same as the referring view's project, which seems like a bug (none of the other logic in this method makes that assumption).

And if files do already exist for tables referenced by the view then it seems like those should be added to the set of view dependencies, but that doesn't seem like it's being done in either the existing logic or this updated logic.

I updated the logic. My intention was to handle cases where datasets have syndication configured but also have other content/tables.
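To illustrate the project-assumption concern raised above, here is a rough sketch of a local-file lookup keyed on the dependency's own project rather than the referring view's project. The helper names are hypothetical, and the `sql/<project>/<dataset>/<table>` layout is inferred from the path quoted earlier in this thread; this is not the logic that was merged.

```python
from pathlib import Path


def dependency_dir(sql_dir: Path, project: str, dataset: str, table: str) -> Path:
    """Directory where a dependency would be defined locally (hypothetical helper)."""
    # Assumption: artifacts live under <sql_dir>/<project>/<dataset>/<table>/.
    return sql_dir / project / dataset / table


def has_local_definition(sql_dir: Path, project: str, dataset: str, table: str) -> bool:
    """True if the dependency has a directory in the repo (hypothetical helper)."""
    # The project passed in is the dependency's project, so a view in one
    # project that references a table in another is resolved correctly.
    return dependency_dir(sql_dir, project, dataset, table).is_dir()
```

Passing the dependency's project explicitly keeps the check independent of whichever view happens to reference the table.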

bigquery_etl/cli/stage.py

sean-rose · 2025-12-02T20:25:01Z

> Yeah. I made the change and tested it on sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report which references a syndicated view. https://app.circleci.com/pipelines/github/mozilla/bigquery-etl/55684/workflows/ebf8426f-5fec-4dff-adb1-330223e6e719/jobs/713840

It doesn't seem like that CI job actually published the app_store.firefox_app_store_territory_source_type_report view to stage though?

bqetl_project.yaml

scholtzan · 2025-12-02T23:00:14Z

Ah, the service account used to deploy artifacts to the staging project doesn't have access to shared-prod. I believe we decided not to grant read access as there is a risk that production data might get leaked.
Given that, I might go back to my initially proposed solution of getting the schemas for the syndicate datasets and deploying those: 0ebe625

dataops-ci-bot · 2025-12-02T23:16:07Z

Integration report for "get schemas for tables that aren't in the repo"

`sql.diff`

diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report/view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report/view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report/view.sql	2025-12-02 23:10:48.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report/view.sql	2025-12-02 23:04:07.000000000 +0000
@@ -8,6 +8,7 @@
   -- However, the `date` timestamp field appear to always show midnight meaning if we do timezone conversion
   -- we will end up moving all results 1 day back if we attempt conversion to UTC.
   -- This is why we are not doing timezone converstions here.
+  --
   *,
 FROM
   `moz-fx-data-shared-prod.app_store_syndicate.app_store_territory_source_type_report`

sean-rose

r+wc

bigquery_etl/cli/stage.py

sean-rose · 2025-12-02T23:57:41Z

bigquery_etl/cli/stage.py

-                        partitioned_by = "submission_timestamp"
+                        partitioned_by = None
+
+                        if dataset.endswith("_stable"):


suggestion (blocking): As some views reference live tables, I'd think we'd also need to account for those:

Suggested change

-                        if dataset.endswith("_stable"):
+                        if any(dataset.endswith(suffix) for suffix in ("_live", "_stable")):
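As a side note on the suggested check, Python's `str.endswith` also accepts a tuple of suffixes, so the `any(...)` expression could be written more compactly. The sketch below uses a hypothetical helper name and constant; it is an equivalent formulation, not what the PR necessarily merged.

```python
# Hypothetical constant: dataset suffixes treated as partitioned on
# submission_timestamp in this sketch.
PARTITIONED_SUFFIXES = ("_live", "_stable")


def is_live_or_stable(dataset: str) -> bool:
    """True if the dataset name ends in "_live" or "_stable" (hypothetical helper)."""
    # str.endswith returns True if the string ends with any suffix in the tuple.
    return dataset.endswith(PARTITIONED_SUFFIXES)
```

Both forms give the same result for any dataset name; the tuple form just avoids the generator expression.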

bigquery_etl/cli/stage.py

scholtzan requested a review from a team as a code owner December 1, 2025 23:56

whd reviewed Dec 2, 2025

scholtzan force-pushed the stage-deploys-syndicated-viws branch from e4b62c6 to adfd0ae Compare December 2, 2025 17:38

scholtzan requested a review from sean-rose December 2, 2025 17:40

sean-rose reviewed Dec 2, 2025

scholtzan added 4 commits December 2, 2025 14:30

Support for deploying syndicated views to stage

0ebe625

Don't publish or replace references of syndicate views in stage deploys

7ae15c7

Address review feedback

e25d639

test app store view deploys

62dd5e6

scholtzan force-pushed the stage-deploys-syndicated-viws branch from 5ec10f0 to 62dd5e6 Compare December 2, 2025 22:30

scholtzan commented Dec 2, 2025

scholtzan changed the title ~~Support for deploying syndicated views to stage~~ Skip deploying syndicated, stable, live tables during stage deploys Dec 2, 2025

get schemas for tables that aren't in the repo

c1dd11f

undo app store test

f05d0aa

scholtzan changed the title ~~Skip deploying syndicated, stable, live tables during stage deploys~~ Support deploying syndicated views to stage Dec 2, 2025

sean-rose approved these changes Dec 3, 2025

Address review feedback

7b8e7b1

scholtzan added this pull request to the merge queue Dec 3, 2025

Merged via the queue into main with commit a2e7397 Dec 3, 2025
22 checks passed

scholtzan deleted the stage-deploys-syndicated-viws branch December 3, 2025 18:19
