@scholtzan
Collaborator

Description

Fixes #8530

If any artifacts reference syndicated views, stage deploys currently fail with:

Unrecognized name: submission_timestamp

Stage deploys try to determine the schemas of referenced tables via dry runs if they don't have a locally defined schema.
The syndicated views aren't partitioned on submission_timestamp (unlike stable tables), so we need to make sure that the partitioning on submission_timestamp is only used when the dependency is a stable table.
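
A minimal sketch of the intended behaviour, not the code in this PR: the function names `partition_column` and `build_schema_query` and the exact query shape are assumptions made for illustration. The idea is to add a `submission_timestamp` partition filter only when the referenced dataset is a stable dataset, so that views without that column are never queried for it.

```python
from typing import Optional


def partition_column(dataset: str) -> Optional[str]:
    """Return the partition column to filter on, or None if none is known."""
    # Only stable tables are assumed to be partitioned on submission_timestamp.
    if dataset.endswith("_stable"):
        return "submission_timestamp"
    return None


def build_schema_query(project: str, dataset: str, table: str) -> str:
    """Build a query (hypothetical shape) that a dry run could use to resolve a schema."""
    query = f"SELECT * FROM `{project}.{dataset}.{table}`"
    column = partition_column(dataset)
    if column is not None:
        # Add the partition filter only when the column is known to exist.
        query += f" WHERE DATE({column}) = CURRENT_DATE()"
    return query
```

With this shape, a query built for `app_store_syndicate.app_store_territory_source_type_report` would not mention `submission_timestamp` at all, which is what avoids the "Unrecognized name" error.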

Related Tickets & Documents

Reviewer, please follow this checklist

@scholtzan scholtzan requested a review from a team as a code owner December 1, 2025 23:56
Member

@whd whd left a comment

As a general comment, https://mozilla-hub.atlassian.net/browse/SVCSE-2995 exists to move syndicated view management entirely to DE-managed infra, but AFAIK it isn't a priority at this time. I believe there was also some discussion of replacing the bespoke syndication mechanism with e.g. https://docs.cloud.google.com/bigquery/docs/analytics-hub-introduction#linked_datasets, but that is similarly not a short-term priority.

@sean-rose
Contributor

Could we instead avoid trying to deploy syndicated views to stage at all (including not rewriting references to syndicated views)?

@scholtzan scholtzan force-pushed the stage-deploys-syndicated-viws branch from e4b62c6 to adfd0ae Compare December 2, 2025 17:38
@scholtzan
Collaborator Author

scholtzan commented Dec 2, 2025

> Could we instead avoid trying to deploy syndicated views to stage at all (including not rewriting references to syndicated views)?

Yeah. I made the change and tested it on sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report which references a syndicated view. https://app.circleci.com/pipelines/github/mozilla/bigquery-etl/55684/workflows/ebf8426f-5fec-4dff-adb1-330223e6e719/jobs/713840

@scholtzan scholtzan requested a review from sean-rose December 2, 2025 17:40
Contributor

@sean-rose sean-rose left a comment

Thanks for being willing to change the approach! Though given that change, the PR title should probably be updated.

pass

# Only create stubs if not syndicated OR if a local file already exists
if not is_syndicated or file_exists_for_dependency:
Contributor

question (blocking): Are you certain this logic is correct? The only thing that comes to mind is maybe you were trying to account for non-syndicate content in syndicate datasets, but even in that scenario it doesn't make sense to me that the logic in this block should run when files already exist for the dependency.

One additional complication is it looks like the logic that sets file_exists_for_dependency assumes the dependency's project is the same as the referring view's project, which seems like a bug (none of the other logic in this method makes that assumption).
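
To illustrate the concern, here is a hedged sketch of a lookup that derives the path from the dependency's own project rather than the referring view's. The helper name `dependency_has_local_files` and the `sql/<project>/<dataset>/<table>` layout (inferred from the path quoted earlier in this thread) are assumptions, not the method under review.

```python
from pathlib import Path


def dependency_has_local_files(
    sql_dir: Path, project: str, dataset: str, table: str
) -> bool:
    """Return True if the repo defines the dependency under its own project."""
    # Use the dependency's project, which may differ from the referring view's.
    dependency_dir = sql_dir / project / dataset / table
    return dependency_dir.is_dir() and any(dependency_dir.iterdir())
```

Passing the project explicitly means a view in one project that references a table in another project is checked against the right directory.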

Contributor

And if files do already exist for tables referenced by the view then it seems like those should be added to the set of view dependencies, but that doesn't seem like it's being done in either the existing logic or this updated logic.

Collaborator Author

I updated the logic. My intention was to handle cases where datasets have syndication configured but also have other content/tables.

@sean-rose
Contributor

> Yeah. I made the change and tested it on sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report which references a syndicated view. https://app.circleci.com/pipelines/github/mozilla/bigquery-etl/55684/workflows/ebf8426f-5fec-4dff-adb1-330223e6e719/jobs/713840

It doesn't seem like that CI job actually published the app_store.firefox_app_store_territory_source_type_report view to stage though?

@scholtzan scholtzan force-pushed the stage-deploys-syndicated-viws branch from 5ec10f0 to 62dd5e6 Compare December 2, 2025 22:30
@scholtzan scholtzan changed the title from "Support for deploying syndicated views to stage" to "Skip deploying syndicated, stable, live tables during stage deploys" Dec 2, 2025
@scholtzan
Collaborator Author

Ah, the service account used to deploy artifacts to the staging project doesn't have access to shared-prod. I believe we decided not to grant read access because of the risk that production data might get leaked.
Since that's the case, I think I might go back to my initially proposed solution of getting the schemas for the syndicate datasets and deploying those: 0ebe625

@dataops-ci-bot

Integration report for "get schemas for tables that aren't in the repo"

sql.diff

diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report/view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report/view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report/view.sql	2025-12-02 23:10:48.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report/view.sql	2025-12-02 23:04:07.000000000 +0000
@@ -8,6 +8,7 @@
   -- However, the `date` timestamp field appear to always show midnight meaning if we do timezone conversion
   -- we will end up moving all results 1 day back if we attempt conversion to UTC.
   -- This is why we are not doing timezone converstions here.
+  --
   *,
 FROM
   `moz-fx-data-shared-prod.app_store_syndicate.app_store_territory_source_type_report`

@scholtzan scholtzan changed the title from "Skip deploying syndicated, stable, live tables during stage deploys" to "Support deploying syndicated views to stage" Dec 2, 2025
Contributor

@sean-rose sean-rose left a comment

r+wc

- partitioned_by = "submission_timestamp"
+ partitioned_by = None

if dataset.endswith("_stable"):
Contributor

suggestion (blocking): As some views reference live tables, I'd think we'd also need to account for those:

Suggested change
- if dataset.endswith("_stable"):
+ if any(dataset.endswith(suffix) for suffix in ("_live", "_stable")):
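
As a side note, `str.endswith` also accepts a tuple of suffixes, so the suggested check could be written without `any`. The sketch below uses a hypothetical helper name, `is_ingestion_dataset`, and is only meant to show the equivalent form, not what was merged.

```python
SUFFIXES = ("_live", "_stable")


def is_ingestion_dataset(dataset: str) -> bool:
    """Return True for dataset names ending in _live or _stable."""
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return dataset.endswith(SUFFIXES)
```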

@scholtzan scholtzan added this pull request to the merge queue Dec 3, 2025
Merged via the queue into main with commit a2e7397 Dec 3, 2025
22 checks passed
@scholtzan scholtzan deleted the stage-deploys-syndicated-viws branch December 3, 2025 18:19
Development

Successfully merging this pull request may close these issues.

Support stage deploys for syndicated views

5 participants