-
Notifications
You must be signed in to change notification settings - Fork 124
Support deploying syndicated views to stage #8532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
whd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general comment https://mozilla-hub.atlassian.net/browse/SVCSE-2995 exists to move syndicated view management entirely to DE-managed infra, but AFAIK isn't a priority at this time. I believe there was also some discussion of replacing the bespoke syndication mechanism with e.g. https://docs.cloud.google.com/bigquery/docs/analytics-hub-introduction#linked_datasets as well but is similarly not a short term priority
|
Could we instead avoid trying to deploy syndicated views to stage at all (including not rewriting references to syndicated views)? |
This comment has been minimized.
This comment has been minimized.
e4b62c6 to
adfd0ae
Compare
Yeah. I made the change and tested it on |
sean-rose
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for being willing to change the approach! Though given that change, the PR title should probably be updated.
bigquery_etl/cli/stage.py
Outdated
| pass | ||
|
|
||
| # Only create stubs if not syndicated OR if a local file already exists | ||
| if not is_syndicated or file_exists_for_dependency: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (blocking): Are you certain this logic is correct? The only thing that comes to mind is maybe you were trying to account for non-syndicate content in syndicate datasets, but even in that scenario it doesn't make sense to me that the logic in this block should run when files already exist for the dependency.
One additional complication is it looks like the logic that sets file_exists_for_dependency assumes the dependency's project is the same as the referring view's project, which seems like a bug (none of the other logic in this method makes that assumption).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if files do already exist for tables referenced by the view then it seems like those should be added to the set of view dependencies, but that doesn't seem like it's being done in either the existing logic or this updated logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the logic. My intention was to handle cases where datasets have syndication configured but also have other content/tables.
It doesn't seem like that CI job actually published the |
5ec10f0 to
62dd5e6
Compare
This comment has been minimized.
This comment has been minimized.
|
Ah, the service account used to deploy artifacts to the staging project doesn't have access to shared-prod. I believe we decided not to grant read access as there is a risk that production data might get leaked. |
Integration report for "get schemas for tables that aren't in the repo"
|
sean-rose
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r+wc
bigquery_etl/cli/stage.py
Outdated
| partitioned_by = "submission_timestamp" | ||
| partitioned_by = None | ||
|
|
||
| if dataset.endswith("_stable"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (blocking): As some views reference live tables, I'd think we'd also need to account for those:
| if dataset.endswith("_stable"): | |
| if any(dataset.endswith(suffix) for suffix in ("_live", "_stable")): |
Description
Fixes #8530
If any artifacts reference a syndicated views then stage deploys are currently failing with:
Stage deploys try to determine the schemas of referenced tables via dryruns if they don't have a locally defined schema.
The syndicated views aren't partitioned on
submission_timestamp(unlike stable tables), so we need to make sure that the partitioning onsubmission_timestampis only used when the dependency is a stable table.Related Tickets & Documents
Reviewer, please follow this checklist