feat(dbt): add filtering for materialized nodes based on their physical location #14689
- Add database_pattern and schema_pattern config fields with AllowDenyPattern support
- Enhance _is_allowed_node() to filter nodes by database and schema in addition to node names
- Add comprehensive integration tests for new filtering capabilities
- Support combined filtering patterns for fine-grained dbt ingestion control
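The _is_allowed_node() change described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code merged into dbt_common.py. Pattern is a hypothetical stand-in for DataHub's AllowDenyPattern (a deny regex wins; otherwise at least one allow regex must match from the start of the string). Node, FilterConfig and is_allowed_node are hypothetical names. The sketch also assumes that schemas are matched in their database-qualified form (database.schema), as in the configuration proposed in this thread, and that nodes with no recorded location skip the location checks.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pattern:
    """Hypothetical stand-in for AllowDenyPattern: deny wins, then any allow must match."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)
    ignore_case: bool = True

    def allowed(self, value: str) -> bool:
        flags = re.IGNORECASE if self.ignore_case else 0
        if any(re.match(p, value, flags) for p in self.deny):
            return False
        # re.match anchors at the start of the string only, not at the end.
        return any(re.match(p, value, flags) for p in self.allow)


@dataclass
class Node:
    """Hypothetical minimal view of a dbt node and where it is materialized."""

    dbt_name: str
    database: Optional[str] = None
    schema: Optional[str] = None


@dataclass
class FilterConfig:
    """Hypothetical config; every pattern defaults to allow-all."""

    node_name_pattern: Pattern = field(default_factory=Pattern)
    database_pattern: Pattern = field(default_factory=Pattern)
    schema_pattern: Pattern = field(default_factory=Pattern)


def is_allowed_node(node: Node, config: FilterConfig) -> bool:
    # Existing behaviour: filter on the dbt node name first.
    if not config.node_name_pattern.allowed(node.dbt_name):
        return False
    # Assumption: nodes without a recorded location skip the location checks.
    if node.database is None:
        return True
    if not config.database_pattern.allowed(node.database):
        return False
    # Assumption: schemas are matched in their database-qualified form.
    if node.schema is not None and not config.schema_pattern.allowed(
        f"{node.database}.{node.schema}"
    ):
        return False
    return True
```

Suppose database_pattern allows analytics_db$ and schema_pattern allows analytics_db\.marts.*. Then a node materialized in analytics_db.marts passes, and one in analytics_db.staging is rejected at the schema check. Because matching is anchored only at the start, a bare allow entry such as analytics_db would also admit analytics_db_archive. Appending $ restricts it to the exact database name.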
Bundle Report: Changes will increase total bundle size by 1.43kB (0.0%) ⬆️. This is within the configured threshold ✅
Affected Assets, Files, and Routes: bundle datahub-react-web-esm
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Review thread on metadata-ingestion/src/datahub/ingestion/source/dbt/dbt_common.py (outdated, resolved)
Hello @abdullahtariqq, thank you for sharing this contribution. I think the feature is a bit complex, due to the details of
There is an open question whether it should be done against nodes of type The
Is this aligned with the use case you had in mind when creating it? Of course, golden files need to be aligned properly.
@skrydal Thank you for the thoughtful feedback! You're absolutely right about the architectural separation. However, I'd like to clarify the actual use case to ensure we design the right solution: The filtering need is for materialized nodes (tables/views) rather than just sources. Here's the scenario:
The goal is catalog consistency:
So the filtering should apply to all materialized nodes (table, view, etc.) based on where they're physically materialized, not just sources.

Proposed Configuration Example

source_pattern:
  database_pattern:
    allow: ["analytics_db", "marts_db"]
  schema_pattern:
    allow: ["analytics_db\\.marts.*", "analytics_db\\.reporting.*"]
  table_pattern:
    allow: ["analytics_db\\.marts\\.customer_.*"]

Benefits

Does this align with your architectural vision? I'm happy to adjust the approach based on any additional feedback!
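To make the proposed configuration concrete, here is one way its three levels could be evaluated against progressively qualified names: database, then database.schema, then database.schema.table. This is a hedged sketch, not DataHub code. The PROPOSED dict transcribes the YAML proposal, where the YAML string "\\." becomes the regex "\.". matches() and first_rejecting_level() are hypothetical helpers. They assume AllowDenyPattern-like semantics: deny first, then start-anchored, case-insensitive allow.

```python
import re
from typing import Dict, List, Optional

# The proposed YAML transcribed as Python (regexes copied from the proposal).
PROPOSED: Dict[str, Dict[str, List[str]]] = {
    "database_pattern": {"allow": ["analytics_db", "marts_db"]},
    "schema_pattern": {"allow": [r"analytics_db\.marts.*", r"analytics_db\.reporting.*"]},
    "table_pattern": {"allow": [r"analytics_db\.marts\.customer_.*"]},
}


def matches(pattern: Dict[str, List[str]], value: str) -> bool:
    """Hypothetical AllowDenyPattern-like check: any deny rejects, then any allow accepts."""
    if any(re.match(p, value, re.IGNORECASE) for p in pattern.get("deny", [])):
        return False
    return any(re.match(p, value, re.IGNORECASE) for p in pattern.get("allow", [".*"]))


def first_rejecting_level(database: str, schema: str, table: str) -> Optional[str]:
    """Return the first pattern that rejects the node, or None if every level allows it."""
    checks = [
        ("database_pattern", database),
        ("schema_pattern", f"{database}.{schema}"),
        ("table_pattern", f"{database}.{schema}.{table}"),
    ]
    for key, qualified_name in checks:
        if not matches(PROPOSED[key], qualified_name):
            return key
    return None
```

Under these assumed semantics, analytics_db.marts.customer_orders passes every level. analytics_db.marts.orders and every analytics_db.reporting table stop at table_pattern. Anything in marts_db stops at schema_pattern, because no schema regex starts with marts_db.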
Two review threads on metadata-ingestion/src/datahub/ingestion/source/dbt/dbt_common.py (outdated, resolved)
Co-authored-by: skrydal <[email protected]>
@skrydal Thank you for the feedback.
@skrydal Thank you for the thorough review. Could you please approve it if everything looks good?
skrydal left a comment:
Thank you for your contribution to the project.
FYI @abdullahtariqq, this change has been released in
…al location (#14689)
Co-authored-by: Abdullah Tariq <[email protected]>
Co-authored-by: skrydal <[email protected]>
Overview
Adds advanced filtering for dbt nodes based on their materialized database location to enable catalog consistency across multi-team dbt projects.
Changes
- database_pattern, schema_pattern, table_pattern fields
- database → schema → table
Use Case
Problem: Multi-database dbt projects (operational_db.*, analytics_db.*, ...) need consistent catalog filtering.
Solution: Match source system ingestion patterns for a clean, focused data catalog.
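One way to read the "match source system ingestion patterns" idea is to keep a single list of fully qualified name regexes. The same list is then applied both to the tables the warehouse source sees and to the locations dbt materializes into, so both catalogs drop the same objects. The sketch is illustrative only. SHARED_DENY, keep() and the sample names are hypothetical. It also assumes that both filters match the full database.schema.table string with start-anchored, case-insensitive regexes.

```python
import re
from typing import Iterable, List, Sequence

# Hypothetical deny list shared by the warehouse recipe and the dbt recipe.
SHARED_DENY: Sequence[str] = [r"operational_db\."]


def keep(fqns: Iterable[str], deny: Sequence[str] = SHARED_DENY) -> List[str]:
    """Keep names (database.schema.table) that no deny regex matches from the start."""
    return [name for name in fqns if not any(re.match(p, name, re.IGNORECASE) for p in deny)]


# Sample inputs: what the warehouse connector lists vs. where dbt materializes models.
warehouse_tables = ["operational_db.public.orders", "analytics_db.marts.customers"]
dbt_locations = ["operational_db.public.orders_snapshot", "analytics_db.marts.customers"]

# Neither catalog keeps anything from operational_db.
print(keep(warehouse_tables))  # ['analytics_db.marts.customers']
print(keep(dbt_locations))  # ['analytics_db.marts.customers']
```

A deny list keyed on the database prefix keeps newly added analytics schemas visible by default. An allow list, as in the proposed configuration, is the stricter alternative.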
Example:
Benefits
Enables fine-grained dbt ingestion control for complex organizational data architectures.