[Feature] Retrieve the description of columns and tables from big query #775

Open
3 tasks done
Rkejji opened this issue Jan 27, 2025 · 0 comments
Labels
pkg:dbt-bigquery Issue affects dbt-bigquery

Comments

Rkejji commented Jan 27, 2025

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Issue Overview

Following this discussion, I am raising this issue to address the lack of descriptions in the generated source.yml file.

Current Behaviour

When using codegen to generate a source.yml file that lists the tables and columns in my data source, the description field is empty for both tables and columns.

Expected Behaviour

The descriptions of columns and tables in BigQuery should be parsed automatically and included in the generated source.yml.

Why This Relates to dbt-bigquery and Not codegen

The codegen package cannot include descriptions because dbt does not retrieve the description field. This appears to be because the BigQueryColumn class has no description attribute.

Proposed Solution

When calling the BigQuery API through the get_table function of the BigQuery client API here, retrieve the table and column descriptions from the returned table object. This would make the descriptions available in the Relation class for use by codegen.
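A minimal sketch of what reading the descriptions from the returned table object could look like. It is not the dbt-bigquery implementation: `ColumnDoc`, `TableDoc`, and `describe_table` are hypothetical names, and the function only assumes the attributes that the google-cloud-bigquery `Table` (`table_id`, `description`, `schema`) and `SchemaField` (`name`, `field_type`, `description`) objects expose. Nested fields are ignored for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ColumnDoc:
    """Hypothetical container for one column and its description."""
    name: str
    data_type: str
    description: Optional[str] = None


@dataclass
class TableDoc:
    """Hypothetical container for one table and its columns."""
    name: str
    description: Optional[str] = None
    columns: List[ColumnDoc] = field(default_factory=list)


def describe_table(table: Any) -> TableDoc:
    """Collect descriptions from a google.cloud.bigquery Table-like object.

    Only `table_id`, `description`, and `schema` are read from `table`;
    each schema entry needs `name`, `field_type`, and `description`.
    Empty descriptions are normalised to None.
    """
    columns = [
        ColumnDoc(f.name, f.field_type, f.description or None)
        for f in table.schema
    ]
    return TableDoc(table.table_id, table.description or None, columns)


# With real credentials (not executed here; the table ID is a placeholder):
#   from google.cloud import bigquery
#   table = bigquery.Client().get_table("my-project.my_dataset.my_table")
#   doc = describe_table(table)
```

Keeping the extraction separate from the API call makes it easy to exercise with a stub object, and the same fields could then be attached to the column and relation objects that codegen reads.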

Steps to Reproduce

  1. Set up a BigQuery datasource with tables and columns that have descriptions.
  2. Run codegen's generate_source macro (for example, dbt run-operation generate_source with the appropriate arguments) to create a source.yml file.
  3. Observe that the description fields are empty.

Expected Output

An example of the expected source.yml file:

version: 2
sources:
  - name: my_source
    tables:
      - name: my_table
        description: "Table description"
        columns:
          - name: my_column
            description: "Column description"
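For illustration, here is a small sketch of how such a file could be rendered once the table and column descriptions are available. It assumes dbt's source schema, in which tables are listed under a `tables:` key of a named source. `render_source_yaml` and the input layout are hypothetical and only the standard library is used.

```python
import json
from typing import Dict


def _quote(text: str) -> str:
    # A JSON string is also a valid double-quoted YAML scalar for typical
    # description text, so json.dumps handles quoting and escaping here.
    return json.dumps(text or "", ensure_ascii=False)


def render_source_yaml(source_name: str, tables: Dict[str, dict]) -> str:
    """Render a minimal dbt source file as text.

    `tables` maps a table name to
    {"description": str, "columns": {column_name: description}}.
    """
    lines = ["version: 2", "sources:", f"  - name: {source_name}", "    tables:"]
    for table_name, info in tables.items():
        lines.append(f"      - name: {table_name}")
        lines.append(f"        description: {_quote(info.get('description'))}")
        columns = info.get("columns") or {}
        if columns:
            lines.append("        columns:")
        for column_name, description in columns.items():
            lines.append(f"          - name: {column_name}")
            lines.append(f"            description: {_quote(description)}")
    return "\n".join(lines) + "\n"


print(render_source_yaml(
    "my_source",
    {"my_table": {"description": "Table description",
                  "columns": {"my_column": "Column description"}}},
))
```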

Describe alternatives you've considered

There are two alternatives to this issue:

  • Write a custom script that queries the table's schema through the GCP BigQuery API and writes the parsed result to the source.yml file. This requires too much verbose, non-reusable code.
  • Use INFORMATION_SCHEMA as suggested here. This requires more permissions, and writing out the name of every column is very repetitive.
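To make the second alternative concrete, the sketch below builds a query against the `INFORMATION_SCHEMA.COLUMN_FIELD_PATHS` view, which carries a `description` column for every column in a dataset. The helper name `column_description_query` and the project and dataset values are hypothetical, and running the query still requires the extra metadata permissions mentioned above.

```python
import re

# Conservative patterns for illustration; real project IDs can take other forms.
_PROJECT = re.compile(r"^[A-Za-z0-9\-]+$")
_DATASET = re.compile(r"^[A-Za-z0-9_]+$")


def column_description_query(project: str, dataset: str) -> str:
    """Build a query listing every column in a dataset with its description."""
    if not _PROJECT.match(project):
        raise ValueError(f"unexpected project id: {project!r}")
    if not _DATASET.match(dataset):
        raise ValueError(f"unexpected dataset name: {dataset!r}")
    return (
        "SELECT table_name, field_path, data_type, description\n"
        f"FROM `{project}`.{dataset}.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS\n"
        "ORDER BY table_name, field_path"
    )


print(column_description_query("my-project", "my_dataset"))
```

A single query like this covers all tables in the dataset, but the result still has to be turned into YAML by hand or by a script, which is the maintenance burden this feature request aims to remove.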

Who will this benefit?

Every user of dbt-bigquery working with very large data sources, such as SAP tables with hundreds of columns. Personally, I had to migrate SAP tables to dbt, and copying every table and its columns into the source.yml file was very repetitive work.

Are you interested in contributing this feature?

Yes

Anything else?

No response

@amychen1776 amychen1776 transferred this issue from dbt-labs/dbt-bigquery Feb 4, 2025
@amychen1776 amychen1776 added the pkg:dbt-bigquery Issue affects dbt-bigquery label Feb 6, 2025