Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Support contracts in Redshift even when the first row has NULL values #10490

Open
2 tasks done
gekas93 opened this issue Jul 26, 2024 · 4 comments
Open
2 tasks done
Labels

Comments

@gekas93
Copy link

gekas93 commented Jul 26, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I'm having a strange issue with the contracts in my dbt model and working with redshift.

I have fixed some fields with a data_type = 'date' and they were working well but I have added a NULLIF condition like this:

cast(NULLIF(psl.plant_start_date, '9999-12-31') as date) as plant_start_date,
cast(NULLIF(psl.plant_end_date, '9999-12-31') as date) as plant_end_date,

Before of adding the NULLIF function, everything works as expected.

After adding it I am having this message:

column_name definition_type contract_type mismatch_reason
plant_end_date TEXT DATE data type mismatch
plant_start_date TEXT DATE data type mismatch

I have tried to cast in each place (before, after, in the middle more than one time) but I always have the same issue. I tried to use a case instead of NULLIF but the same happens.

I have token a look in the macros and I have see that it's this macro the responsible of the behaviour:

{% macro assert_columns_equivalent(sql) %}

  {#-- First ensure the user has defined 'columns' in yaml specification --#}
  {%- set user_defined_columns = model['columns'] -%}
  {%- if not user_defined_columns -%}
      {{ exceptions.raise_contract_error([], []) }}
  {%- endif -%}

  {#-- Obtain the column schema provided by sql file. #}
  {%- set sql_file_provided_columns = get_column_schema_from_query(sql, config.get('sql_header', none)) -%}
  {#--Obtain the column schema provided by the schema file by generating an 'empty schema' query from the model's columns. #}
  {%- set schema_file_provided_columns = get_column_schema_from_query(get_empty_schema_sql(user_defined_columns)) -%}
    {{ log("sql_file_provided_columns: " ~ sql_file_provided_columns) }}
    {{ log("schema_file_provided_columns: " ~ schema_file_provided_columns) }}

I have added two logs at the end to confirm it. The sql_file_provided_columns is giving me the dates as a varchar.

Expected Behavior

DBT detects that the definition type is DATE for plant_start_date and plant_end_date

Steps To Reproduce

  1. Create a SQL with cast(NULLIF(<date_field>, '9999-12-31') as date) as <date_field_name>
  2. Define the contract and define your date name field with a data_type equal to date
- name: <date_field_name>
        data_type: date
        description: ""
  1. Run your dbt run or dbt build command with the model

Relevant log output

�[0m18:00:42.359462 [debug] [Thread-1  ]: sql_file_provided_columns: [..., <Column plant_start_date (character varying(256))>, <Column plant_end_date (character varying(256))>]
�[0m18:00:42.360137 [debug] [Thread-1  ]: schema_file_provided_columns: [..., <Column plant_start_date (DATE)>, <Column plant_end_date (DATE)>]
�[0m18:00:42.370309 [debug] [Thread-1  ]: On model.project.model_name: Close
�[0m18:00:42.373215 [debug] [Thread-1  ]: Compilation Error in model model_name (models/marts/pipeline_folder/model_name.sql)
  This model has an enforced contract that failed.
  Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.
  
  | column_name      | definition_type | contract_type | mismatch_reason    |
  | ---------------- | --------------- | ------------- | ------------------ |
  | plant_end_date   | TEXT            | DATE          | data type mismatch |
  | plant_start_date | TEXT            | DATE          | data type mismatch |

Environment

- OS: MAC 
- Python: 3.9.18
- dbt-redshift = 1.8.0
- dbt-core = 1.8.1

Which database adapter are you using with dbt?

redshift

Additional Context

No response

@gekas93 gekas93 added bug Something isn't working triage labels Jul 26, 2024
@dbeatty10 dbeatty10 self-assigned this Jul 26, 2024
@dbeatty10
Copy link
Contributor

Thanks for reaching out @gekas93 !

This looks like dbt-labs/dbt-redshift#659 where Redshift treats null values as VARCHAR/TEXT even when you say something like cast(null as date) or null::date.

dbt-labs/dbt-redshift#659 (comment) discusses our suggested workaround which is to add something like this to the beginning of your model to handle this Redshift-specific behavior:

select 
    '9999-12-31'::date as plant_start_date,
    '9999-12-31'::date as plant_end_date,
    # ... the rest of your columns here
where 1=0
union all

Could you try that out and see if it resolves the issue?

@dbeatty10 dbeatty10 removed the triage label Jul 26, 2024
@dbeatty10 dbeatty10 removed their assignment Jul 26, 2024
@gekas93 gekas93 closed this as not planned Won't fix, can't repro, duplicate, stale Jul 27, 2024
@gekas93
Copy link
Author

gekas93 commented Jul 27, 2024

Thanks for your quick response @dbeatty10 !

Sorry, because I tried to find some topics related with mine but I didn't found anything.

Yes, with your approach it's working but it's a problem when you have a big number of columns. In my case, in this table I have like 50 columns, so I had to add more than 50 code lines and 50 manual casts.

There will be some approach in the future that will solve this?

Obviously, if I add the IFNULL operation in a previously model and it's materialized as a table, I don't have this issue. But if it's materialized as a view (in my case has to be like that), it is still there.

@gekas93 gekas93 reopened this Jul 27, 2024
@dbeatty10 dbeatty10 changed the title [Bug] Data type bug in Redshift contracts [Feature] Support contracts in Redshift even when the first row has NULL values Jul 29, 2024
@dbeatty10 dbeatty10 added enhancement New feature or request and removed bug Something isn't working labels Jul 29, 2024
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jan 26, 2025
@gekas93
Copy link
Author

gekas93 commented Jan 27, 2025

still open

@github-actions github-actions bot removed the stale Issues that have gone stale label Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants