I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
Just to preface:
This is only a problem when working outside the UTC timezone (which the docs discourage).
I observed this in Snowflake; I can't speak for other warehouse implementations.
Context
Let's think about micro-batching as a 3-step process:
1. Create temporary view containing the entries of a batch, e.g. of one day
2. Delete all entries of that batch (in our example one day) from the target table
3. Insert the newly created rows from step 1 into the target table
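To make the three steps concrete, here is a minimal Python sketch over in-memory rows. It is only a model of the idea, not dbt's implementation: `replace_batch` and the dictionary row layout are hypothetical names chosen for illustration.

```python
from datetime import datetime, timezone


def replace_batch(target, source, start, end):
    """Replace target rows whose event_time is in [start, end) with source rows."""

    def in_batch(row):
        # The same half-open range check is used for selecting and deleting.
        return start <= row["event_time"] < end

    batch_rows = [row for row in source if in_batch(row)]     # step 1: build the batch
    kept_rows = [row for row in target if not in_batch(row)]  # step 2: delete the batch
    return kept_rows + batch_rows                             # step 3: insert the batch


utc = timezone.utc
start = datetime(2025, 1, 19, tzinfo=utc)
end = datetime(2025, 1, 20, tzinfo=utc)

source = [{"id": 1, "event_time": datetime(2025, 1, 19, 8, tzinfo=utc)}]
target = [
    {"id": 0, "event_time": datetime(2025, 1, 18, 8, tzinfo=utc)},
    {"id": 1, "event_time": datetime(2025, 1, 19, 8, tzinfo=utc)},
]

once = replace_batch(target, source, start, end)
twice = replace_batch(once, source, start, end)
```

Because steps 1 and 2 share one predicate here, re-running the batch is idempotent: `once` and `twice` contain the same rows.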
The bug concerns how dates/timestamps are compared in steps 1 and 2.
Comparing dates/timestamps
To create the temporary view for a batch, e.g. 2025-01-19, dbt filters the upstream tables on the event_time column. The generated filter looks like this:
...
some_cte as (
    select * from
    (
        select * from
        some_table
        where
        event_time_column >= '2025-01-19 00:00:00+00:00' and event_time_column < '2025-01-20 00:00:00+00:00'
    )
)
...
When deleting the rows to re-insert into the table, the generated SQL query looks as follows:
delete from target_table
where (
event_time_column >= to_timestamp_tz('2025-01-19 00:00:00+00:00')
and event_time_column < to_timestamp_tz('2025-01-20 00:00:00+00:00')
)
As you can see, the comparison is inconsistent: '<timestamp>' vs. to_timestamp_tz('<timestamp>')
If the event_time column is a date and the session timezone is not UTC, the two comparisons yield different results, so the delete statement removes different rows from the target table than the ones that were recalculated. In our case this led to duplicate rows in the target table. See below for a short SQL query demonstrating the different results in the date/timestamp comparisons.
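The divergence can be modelled outside the warehouse. The Python sketch below rests on two assumptions about what Snowflake does, which I have not verified against its documentation: in the string-literal filter the bounds are effectively compared as dates, while in the to_timestamp_tz filter the date value is promoted to midnight in the session timezone. A fixed UTC+1 offset stands in for Europe/Berlin in January, and both function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: fixed UTC+1 offset stands in for Europe/Berlin in January (no DST).
SESSION_TZ = timezone(timedelta(hours=1))

START = datetime(2025, 1, 19, tzinfo=timezone.utc)
END = datetime(2025, 1, 20, tzinfo=timezone.utc)


def matches_as_date(value: date) -> bool:
    """Assumed model of the string-literal filter: bounds compared as dates."""
    return START.date() <= value < END.date()


def matches_as_timestamp_tz(value: date) -> bool:
    """Assumed model of the to_timestamp_tz filter: date becomes local midnight."""
    local_midnight = datetime.combine(value, time.min, tzinfo=SESSION_TZ)
    return START <= local_midnight < END


print(matches_as_date(date(2025, 1, 19)))          # True: row is selected for the batch
print(matches_as_timestamp_tz(date(2025, 1, 19)))  # False: row is not deleted
print(matches_as_timestamp_tz(date(2025, 1, 20)))  # True: the next day's row is deleted
```

Under this model, midnight on 2025-01-19 in UTC+1 is 23:00 UTC on 2025-01-18, so it falls before the lower bound of the delete range. The old row survives the delete and the recalculated row is inserted next to it, which matches the duplicates we saw.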
Expected Behavior
Even when working outside the UTC timezone, I'd expect that the delete/insert queries target the same range of rows.
A fix should be fairly simple: Make the comparison in both queries consistent (e.g. use to_timestamp_tz('<timestamp>') in both).
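One way to picture such a fix is to render the range filter once and reuse it in both statements. The Python sketch below only illustrates that idea; `batch_predicate` is a hypothetical helper and not part of dbt, and the table names are the placeholders used earlier in this report.

```python
from datetime import datetime, timezone


def batch_predicate(column: str, start: datetime, end: datetime) -> str:
    """Render one half-open range filter [start, end) with explicit casts."""
    return (
        f"{column} >= to_timestamp_tz('{start.isoformat(sep=' ')}') "
        f"and {column} < to_timestamp_tz('{end.isoformat(sep=' ')}')"
    )


start = datetime(2025, 1, 19, tzinfo=timezone.utc)
end = datetime(2025, 1, 20, tzinfo=timezone.utc)
predicate = batch_predicate("event_time_column", start, end)

# Both statements now share exactly the same comparison.
select_sql = f"select * from some_table where {predicate}"
delete_sql = f"delete from target_table where {predicate}"

print(select_sql)
print(delete_sql)
```

With a single source for the predicate, the select and the delete cannot drift apart in how they cast the bounds, whichever cast is chosen.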
Steps To Reproduce
Here is a short code snippet to try out the different behavior leading to the error described above:
alter session set timezone = 'Europe/Berlin';
select '2025-01-19'::date as event_time_column,
    event_time_column >= '2025-01-19 00:00:00+00:00' and event_time_column < '2025-01-20 00:00:00+00:00' as _1,
    event_time_column >= to_timestamp_tz('2025-01-19 00:00:00+00:00') and event_time_column < to_timestamp_tz('2025-01-20 00:00:00+00:00') as _2;
Relevant log output
Environment
We use the following docker image: ghcr.io/dbt-labs/dbt-snowflake:1.9.0
Which database adapter are you using with dbt?
No response
Additional Context
No response