Temporary tables for volume and freshness tests are not properly cleaned on Athena #1514
Comments
Hi @bebbo203!
Hi, happy to contribute as soon as I can.
Hello @haritamar, I've found a possible solution to the problem, but I have some questions about the approach.
What do you think about this proposed approach? I've tested it, and it seems to work. There's also another issue: if a run-time error occurs, the run stops and the temporary tables are left in Athena. Subsequent runs won't see these tables, so they stay in the database and on S3 indefinitely. This could be addressed by adding a script to the on-run-end hook, but that solution is specific to Athena and won't work with other adapters. Should we proceed with the solutions proposed in points 1, 2, and 3 and leave this issue as is? If you want to take a look at the code, I'm currently working on this repo, which already contains all the fixes described above. I'm sorry if this is not the correct way to discuss the changes that need to be made. This is my first time collaborating on a bug fix. If you could point me in the right direction, I would be more than happy to follow your guidance!
Hi @bebbo203 ! Regarding (2) - I believe
Regarding (3) - I think it's better that
(or possibly change
Regarding the run-time error issue - since users tend to be sensitive about the runtime of the on-run-end hook, maybe I'd actually create a separate macro that users can schedule with
Hi @haritamar, I've submitted a draft PR.
Hi @bebbo203!
Yes! Thank you for the hint! I've pushed the change to the branch but it's not appearing in the PR. Should I do anything else?
The PR was merged, so closing the issue.
Describe the bug
After a successful run of the volume and source freshness tests, tables of the form:
data_monitoring_metrics_tmp_<timestamp>
are left in Athena.
Moreover, although they are no longer present in Athena, files belonging to other temporary tables of the form:
test_<hex_code>_elementary_volume_anomalies_<model_name>_<other_descs>
can still be found in the S3 bucket.
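To make the two name shapes above easier to check for, here is a minimal Python sketch that filters a list of table (or S3 folder) names down to the ones that look like leftovers. The regular expressions are assumptions inferred only from the names quoted in this report (for example, that `<hex_code>` is lowercase hex); `find_leftover_tmp_tables` is a hypothetical helper, not part of Elementary or dbt-athena.

```python
import re

# Assumed patterns, derived only from the two name shapes quoted in this issue.
# The exact timestamp and suffix formats are not known, so they are matched loosely.
TMP_TABLE_PATTERNS = [
    re.compile(r"^data_monitoring_metrics_tmp_.+$"),
    re.compile(r"^test_[0-9a-f]+_elementary_volume_anomalies_.+$"),
]


def find_leftover_tmp_tables(table_names):
    """Return the names that look like leftover temporary tables."""
    return [
        name
        for name in table_names
        if any(pattern.match(name) for pattern in TMP_TABLE_PATTERNS)
    ]
```

The input list could come from the Athena console, a `SHOW TABLES` query, or a listing of the folders under the S3 data directory.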
To Reproduce
Steps to reproduce the behavior:
dbt run --select elementary
dbt test
s3_data_dir
Expected behavior
Temporary tables must be completely dropped after a run (metadata + parquet files in S3).
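A rough sketch of what "completely dropped" would involve for each leftover table, under some loud assumptions: for external tables, `DROP TABLE` in Athena removes the catalog entry but leaves the data files in S3, so both a DDL statement and an S3 prefix need handling. The `<s3_data_dir>/<table_name>/` layout used here is hypothetical (the real location depends on the adapter's S3 naming configuration), and `build_cleanup_plan` is an illustrative helper, not an existing API. It only builds a plan and does not call Athena or S3.

```python
def build_cleanup_plan(table_names, database, s3_data_dir):
    """Build the DROP statement and the S3 prefix to delete for each table.

    Assumption: each table's files live under <s3_data_dir>/<table_name>/.
    Verify the actual location before deleting anything.
    """
    base = s3_data_dir.rstrip("/")
    plan = []
    for name in table_names:
        plan.append(
            {
                # Athena DDL quotes identifiers with backticks.
                "drop_sql": f"DROP TABLE IF EXISTS `{database}`.`{name}`",
                # Hypothetical data location for this table.
                "s3_prefix": f"{base}/{name}/",
            }
        )
    return plan
```

Keeping the plan separate from its execution makes it possible to review what would be removed before running anything against the database or the bucket.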
Environment (please complete the following information):