Data-aware scheduling with time delay #46230

Open
1 of 2 tasks
lafirm opened this issue Jan 29, 2025 · 5 comments
Labels
area:datasets Issues related to the datasets feature kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet

Comments


lafirm commented Jan 29, 2025

Description

Add a time-delay argument for DAGs that use data-aware scheduling.

Use case/motivation

I have two DAGs, main_dag and status_dag. main_dag has an outlet dataset called dataset1, and status_dag should be triggered 10 minutes after dataset1 is created or updated. Right now, I call time.sleep() in status_dag to wait 10 minutes before the actual work runs.

Wouldn't it be helpful to support a time delay for data-aware scheduled DAGs?
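
To make the requested behaviour concrete, here is a minimal sketch of the delay semantics in plain Python. It is not Airflow code and not a proposed API: `remaining_delay` and its parameters are hypothetical names used only for illustration. It assumes the timestamp of the dataset update is known, and computes how much of the delay is still left when the downstream work starts, rather than always sleeping for the full duration.

```python
from datetime import datetime, timedelta, timezone


def remaining_delay(dataset_updated_at: datetime,
                    delay: timedelta,
                    now: datetime) -> timedelta:
    """Return how long to keep waiting so that work starts `delay` after the update.

    Hypothetical helper for illustration only; never returns a negative duration.
    """
    target = dataset_updated_at + delay
    return max(target - now, timedelta(0))


# Example timestamps (made up for this sketch).
updated = datetime(2025, 1, 29, 12, 0, tzinfo=timezone.utc)
ten_minutes = timedelta(minutes=10)

# Downstream run starts 2 minutes after the update: part of the delay remains.
print(remaining_delay(updated, ten_minutes, updated + timedelta(minutes=2)))

# Downstream run starts 15 minutes after the update: no wait is needed.
print(remaining_delay(updated, ten_minutes, updated + timedelta(minutes=15)))
```

A fixed time.sleep() inside status_dag waits the full duration regardless of how long scheduling itself took; measuring from the update time is one possible way a built-in delay could behave.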

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@lafirm lafirm added kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet labels Jan 29, 2025

boring-cyborg bot commented Jan 29, 2025

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@dosubot dosubot bot added the area:datasets Issues related to the datasets feature label Jan 29, 2025
eladkal (Contributor) commented Jan 29, 2025

> should be triggered after 10 minutes of when dataset1 is created or updated

Can you share the reasoning for why you would want that?

lafirm (Author) commented Jan 29, 2025

> Can you share reasoning why would you want that?

In my current project I use SQLMesh, which works with intervals, and I need to wait some time for the intervals to be complete because I don't want to use allow_partials. The 10 minutes was just an example. I apologise that this is the most context I can provide; I can't go into more detail. I hope you understand!

It's not a blocker for me, but I thought this feature would be beneficial for others like me who work with status updates and SQLMesh intervals.

eladkal (Contributor) commented Jan 29, 2025

I am not sure I follow the scenario.
Are you saying that some component reported that it finished successfully, so the downstream task is scheduled, but that success report is not actually true until 10 minutes have passed? Then why did it report success to begin with?

lafirm (Author) commented Jan 29, 2025

That's not correct.

My use case is this: an incremental SQLMesh model (call it model_a), which works with intervals, is built by an Airflow task. Another DAG builds an incremental model, model_b, which has a WHERE clause on the start and end datetimes of the model intervals. model_b needs the up-to-date values of model_a, so I need to wait for the interval to complete. This follows from how SQLMesh handles a model's intervals and their start and end datetimes.
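
As a rough illustration of "waiting for the interval to complete", the sketch below models fixed-size intervals in plain Python. It does not use SQLMesh or Airflow, and it is only an assumption that the intervals are fixed-size, half-open, and aligned to the Unix epoch; `interval_end` and `wait_until_complete` are hypothetical names, not part of either tool.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def interval_end(ts: datetime, interval: timedelta) -> datetime:
    """End of the epoch-aligned, half-open interval [start, end) that contains `ts`."""
    completed_intervals = (ts - EPOCH) // interval  # whole intervals before `ts`
    return EPOCH + (completed_intervals + 1) * interval


def wait_until_complete(updated_at: datetime,
                        interval: timedelta,
                        now: datetime) -> timedelta:
    """How long until the interval containing `updated_at` has fully elapsed."""
    return max(interval_end(updated_at, interval) - now, timedelta(0))


# Example: model_a is updated at 12:03 and intervals are 10 minutes long,
# so under the assumptions above the containing interval ends at 12:10.
updated_at = datetime(2025, 1, 29, 12, 3, tzinfo=timezone.utc)
ten_minutes = timedelta(minutes=10)
print(interval_end(updated_at, ten_minutes))
print(wait_until_complete(updated_at, ten_minutes, updated_at))
```

Under this model the required wait depends on where the update falls within the interval, which is why a single fixed delay such as 10 minutes is only an upper bound for 10-minute intervals.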
