Add Documentation for Conditional Dataset Scheduling with dag-factory (#367)

Closes: #361

This PR adds a comprehensive guide on conditional dataset scheduling using dag-factory (v0.22.0+) and Airflow 2.9+. Key updates include:

* Explanation of conditional dataset scheduling and its use cases.
* Requirements for using the feature.
* Examples demonstrating configurations with both string and YAML syntax.
* Visual diagrams illustrating dataset condition logic.

This documentation is intended to help users understand and implement conditional dataset scheduling effectively in their workflows.

Co-authored-by: ErickSeo <[email protected]>
Showing 10 changed files with 117 additions and 41 deletions.
```title="dev/dags/datasets/example_dag_datasets_outlet.yml"
producer_dag:
  default_args:
    owner: "example_owner"
    retries: 1
    start_date: '2024-01-01'
  description: "Example DAG producer simple datasets"
  schedule_interval: "0 5 * * *"
  tasks:
    task_1:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: "echo 1"
      outlets: [ 's3://bucket_example/raw/dataset1.json' ]
    task_2:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: "echo 2"
      dependencies: [ task_1 ]
      outlets: [ 's3://bucket_example/raw/dataset2.json' ]

consumer_dag:
  default_args:
    owner: "example_owner"
    retries: 1
    start_date: '2024-01-01'
  description: "Example DAG consumer simple datasets"
  schedule: [ 's3://bucket_example/raw/dataset1.json', 's3://bucket_example/raw/dataset2.json' ]
  tasks:
    task_1:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: "echo 'consumer datasets'"
```
```title="dev/dags/datasets/example_dataset_condition_string.yml"
consumer_dag:
  default_args:
    owner: "example_owner"
    retries: 1
    start_date: '2024-01-01'
  description: "Example DAG consumer simple datasets"
  schedule:
    datasets: "((s3://bucket-cjmm/raw/dataset_custom_1 & s3://bucket-cjmm/raw/dataset_custom_2) | s3://bucket-cjmm/raw/dataset_custom_3)"
  tasks:
    task_1:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: "echo 'consumer datasets'"
```
```title="dev/dags/datasets/example_dataset_yaml_syntax.yml"
consumer_dag:
  default_args:
    owner: "example_owner"
    retries: 1
    start_date: '2024-01-01'
  description: "Example DAG consumer simple datasets"
  schedule:
    datasets:
      !or
      - !and
        - "s3://bucket-cjmm/raw/dataset_custom_1"
        - "s3://bucket-cjmm/raw/dataset_custom_2"
      - "s3://bucket-cjmm/raw/dataset_custom_3"
  tasks:
    task_1:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: "echo 'consumer datasets'"
```
# Datasets

DAG Factory supports Airflow’s [Datasets](https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html).

## Datasets Outlets

To use datasets, specify the `Dataset` in the `outlets` key of the task configuration. The `outlets` key is a list of strings that represent the dataset locations.
In the `schedule` key of the consumer DAG, set the `Dataset`(s) you would like to schedule against; this key is likewise a list of strings representing dataset locations.
The consumer DAG will run once all of the listed datasets are available.

#### Example: Outlet

```title="example_dag_datasets_outlet.yml"
--8<-- "dev/dags/datasets/example_dag_datasets_outlet.yml"
```

![datasets_example.png](../static/images/datasets/outlets/datasets_example.png "Simple Dataset Producer")
## Conditional Dataset Scheduling

#### Minimum Requirements

* dag-factory 0.22.0+
* [Apache Airflow® 2.9+](https://www.astronomer.io/docs/learn/airflow-datasets/#conditional-dataset-scheduling)

#### Logical operators for datasets

Airflow supports two logical operators for combining dataset conditions:

* AND (``&``): Specifies that the DAG should be triggered only after all of the specified datasets have been updated.
* OR (``|``): Specifies that the DAG should be triggered when any of the specified datasets is updated.

These operators enable you to configure your Airflow workflows to use more complex dataset update conditions, making them more dynamic and flexible.
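To make the semantics concrete, here is a minimal Python sketch of how `&` and `|` compose into a condition tree. The `Cond` class is an illustrative stand-in, not Airflow's actual `Dataset` API:

```python
class Cond:
    """A tiny condition tree over dataset URIs, mimicking how dataset
    conditions compose with & (all must update) and | (any may update)."""

    def __init__(self, uri=None, op=None, parts=()):
        self.uri, self.op, self.parts = uri, op, tuple(parts)

    def __and__(self, other):
        return Cond(op=all, parts=(self, other))

    def __or__(self, other):
        return Cond(op=any, parts=(self, other))

    def evaluate(self, updated):
        # A leaf is satisfied when its URI is in the set of updated datasets.
        if self.uri is not None:
            return self.uri in updated
        # A composite applies all() or any() to its children.
        return self.op(p.evaluate(updated) for p in self.parts)


d1 = Cond("s3://bucket-cjmm/raw/dataset_custom_1")
d2 = Cond("s3://bucket-cjmm/raw/dataset_custom_2")
d3 = Cond("s3://bucket-cjmm/raw/dataset_custom_3")

# Same logic as the string condition "((d1 & d2) | d3)".
condition = (d1 & d2) | d3

print(condition.evaluate({"s3://bucket-cjmm/raw/dataset_custom_3"}))  # True
```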
#### Examples of Conditional Dataset Scheduling

Below are examples demonstrating how to configure a consumer DAG using conditional dataset scheduling.

##### Example 1: String Condition

```title="example_dataset_condition_string.yml"
--8<-- "dev/dags/datasets/example_dataset_condition_string.yml"
```

##### Example 2: YAML Syntax

```title="example_dataset_yaml_syntax.yml"
--8<-- "dev/dags/datasets/example_dataset_yaml_syntax.yml"
```
---

#### Visualization

The following diagrams illustrate the dataset conditions described in the example configurations:

1. **`s3://bucket-cjmm/raw/dataset_custom_1`** and **`s3://bucket-cjmm/raw/dataset_custom_2`** must both be updated for the first condition to be satisfied.
2. Alternatively, **`s3://bucket-cjmm/raw/dataset_custom_3`** alone can satisfy the condition.

![Graph Conditional Dataset 1](../static/images/datasets/conditions/graph_conditional_dataset.png)
![Graph Conditional Dataset 2](../static/images/datasets/conditions/graph_conditional_dataset_2.png)
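The two rules above can also be checked exhaustively. The sketch below (a hypothetical helper, assuming the `((1 & 2) | 3)` condition from the examples with shortened dataset names) enumerates every combination of updated datasets and reports whether the consumer DAG would be triggered:

```python
from itertools import combinations

DATASETS = ("dataset_custom_1", "dataset_custom_2", "dataset_custom_3")


def triggers(updated):
    """Mirror the example condition: (dataset_custom_1 AND dataset_custom_2)
    OR dataset_custom_3."""
    return ({"dataset_custom_1", "dataset_custom_2"} <= updated
            or "dataset_custom_3" in updated)


# Enumerate every subset of updated datasets and report the outcome.
for r in range(len(DATASETS) + 1):
    for subset in combinations(DATASETS, r):
        print(sorted(subset), "->", triggers(set(subset)))
```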
Binary file added (+26.5 KB): docs/static/images/datasets/conditions/graph_conditional_dataset_2.png
File renamed without changes