Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore and add static checks for DAGs for early detection of common issues #43176

Open
1 of 2 tasks
omkar-foss opened this issue Oct 18, 2024 · 21 comments
Open
1 of 2 tasks
Assignees

Comments

@omkar-foss
Copy link
Collaborator

Description

As per users' feedback in the Airflow Debugging Survey 2024, 48.3% of respondents chose early issue detection during execution as one of their top 2 choices.

Use case/motivation

Goal of this issue:

  • Enhance early detection of DAG issues to minimize dev time
  • Some sort of static analysis similar to Ruff checks for DAGs
  • Runtime analysis of DAGs if feasible can also be done to save dev time

Related issues

Parent Issue: #40975

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@omkar-foss omkar-foss added kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet labels Oct 18, 2024
@omkar-foss omkar-foss changed the title Explore and add static checks for DAGs for early detect of common issues Explore and add static checks for DAGs for early detection of common issues Oct 18, 2024
@nathadfield nathadfield removed the needs-triage label for new issues that we didn't triage yet label Oct 23, 2024
@omkar-foss
Copy link
Collaborator Author

In the context of this issue, DAG linting as mentioned by this article (shared by @potiuk on slack) can also be explored:

https://medium.com/@snir.isl/mastering-airflow-dag-standardization-with-pythons-ast-a-deep-dive-into-linting-at-scale-1396771a9b90

Inspired by above article, may be we can have commands like airflow dag lint <dagname>.py for checking common DAGs issues and airflow dag grade <dagname>.py for grading DAGs based on their code quality.

@potiuk
Copy link
Member

potiuk commented Oct 29, 2024

Love that idea. We could even had some way of checking for "best practices" - like not using DB while parsing etc. This might also be then used as part of the upgrade-check mechanism that we are planning for Airflow 2-> 3 migration - see #41641 cc: @Lee-W

@Lee-W
Copy link
Member

Lee-W commented Oct 30, 2024

I feel it's a bit different. 🤔 But for not using DB while parsing, that's something we should check in the upgrade check. But the "best practices" thing would probably be something else. Probably integrating with ruff or building our own linter would be better.

@Dev-iL
Copy link
Contributor

Dev-iL commented Jan 5, 2025

Good to see new airflow-specific Ruff rules being added (AIR3##). Nice work @Lee-W!

@zach-overflow
Copy link

This is definitely something I'd find immensely useful. In addition to the article linked in the earlier comments, there are some decent prior attempts at doing something similar with static DAG code checks (see links below); however, I think the major benefit for making this feature native to Airflow is that it can rely directly on the built-in DAG parsing logic.

Some of the examples I was referring to:

  • pylint-airflow: which is not maintained, but defines some reasonable lint checks as custom pylint checkers.
  • airflint: this might be useful as a reference, but it is worth noting that this is an auto-formatter rather than a simple validation tool.

@Lee-W
Copy link
Member

Lee-W commented Feb 12, 2025

We could probably start discussing what the best practice is after Airflow 3.0 is released (best practice now might not reflect the latest best practice we want then)

@omkar-foss
Copy link
Collaborator Author

omkar-foss commented Feb 14, 2025

We could probably start discussing what the best practice is after Airflow 3.0 is released (best practice now might not reflect the latest best practice we want then)

Yes that makes sense, let's add support for best practices linting for Airflow 3.x onwards. I just took a quick opinion and got a good concensus for 3.x onwards on slack.

@omkar-foss
Copy link
Collaborator Author

@Lee-W would you be interested in picking up this issue? Because I suppose you have good context on Airflow static checks by adding Ruff rules for it, among other things.

@omkar-foss omkar-foss moved this from Planning to Todo in Debugging Improvements - Airflow 3 Feb 14, 2025
@Lee-W
Copy link
Member

Lee-W commented Feb 14, 2025

@Lee-W would you be interested in picking up this issue? Because I suppose you have good context on Airflow static checks by adding Ruff rules for it, among other things.

Yep, implementing them is interesting, and I would definitely be interested in adding more rules haha

@Lee-W
Copy link
Member

Lee-W commented Feb 14, 2025

I'll probably focus most on implementing rules, but will need everyone's suggestion on what the best practices we want 🙂

@Dev-iL
Copy link
Contributor

Dev-iL commented Feb 14, 2025

Perhaps we should start a shared doc to raise ideas?

Personally, I'd love some checks related to XCOMs, and in particular in the context of task groups, to ensure they're unpacked/deserialized properly. Also, if possible, a check that custom classes returned by dags are included in the allowed_deserialization_classes.

@Lee-W
Copy link
Member

Lee-W commented Feb 15, 2025

we could probably just discuss here and after we have some rough idea, we can organize them to something like #41641

@Dev-iL
Copy link
Contributor

Dev-iL commented Feb 19, 2025

Idea:
If a task.branch has at least two returns, and exactly one of them is not [], suggest that the user replaces it with task.short_circuit, which is simpler.

@Dev-iL
Copy link
Contributor

Dev-iL commented Feb 19, 2025

Idea:
When a task/operator has a templated input that the user specifies as a string, and it contains a template to pull the xcom of exactly one task, suggest replacing it with an object representation of the upstream output. An example with operators:

# Before:
some_task = SomeOperator(task_id="some_task", ...)
some_other_task = SomeOtherOperator(
    templated_field="{{ ti.xcom_pull(task_ids='some_task') }}",
)

# After:
some_task = SomeOperator(...)  # Note the task id no longer has to be specified explicitly.
some_other_task = SomeOtherOperator(
    templated_field=some_task.output,
)

...and equivalently for tasks.

@Dev-iL
Copy link
Contributor

Dev-iL commented Feb 19, 2025

Some ideas from this blog post:

  • The DAG must not have top-level calls to expensive classes/services.
  • The DAG object should be instantiated using a context manager.
  • The DAG file name should match its dag_id.

@zach-overflow
Copy link

Some additional best practices rule suggestions:

  • All dag_id's should be unique (AKA prevent name collisions and detect them at validation time).

  • +1 to @Dev-iL's suggestion on top-level code, specifically I can think of two preliminary checks for that category:

    1. Ensure function-level imports are used in DAG files when possible (again, an example of this kind of rule may be found in the airflint repo here)
    2. Require Variables passed to operator constructors are only passed as Jinja templates (per the official best practice section here).
  • Lastly -- and this might be a larger scope suggestion -- it would be really useful if the Airflow linter could validate the user's DAG code against any user-defined task policy or dag policy (as defined in their cluster policy).

@zach-overflow
Copy link

One other lint check idea came to mind:

  • Optionally check whether DAG definitions are serializable or not

@Dev-iL
Copy link
Contributor

Dev-iL commented Feb 23, 2025

@zach-overflow

* All `dag_id`'s should be unique (AKA prevent name collisions and detect them at validation time).

I'm not sure Ruff supports multi-file checks, so I suspect this might only be applicable to situations where multiple DAGs are defined in a single script. IIRC this can be detected by a DagBag unit test.

* Lastly -- and this might be a larger scope suggestion -- it would be _really_ useful if the Airflow linter could validate the user's DAG code against any user-defined task policy or dag policy (as defined in their [cluster policy](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/cluster-policies.html#cluster-policies)).

I think this too belongs in the realm of checks to be added to DagBag and/or unit tests.

* Optionally check whether DAG definitions are serializable or not

Could you please elaborate (mainly out of curiosity):

  • What common pattern might make a DAG definition non-serializable?
  • In what contexts does this matter?
  • Is this supposed to be a static or a DagBag-related test? If static - how can this be done without trying to serialize the DAG and seeing if it works?

All in all I these are some good ideas to be included in DagBag, some variation of DagBag that's used for DAG validation, or at the very least - unit test examples to be added to the docs.

@snirisl
Copy link

snirisl commented Feb 26, 2025

Some ideas from this blog post:

  • The DAG must not have top-level calls to expensive classes/services.
  • The DAG object should be instantiated using a context manager.
  • The DAG file name should match its dag_id.

Thanks @Dev-iL for referencing my blog post 🙏 Another thing we can add is consider implementing a rule, adding to the third rule above, that dag id should be all lower case characters. Since DAG IDs should match the file names, enforcing this one is making sure DAG files to be compatible with the Python convention for file names.

@Dev-iL
Copy link
Contributor

Dev-iL commented Mar 3, 2025

  • Check that paths to temporary files/folders created in a task aren't returned (in XCOM) - since it cannot be guaranteed that downstream tasks will run on the same worker, and therefore will have access to the temporary path.

@zach-overflow
Copy link

@Dev-iL Thank you for pointing me towards the DagBag unit testing page, I wasn't aware that was an available approach. You're correct that the DagBag unit testing would cover unique DAG ID enforcement.

As for checking against user-defined cluster policies, I think the DagBag unit testing could cover that, but the cluster policy docs are not clear if that is evaluated during the DagBag loading, or during the actual DAG/task scheduling.

TL;DR: I think the most broadly useful static check would be the checks for problematic top-level code (e.g. network calls, variable.get() calls, expensive top-level imports, etc).

What common pattern might make a DAG definition non-serializable?

  • After my previous post, I leared that Airflow enforces DAG serialization, so I suppose this is less of a concern unless users are relying on some extended/custom serialization functionality.

In what contexts does this matter?

  • Likely not many except for some non-standard serialization, so I think this is less of a concern for me now.

Is this supposed to be a static or a DagBag-related test? If static - how can this be done without trying to serialize the DAG and seeing if it works?

  • I guess the best way would to simply run static unit tests against any custom serialization methods

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

7 participants