

Swap CeleryExecutor over to use TaskSDK for execution. #46265

Open · wants to merge 1 commit into main
Conversation

ashb
Member

@ashb ashb commented Jan 29, 2025

Closes #45426

Some points of note about this PR:

  • Logging is changed in Celery, but only for Airflow 3

    Celery does its own "capture stdout" logging, which conflicts with the
    capturing we do in the TaskSDK, so we disable Celery's; this is done only
    for Airflow 3, so nothing changes for Airflow 2.
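
For reference, Celery's stdout/stderr capture can be turned off through its `worker_redirect_stdouts` setting. A minimal sketch, assuming a plain Celery app (the app name is illustrative, and the real change is gated on the Airflow version):

```python
# Sketch only: disable Celery's own stdout/stderr capture so the TaskSDK's
# redirection is the single capture in effect. `worker_redirect_stdouts` is
# a real Celery setting; the app name here is illustrative.
from celery import Celery

app = Celery("airflow.providers.celery")

# Celery normally swaps sys.stdout/sys.stderr for logging proxies in the
# worker; turning that off avoids double-capturing task output.
app.conf.worker_redirect_stdouts = False
```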

  • Simplify task SDK logging redirection

    While investigating (and before disabling) Celery's capture of
    stdout/stderr, I discovered a simpler way to re-open stdin/stdout/stderr so
    that the implementation needs fewer (or no) special cases.

  • Make JSON task logs more readable by giving them a consistent/useful order

    We re-order (by re-creating) the event_dict so that timestamp, level, and
    event are always the first items in the dict.
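
A minimal structlog-style processor sketch of that re-ordering (the function name `reorder_event_dict` is hypothetical, not the name used in the PR):

```python
def reorder_event_dict(logger, method_name, event_dict):
    """structlog-style processor: rebuild the event dict so that
    timestamp, level, and event always render first in the JSON line."""
    ordered = {}
    for key in ("timestamp", "level", "event"):
        if key in event_dict:
            ordered[key] = event_dict.pop(key)
    ordered.update(event_dict)  # remaining keys keep their original order
    return ordered
```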

  • Make the CeleryExecutor understand the concept of "workloads" instead of a
    command tuple.

    This change isn't done in the best way, but until the Kube executor is
    swapped over (and possibly the other in-tree executors, such as ECS) we
    need to support both styles concurrently.

    The change should be done in such a way that the provider still works with
    Airflow v2, if it's running on that version.
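
The dual-style support could look roughly like this; the names `ExecuteTask` and `classify_payload` are placeholders for illustration, not the provider's real API:

```python
from dataclasses import dataclass

@dataclass
class ExecuteTask:
    """Placeholder for the Task SDK workload payload."""
    ti_id: str

def classify_payload(item):
    """During the transition the executor has to accept both shapes:
    the legacy Airflow 2 command tuple/list, and the new workload object."""
    if isinstance(item, (list, tuple)):
        return "command"   # legacy: e.g. ["airflow", "tasks", "run", ...]
    if isinstance(item, ExecuteTask):
        return "workload"  # new style: hand off to the Task SDK supervisor
    raise TypeError(f"unknown payload type: {type(item)!r}")
```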

  • Upgrade Celery

    This turned out to not be 100% necessary but it does fix some deprecation
    warnings when running on Python 3.12

  • Ensure that the forked process in TaskSDK always exits

    This shouldn't normally be possible, but when the setup step of _fork_main
    died it didn't call os._exit(); the exception was caught further up the
    stack, and the process stayed alive because it never closed its sockets
    properly. We put an extra safety try/except block in place to catch that.
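
The guard can be sketched like so (`fork_and_run` is an illustrative stand-in for the supervisor's fork path, not the TaskSDK's actual function):

```python
import os

def fork_and_run(target) -> int:
    """Fork and run `target` in the child; the child must never survive
    past this function, whatever happens during setup or execution."""
    pid = os.fork()
    if pid:
        return pid  # parent: return the child's pid and supervise it
    # Child: every path from here must end in os._exit(), otherwise the
    # child would unwind back into the parent's call stack with the
    # parent's sockets still open.
    exit_code = 1
    try:
        exit_code = target() or 0
    except BaseException:
        pass  # even a dying setup step must not escape the child
    finally:
        os._exit(exit_code)
```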

I have not yet included a newsfragment for changing the executor interface as
the old style is currently still supported.

Testing: I ran airflow scheduler, airflow fastapi-api, and airflow celery worker, triggered some DAGs, and viewed the logs.

And since a picture makes all PRs better

Screenshot 2025-01-29 at 22 49 49

Celery window showing debug logs from the supervisor itself (task output goes to file)

Screenshot 2025-01-29 at 22 50 56

@boring-cyborg boring-cyborg bot added area:Executors-core LocalExecutor & SequentialExecutor area:Scheduler including HA (high availability) scheduler area:task-sdk provider:celery labels Jan 29, 2025
task_sdk/src/airflow/sdk/log.py: review thread (outdated, resolved)
@ashb ashb force-pushed the swap-celery-exec-to-tasksdk branch from 315d736 to 8263c94 Compare January 29, 2025 22:59
Contributor

@amoghrajesh amoghrajesh left a comment


Thanks for working on this @ashb!
I have a few comments, nothing serious, just general comments/nits; I am mostly OK with the PR.

@amoghrajesh: comment marked as resolved.

@ashb ashb force-pushed the swap-celery-exec-to-tasksdk branch 3 times, most recently from 2038dc1 to 8e2d7cc Compare January 30, 2025 13:36
@ashb ashb requested a review from eladkal as a code owner January 30, 2025 13:36
@ashb ashb force-pushed the swap-celery-exec-to-tasksdk branch 2 times, most recently from ef0a8b9 to fd30378 Compare January 30, 2025 15:29
@ashb ashb force-pushed the swap-celery-exec-to-tasksdk branch from fd30378 to a7635c9 Compare January 30, 2025 16:34
@ashb
Member Author

ashb commented Jan 30, 2025

Right, I think this is good; the failing tests are fixed in main thanks to Jarek's hard work.

One failure I wasn't happy with, so I've rebased.

@ashb ashb force-pushed the swap-celery-exec-to-tasksdk branch from a7635c9 to 15aec32 Compare January 30, 2025 17:33
Contributor

@o-nikolas o-nikolas left a comment


Changes to base look good to me for supporting both styles for now. I think it's called out in a comment, but we should migrate all in-tree executors (K8s, ECS, Batch, Edge, etc.) and add a new minimum Airflow version for those packages.

Review comment on the test diff:

-with _prepare_app(execute=fake_execute_command):
-    # fake_execute_command takes no arguments while execute_command takes 1,
+with _prepare_app(execute=fake_task):
+    # fake_execute_command takes no arguments while execute_workload takes 1,

Suggested change:
-# fake_execute_command takes no arguments while execute_workload takes 1,
+# fake_task takes no arguments while execute_workload takes 1,
airflow/executors/base_executor.py: review thread resolved
4 participants