Failed to fetch log file from worker because webserver did not search the right worker. #26255
Replies: 7 comments 10 replies
-
Thanks for opening your first issue here! Be sure to follow the issue template!
-
Is this the same as #15767?
-
Yes, I think it is almost the same.
-
Can confirm it's still present in v2.2.3.
-
Can you please check if this issue still happens on the latest Airflow version?
-
I do recall it's been mitigated in later versions of Airflow, so checking with the latest version is indeed a good idea. Converting this into a discussion.
-
@potiuk @eladkal I just confirmed that this is still happening in Airflow 2.5.1. Steps to reproduce are the same as shown in #26069.
-
Apache Airflow version: 2.1.0
Environment:
Kernel (uname -a): 5.4.0-48-
What happened:
When a task has retried many times, some of its log pages in the webserver show an error instead of the log content.

This happens because the log file
retry4/print_date/2021-06-14T00:00:00+00:00/10.log
is not on hadoop05; it is on hadoop04. The file /home/hadoop/airflow/logs/retry4/print_date/2021-06-14T00:00:00+00:00/10.log can indeed be found on hadoop04, while the task's latest log is on hadoop05. My guess is that the webserver first looks for the log on its local host and then falls back to the hostname recorded for the task's latest try, regardless of whether the logs of earlier tries live on a different worker.

We can read the task's latest log, which is on hadoop05.
We can also read earlier logs that happen to be on the webserver host, hadoop03.
But we cannot read earlier logs that are on neither hadoop05 nor hadoop03 (the webserver host), because the webserver requests them from hadoop05 instead of hadoop04, where they actually are.
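For illustration only, here is a rough, hypothetical Python sketch (not the actual Airflow source) of why only the latest try's worker is consulted: the task_instance row stores a single `hostname`, so the URL built for any try number points at the worker of the most recent try. The helper name `build_remote_log_url` is made up for this sketch; the real logic lives in Airflow's `FileTaskHandler`.

```python
# Illustrative sketch, not Airflow code: how a per-try log URL might be built
# from the single hostname stored on the task instance.
def build_remote_log_url(ti, try_number, worker_log_server_port=8793):
    # ti.hostname is one column on the task_instance row, so it only remembers
    # the worker that ran the *latest* try; logs of earlier tries that ran on
    # other workers end up being requested from the wrong host.
    relative_path = f"{ti.dag_id}/{ti.task_id}/{ti.execution_date.isoformat()}/{try_number}.log"
    return f"http://{ti.hostname}:{worker_log_server_port}/log/{relative_path}"
```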

What you expected to happen:
The webserver should fetch each try's log from the worker that actually ran that try, not only from hadoop05, where the latest try ran.
How to reproduce it:
Run a DAG whose task fails and retries several times in an Airflow cluster with more than three Celery worker nodes, so that different tries land on different workers; a minimal example is sketched below.
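A minimal reproduction sketch, assuming the dag_id `retry4` and task_id `print_date` from the log path above; the original DAGs from the report are not reproduced here, so the schedule and failing command are made up:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical reproduction DAG: the task always fails and retries, so
# successive tries are likely to land on different Celery workers,
# scattering the per-try log files across hosts.
with DAG(
    dag_id="retry4",
    start_date=datetime(2021, 6, 14),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    BashOperator(
        task_id="print_date",
        bash_command="date && exit 1",  # always fail to force retries
        retries=10,
        retry_delay=timedelta(minutes=1),
    )
```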
Anything else we need to know:
I have also tested Airflow 2.0.2 and Airflow 1.10.12; the bug exists in both.