Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure a WatchedSubprocess redacts only its values and not parent processes' #48527

Merged
merged 3 commits into from
Mar 29, 2025

Conversation

amoghrajesh
Copy link
Contributor

Problem

If you have noticed some of the recent logs by running tasks from the latest main, branch they somewhat look like:

[2025-03-26, 17:24:08] ERROR - Task failed with exception: source="task"
KeyError: 'logical_date'
File "/opt/***/task-sdk/src/***/sdk/execution_time/task_runner.py", line 646 in run

File "/opt/***/task-sdk/src/***/sdk/execution_time/task_runner.py", line 892 in _execute_task

File "/opt/***/task-sdk/src/***/sdk/definitions/baseoperator.py", line 380 in wrapper

File "/opt/***/***-core/src/***/decorators/base.py", line 252 in execute

File "/opt/***/task-sdk/src/***/sdk/definitions/baseoperator.py", line 380 in wrapper

File "/opt/***/providers/standard/src/***/providers/standard/operators/python.py", line 212 in execute

File "/opt/***/providers/standard/src/***/providers/standard/operators/python.py", line 235 in execute_callable

File "/opt/***/task-sdk/src/***/sdk/execution_time/callback_runner.py", line 81 in run

File "/files/dags/context_test.py", line 18 in access_context_task

This problem is because we mask the "password" of SQLA when running in breeze (which is airflow), here: https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/settings.py#L372

Although this might seem like a trivial problem for breeze, it isn't. There is a bigger problem here because we SHOULDNT be redacting the SQLA stuff at all for the "subprocess" that task sdk supervisor launches, and the reason for that is the subprocess has not access to the DB at ALL!!

Solution

Testing

Scenario 1: Retrieving a conneciton inside a task and trying to print its password, which is "password"

Connection:

image

DAG:

import structlog

from airflow import DAG

from airflow.providers.standard.operators.python import PythonOperator
from airflow.sdk import Connection


def print_sensitive():
    log = structlog.get_logger()

    c = Connection.get("teradata_default")
    log.info("Retrieved connection", password=c.password)


with DAG(
    dag_id="print_sensitive_data",
    schedule=None,
) as dag:

    print_sensitive_task = PythonOperator(
        task_id="print_sensitive_task",
        python_callable=print_sensitive,
    )

image

Scenario 2: Trying to do the same with conneciton but at global level

DAG:

import structlog

from airflow import DAG

from airflow.providers.standard.operators.python import PythonOperator
from airflow.sdk import Connection

log = structlog.get_logger()
c = Connection.get("teradata_default")
log.info("At the top level is", password=c.password)

def print_sensitive():
    log.info("Retrieved connection and printing in task", password=c.password)

with DAG(
    dag_id="print_sensitive_data_top_level",
    schedule=None,
) as dag:

    print_sensitive_task = PythonOperator(
        task_id="print_sensitive_task",
        python_callable=print_sensitive,
    )

image


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@amoghrajesh
Copy link
Contributor Author

Guess i have to write some tests, but majority of the work is complete here.

Co-authored-by: Ash Berlin-Taylor <[email protected]>
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm -- added few comments

@amoghrajesh
Copy link
Contributor Author

Test failure isnt related

@amoghrajesh amoghrajesh requested a review from kaxil March 29, 2025 11:08
@kaxil kaxil merged commit 3761024 into apache:main Mar 29, 2025
62 checks passed
@kaxil kaxil deleted the masking-in-supervisor branch March 29, 2025 13:01
shubham-pyc pushed a commit to shubham-pyc/airflow that referenced this pull request Apr 2, 2025
nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Apr 4, 2025
simonprydden pushed a commit to simonprydden/airflow that referenced this pull request Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants