
[SPARK-50909][PYTHON][4.0] Setup faulthandler in PythonPlannerRunners #49635

Closed

Conversation

@ueshin (Member) commented Jan 24, 2025

What changes were proposed in this pull request?

This is a backport of #49592.

Sets up faulthandler in PythonPlannerRunners.

It can be enabled by the same config as UDFs (a usage sketch follows the list below):

  • SQL conf: spark.sql.execution.pyspark.udf.faulthandler.enabled
  • It falls back to the Spark conf: spark.python.worker.faulthandler.enabled
  • False by default
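
For illustration only (this snippet is not part of the PR's diff), a minimal PySpark sketch of enabling the feature with the two configs listed above:

```python
from pyspark.sql import SparkSession

# Minimal sketch: turn on the faulthandler for Python workers.
# The SQL conf is checked first; the Spark conf is the fallback.
spark = (
    SparkSession.builder
    .config("spark.sql.execution.pyspark.udf.faulthandler.enabled", "true")
    .config("spark.python.worker.faulthandler.enabled", "true")
    .getOrCreate()
)
```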

Why are the changes needed?

The faulthandler is not set up in PythonPlannerRunners.

Does this PR introduce any user-facing change?

When enabled, if the Python worker crashes, the error message may include a thread dump of the Python process, on a best-effort basis.
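
As a hypothetical illustration (not taken from this PR's tests), the behavior can be observed by crashing a Python UDF worker, since the same conf also governs UDFs; the `crash` UDF below is made up for the example:

```python
import ctypes

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()  # assumes the faulthandler confs above are set


@udf("long")
def crash(x):
    # Dereference a null pointer so the Python worker dies with SIGSEGV.
    ctypes.string_at(0)
    return x


# The error raised by this action should include a faulthandler thread dump
# of the crashed Python worker, on a best-effort basis.
spark.range(1).select(crash("id")).collect()
```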

How was this patch tested?

Added the related tests.

Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#49592 from ueshin/issues/SPARK-50909/faulthandler.

Authored-by: Takuya Ueshin <[email protected]>
Signed-off-by: Takuya Ueshin <[email protected]>
@dongjoon-hyun (Member) left a comment

Since one of the failures is in the Python part, could you re-trigger once more, @ueshin?

@dongjoon-hyun (Member) commented Jan 25, 2025

Oh, according to the log, pyspark.ml.tests.connect.test_parity_torch_data_loader seems to have hung twice consecutively on this change. Is there any difference from the master branch?

Finished test(python3.11): pyspark.ml.tests.connect.test_parity_regression (26s)
Starting test(python3.11): pyspark.ml.tests.connect.test_parity_torch_data_loader (temp output: /__w/apache-spark/apache-spark/python/target/f869134c-ed13-471f-aa7e-37562cf80415/python3.11__pyspark.ml.tests.connect.test_parity_torch_data_loader__nrhuie5k.log)
Error: The operation was canceled.

@ueshin (Member, Author) commented Jan 25, 2025

@dongjoon-hyun Actually, the ML-related tests can't pass on GitHub Actions in my repo for some reason, as I mentioned before.
Do I need any config in my repo? Are there any config changes I missed?

@dongjoon-hyun (Member) commented Jan 25, 2025

Oh, for me, it has worked fine until now, even in today's PR.

Starting test(python3.11): pyspark.ml.tests.connect.test_parity_torch_data_loader (temp output: /__w/spark/spark/python/target/cb298ec7-7562-4471-8d9a-6dd7d24d68bc/python3.11__pyspark.ml.tests.connect.test_parity_torch_data_loader__n4sh0od6.log)
Finished test(python3.11): pyspark.ml.tests.connect.test_parity_torch_data_loader (116s)

I also don't have any special configuration. It's a plain fork, like the other contributors'.

Do you happen to know someone who has the same symptom?

@ueshin (Member, Author) commented Jan 25, 2025

Trying on master with an empty commit at #49664.

I remember @HyukjinKwon said it sometimes doesn't pass in his repo as well, though he said it happens only sometimes, whereas it always fails in my repo.

@HyukjinKwon (Member)

Yeah, it's known. I have seen those test failures ONLY in fork repos.

@HyukjinKwon (Member)

Let's merge this in first. I will monitor the build.

HyukjinKwon pushed a commit that referenced this pull request Jan 25, 2025

Closes #49635 from ueshin/issues/SPARK-50909/4.0/faulthandler.

Lead-authored-by: Takuya Ueshin <[email protected]>
Co-authored-by: Takuya UESHIN <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>