[SPARK-51112][CONNECT] Avoid using pyarrow's to_pandas on an empty table
### What changes were proposed in this pull request?
When the `pyarrow` table is empty, skip the call to its `to_pandas` method, which can segfault on an empty table. Instead, create an empty pandas DataFrame manually.
### Why are the changes needed?
Consider the following code:
```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, ArrayType, StringType, StructType, IntegerType
import faulthandler

# Dump the Python traceback if the interpreter crashes.
faulthandler.enable()

spark = SparkSession.builder \
    .remote("sc://localhost:15002") \
    .getOrCreate()

# Empty DataFrame with a nested array column.
sp_df = spark.createDataFrame(
    data=[],
    schema=StructType(
        [
            StructField(
                name='b_int',
                dataType=IntegerType(),
                nullable=False,
            ),
            StructField(
                name='b',
                dataType=ArrayType(ArrayType(StringType(), True), True),
                nullable=True,
            ),
        ]
    )
)
print(sp_df)
print('Spark dataframe generated.')
print(sp_df.toPandas())
print('Pandas dataframe generated.')
```
Running this may segfault at the `sp_df.toPandas()` call.
Example traceback:
```
Thread 0x00000001f1904f40 (most recent call first):
File "/Users/venkata.gudesa/spark/test_env/lib/python3.13/site-packages/pyarrow/pandas_compat.py", line 808 in table_to_dataframe
File "/Users/venkata.gudesa/spark/test_env/lib/python3.13/site-packages/pyspark/sql/connect/client/core.py", line 949 in to_pandas
File "/Users/venkata.gudesa/spark/test_env/lib/python3.13/site-packages/pyspark/sql/connect/dataframe.py", line 1857 in toPandas
File "<python-input-3>", line 1 in <module>
File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/code.py", line 92 in runcode
File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/_pyrepl/console.py", line 205 in runsource
File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/code.py", line 313 in push
File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/_pyrepl/simple_interact.py", line 160 in run_multiline_interactive_console
File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/_pyrepl/main.py", line 59 in interactive_console
File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/_pyrepl/__main__.py", line 6 in <module>
File "<frozen runpy>", line 88 in _run_code
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit test.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #49834 from vicennial/SPARK-51112.
Lead-authored-by: vicennial <[email protected]>
Co-authored-by: Venkata Sai Akhil Gudesa <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 9d88020)
Signed-off-by: Hyukjin Kwon <[email protected]>