Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-51087][PYTHON][CONNECT] Raise a warning when memory-profiler is not installed for memory profiling #49797

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

xinrong-meng
Copy link
Member

@xinrong-meng xinrong-meng commented Feb 5, 2025

What changes were proposed in this pull request?

Raise a warning when memory-profiler is not installed for memory profiling.

Why are the changes needed?

Better usability of PySpark UDF memory profiling.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing and manual tests as shown below:

>>> import memory_profiler  # when memory-profiler is not installed
Traceback (most recent call last):
...
ModuleNotFoundError: No module named 'memory_profiler'

>>> from pyspark.sql.functions import pandas_udf
>>> 
>>> df = spark.range(10)
>>> @pandas_udf("long")
... def add1(x):
...   return x + 1
... 
>>> added = df.select(add1("id"))
>>> 
>>> spark.conf.set("spark.sql.pyspark.udf.profiler", "memory")
>>> added.show()
+--------+                                                                      
|add1(id)|
+--------+
|       1|
|       2|
|       3|
|       4|
|       5|
|       6|
|       7|
|       8|
|       9|
|      10|
+--------+
>>> spark.profile.show(type="memory")
/Users/xinrong.meng/spark/python/pyspark/sql/profiler.py:141: UserWarning: Install the 'memory_profiler' library in the cluster to enable memory profiling
...
>>> spark.profile.dump(path='...', type="memory")
/Users/xinrong.meng/spark/python/pyspark/sql/profiler.py:225: UserWarning: Install the 'memory_profiler' library in the cluster to enable memory profiling
...

Was this patch authored or co-authored using generative AI tooling?

No.

@ueshin
Copy link
Member

ueshin commented Feb 5, 2025

What if the server has memory-profiler, but the client doesn't, or vice versa?
This seems to check the client side only, but actually I don't think we need the library in the client side to show the result?

@xinrong-meng xinrong-meng force-pushed the memory_profiler_install branch from a9934f9 to 6a449a9 Compare February 5, 2025 02:31
@xinrong-meng xinrong-meng changed the title [SPARK-51087][PYTHON][CONNECT] Raise an error when memory-profiler is not installed for memory profiling [SPARK-51087][PYTHON][CONNECT] Raise a warning when memory-profiler is not installed for memory profiling Feb 5, 2025
@github-actions github-actions bot added CORE and removed SQL labels Feb 5, 2025
@xinrong-meng
Copy link
Member Author

Good point, how do you like the current approach? @ueshin

@ueshin
Copy link
Member

ueshin commented Feb 5, 2025

The worker side warning message won't be seen from the client.
How about showing warning message in the client side when:

  • The client doesn't have the library installed
  • There are no data to show when calling spark.profile.show or ..dump
    • because the server side may have the library and already measured.

to make sure the library is installed?

@xinrong-meng xinrong-meng force-pushed the memory_profiler_install branch from 3349a6a to 7cf4d14 Compare February 6, 2025 23:48
@github-actions github-actions bot added SQL and removed CORE labels Feb 6, 2025
@xinrong-meng
Copy link
Member Author

Thank you @ueshin ! Now it raises a warning when the client/driver does not have memory-profiler and there is no result profile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants