[SPARK-51087][PYTHON][CONNECT] Raise a warning when memory-profiler is not installed for memory profiling #49797
Conversation
What if the server has memory-profiler, but the client doesn't, or vice versa?
a9934f9 to 6a449a9
Good point, how do you like the current approach? @ueshin
The worker side warning message won't be seen from the client.
to make sure the library is installed?
3349a6a to 7cf4d14
Thank you @ueshin! Now it raises a warning when the client/driver does not have memory-profiler and there is no result profile.
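The condition described above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the helper names `has_memory_profiler` and `warn_if_profiler_missing`, the `results` mapping, and the `installed` override are all hypothetical. Only the warning text is taken from the manual test output in this PR.

```python
import importlib.util
import warnings
from typing import Mapping, Optional

# Warning text as it appears in the PR's manual test output.
MESSAGE = (
    "Install the 'memory_profiler' library in the cluster "
    "to enable memory profiling"
)


def has_memory_profiler() -> bool:
    """Return True if the memory_profiler module can be found locally."""
    return importlib.util.find_spec("memory_profiler") is not None


def warn_if_profiler_missing(
    results: Mapping[int, object],
    installed: Optional[bool] = None,
) -> bool:
    """Warn when the library is missing AND no profile results came back.

    `results` stands in for whatever collected memory profiles are
    available on the client/driver (hypothetical shape). `installed`
    lets a caller override the local import check, e.g. in tests.
    Returns True if a warning was emitted.
    """
    if installed is None:
        installed = has_memory_profiler()
    if not installed and not results:
        warnings.warn(MESSAGE, UserWarning, stacklevel=2)
        return True
    return False
```

Checking both conditions keeps the warning quiet when results exist even though the library is absent on the client, which is one possible reading of the client/server mismatch raised earlier in the conversation.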
…s not installed for memory profiling

### What changes were proposed in this pull request?
Raise a warning when memory-profiler is not installed for memory profiling.

### Why are the changes needed?
Better usability of PySpark UDF memory profiling.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing and manual tests as shown below:

```py
>>> import memory_profiler  # when memory-profiler is not installed
Traceback (most recent call last):
...
ModuleNotFoundError: No module named 'memory_profiler'
>>> from pyspark.sql.functions import pandas_udf
>>>
>>> df = spark.range(10)
>>> @pandas_udf("long")
... def add1(x):
...     return x + 1
...
>>> added = df.select(add1("id"))
>>>
>>> spark.conf.set("spark.sql.pyspark.udf.profiler", "memory")
>>> added.show()
+--------+
|add1(id)|
+--------+
|       1|
|       2|
|       3|
|       4|
|       5|
|       6|
|       7|
|       8|
|       9|
|      10|
+--------+
>>> spark.profile.show(type="memory")
/Users/xinrong.meng/spark/python/pyspark/sql/profiler.py:141: UserWarning: Install the 'memory_profiler' library in the cluster to enable memory profiling
...
>>> spark.profile.dump(path='...', type="memory")
/Users/xinrong.meng/spark/python/pyspark/sql/profiler.py:225: UserWarning: Install the 'memory_profiler' library in the cluster to enable memory profiling
...
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49797 from xinrong-meng/memory_profiler_install.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Xinrong Meng <[email protected]>
(cherry picked from commit df89c8e)
Signed-off-by: Xinrong Meng <[email protected]>
Merged to master and branch-4.0, thanks!