[SPARK-51126][PS][DOCS] Optimize the memory usage in kde examples #49842

Closed
zhengruifeng wants to merge 2 commits


zhengruifeng
Contributor

What changes were proposed in this pull request?

Optimize memory usage in the KDE examples. Memory usage for the KDE computation should drop to about 10% of its previous value.

Why are the changes needed?

Commit 4b7191d computes all metrics for all curves together, so only one pass over the dataset is needed.
However, this increases memory usage, which causes the dynamic plot generation to fail.
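
The PR text does not say how the reduction is achieved, so the following is only a rough sketch under an assumption: that the intermediate state of a one-pass KDE grows with (number of samples) x (number of evaluation points) x (number of curves), and that evaluating on fewer points is what trades memory for plot quality. All names here (`gaussian_kde`, `linspace`, `intermediate_cells`) are hypothetical helpers in plain Python, not Spark or pandas-on-Spark APIs.

```python
import math


def linspace(start, stop, num):
    """Return `num` evenly spaced points from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at every grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    densities = []
    for x in grid:
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        densities.append(norm * total)
    return densities


def intermediate_cells(n_samples, n_grid, n_curves=1):
    """Assumed memory proxy: kernel terms held at once when every curve
    and every grid point is computed in a single pass."""
    return n_samples * n_grid * n_curves


samples = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
fine_grid = linspace(0.0, 6.0, 1000)   # dense evaluation grid
coarse_grid = linspace(0.0, 6.0, 100)  # ten times fewer evaluation points

fine_curve = gaussian_kde(samples, fine_grid, bandwidth=0.3)
coarse_curve = gaussian_kde(samples, coarse_grid, bandwidth=0.3)

# Under the assumed proxy, the coarse grid needs one tenth of the cells.
ratio = (intermediate_cells(len(samples), len(coarse_grid))
         / intermediate_cells(len(samples), len(fine_grid)))
print(ratio)
```

The coarse curve still integrates to roughly 1 over the grid but is drawn from fewer points, which is one way a "minor quality loss" could show up in a plot. In pandas-on-Spark, `plot.kde` accepts an `ind` argument that controls the evaluation points; whether this PR relies on it is not stated here.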

Does this PR introduce any user-facing change?

Yes, a minor quality loss in the generated plots.

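before:
![image](https://github.com/user-attachments/assets/84284a05-6469-45f8-8246-7718e1235aed)

after:
![image](https://github.com/user-attachments/assets/d18ad03d-d4fa-454c-beb4-db12af0c209a)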

How was this patch tested?

Manually tested with bin/pyspark; the example failed with an OOM error before this change.

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Feb 7, 2025
zhengruifeng added a commit that referenced this pull request Feb 7, 2025
Closes #49842 from zhengruifeng/ps_kde_memory_opt.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit 7a1dcb3)
Signed-off-by: Ruifeng Zheng <[email protected]>
@zhengruifeng
Contributor Author

Linter and doc generation passed.

Merged to master/4.0.

@zhengruifeng zhengruifeng deleted the ps_kde_memory_opt branch February 7, 2025 02:58