[SPARK-51185][Core] Revert simplifications to PartitionedFileUtil API to reduce memory requirements #49915
cc @gengliangwang. Also, cc @cloud-fan as the release manager of Apache Spark 4.0.0 (although this PR targets all live branches: master, branch-4.0, and branch-3.5).
thanks, merging to master/4.0!
… to reduce memory requirements

### What changes were proposed in this pull request?
This PR reverts an earlier change (#41632) that converted FileStatusWithMetadata.getPath from a def to a lazy val in order to simplify the PartitionedFileUtils helpers.

### Why are the changes needed?
The conversion of getPath from a def to a lazy val increases the memory requirements because now paths need to be kept in memory as long as the FileStatusWithMetadata exists. As paths are expensive to store, this can lead to higher memory utilization and increase the risk for OOMs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
This is a small revert to code that has already existed before so the existing tests are sufficient.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #49915 from LukasRupprecht/def_get-path.

Authored-by: Lukas Rupprecht <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 74d88b6)
Signed-off-by: Wenchen Fan <[email protected]>
it conflicts with 3.5, @LukasRupprecht can you open a new 3.5 PR? thanks!
Late +1
Thanks @cloud-fan for merging this! Will prepare a separate PR for 3.5.
@cloud-fan @dongjoon-hyun Here is the 3.5 version of this PR: #49995.
…l API to reduce memory requirements

### What changes were proposed in this pull request?
This PR reverts an earlier change (#41632) that converted FileStatusWithMetadata.getPath from a def to a lazy val in order to simplify the PartitionedFileUtils helpers. This is the 3.5 PR. The main PR for 4.0 is #49915.

### Why are the changes needed?
The conversion of getPath from a def to a lazy val increases the memory requirements because now paths need to be kept in memory as long as the FileStatusWithMetadata exists. As paths are expensive to store, this can lead to higher memory utilization and increase the risk for OOMs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
This is a small revert to code that has already existed before so the existing tests are sufficient.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #49995 from LukasRupprecht/def_get-path_3.5.

Authored-by: Lukas Rupprecht <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR reverts an earlier change (#41632) that converted FileStatusWithMetadata.getPath from a def to a lazy val in order to simplify the PartitionedFileUtil helpers.
Why are the changes needed?
Converting getPath from a def to a lazy val increases memory requirements, because each path must now be kept in memory for as long as its FileStatusWithMetadata exists. Because paths are expensive to store, this can lead to higher memory utilization and increase the risk of OOMs.
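The difference in retention can be illustrated with a minimal sketch. The classes below (`PathLike`, `RawStatus`, `WithDef`, `WithLazyVal`) are hypothetical stand-ins, not Spark's or Hadoop's actual types, and the sketch assumes the underlying status object builds its path on demand rather than storing it.

```scala
object GetPathSketch {
  // Hypothetical stand-in for a path object that is expensive to keep around.
  final class PathLike(val raw: String)

  // Hypothetical stand-in for a file status that stores only a compact string
  // and builds the path object on demand (an assumption of this sketch).
  final class RawStatus(val location: String) {
    def getPath: PathLike = new PathLike(location)
  }

  // Variant with `def`: every call produces a short-lived path; the wrapper
  // itself holds no reference to it, so it can be collected right after use.
  final class WithDef(val status: RawStatus) {
    def getPath: PathLike = status.getPath
  }

  // Variant with `lazy val`: the first call stores the path in a field, and
  // the wrapper then keeps it reachable for as long as the wrapper lives.
  final class WithLazyVal(val status: RawStatus) {
    lazy val getPath: PathLike = status.getPath
  }

  def main(args: Array[String]): Unit = {
    val viaDef = new WithDef(new RawStatus("/data/part-0000.parquet"))
    val viaLazy = new WithLazyVal(new RawStatus("/data/part-0000.parquet"))

    // `eq` compares references: distinct objects per call for the def ...
    println(viaDef.getPath eq viaDef.getPath)   // false
    // ... but the same cached, retained object for the lazy val.
    println(viaLazy.getPath eq viaLazy.getPath) // true
  }
}
```

With many wrappers alive at once (for example, one per listed file), the `lazy val` variant keeps one path object per wrapper reachable after first access, whereas the `def` variant trades that retained memory for repeated construction.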
Does this PR introduce any user-facing change?
No
How was this patch tested?
This is a small revert that restores previously existing code, so the existing tests are sufficient.
Was this patch authored or co-authored using generative AI tooling?
No