[SPARK-51185][Core][3.5] Revert simplifications to PartitionedFileUtil API to reduce memory requirements #49995
Compilation failed, probably due to 3.5 using Scala 2.12.
Force-pushed from c0a3cb5 to b0bbad1.
Hmm, looks like the build failures are in
which this PR is not touching. I just rebased and checks are running again.
Hmm, looks like it's still failing compilation with the same error. @cloud-fan Do you have an idea why the build would fail with a supposedly unrelated error?
Oh, it's broken by other commits, and #50008 is fixing it.
@LukasRupprecht can you rebase your PR and try again? The issue should have been resolved.
Thank you for making a PR, @LukasRupprecht. Is this targeted for Apache Spark 3.5.5, @HyukjinKwon and @cloud-fan?
Force-pushed from b0bbad1 to 09b5075.
@dongjoon-hyun yes it is, let me merge it now, thanks all!
…l API to reduce memory requirements
Closes #49995 from LukasRupprecht/def_get-path_3.5.
Authored-by: Lukas Rupprecht <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Got it. Thank you, @cloud-fan.
What changes were proposed in this pull request?
This PR reverts an earlier change (#41632) that converted FileStatusWithMetadata.getPath from a def to a lazy val in order to simplify the PartitionedFileUtil helpers.
This is the 3.5 PR. The main PR for 4.0 is #49915.
Why are the changes needed?
Converting getPath from a def to a lazy val increases memory requirements, because each computed path must now be kept in memory for as long as its FileStatusWithMetadata exists. As paths are expensive to store, this can lead to higher memory utilization and increase the risk of OOMs.
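To make the trade-off concrete, here is a minimal sketch of the difference between the two member kinds. `RawStatus`, `ParsedPath`, `StatusWithLazyPath`, and `StatusWithDefPath` are hypothetical stand-ins invented for illustration; they are not Spark's or Hadoop's actual classes, and the real FileStatusWithMetadata may differ in detail.

```scala
// Hypothetical stand-in for the underlying file status (not a Spark/Hadoop class).
final case class RawStatus(pathString: String)

// Hypothetical stand-in for an expensive-to-store path object.
final class ParsedPath(val value: String)

// Variant A: lazy val. The path is built on first access and then stored in a
// field, so it stays reachable for as long as the wrapper itself is reachable.
final class StatusWithLazyPath(status: RawStatus) {
  lazy val getPath: ParsedPath = new ParsedPath(status.pathString)
}

// Variant B: def. The path is rebuilt on every call and never cached, so each
// result can be garbage collected as soon as the caller drops it.
final class StatusWithDefPath(status: RawStatus) {
  def getPath: ParsedPath = new ParsedPath(status.pathString)
}

object GetPathDemo {
  def main(args: Array[String]): Unit = {
    val cached = new StatusWithLazyPath(RawStatus("/data/part-0000.parquet"))
    val uncached = new StatusWithDefPath(RawStatus("/data/part-0000.parquet"))

    // Reference equality shows whether the same instance is being retained.
    println(cached.getPath eq cached.getPath)     // same retained instance
    println(uncached.getPath eq uncached.getPath) // two short-lived instances
  }
}
```

With many wrapper objects alive at once (for example, one per listed file), variant A keeps one extra path object per wrapper after first access, while variant B trades that memory for repeated construction work.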
Does this PR introduce any user-facing change?
No
How was this patch tested?
This is a small revert to code that existed before, so the existing tests are sufficient.
Was this patch authored or co-authored using generative AI tooling?
No