Question regarding potential improvement in client side buffer reuse #3036
Unanswered
saurabhd336 asked this question in Q&A
Replies: 1 comment
For partition buffers that don't need a merge (i.e., no pending records), the buffer is returned to the pool as is and gets reused. I can't see a reason why a buffer that has just been merged would need to be cleared.
-
I'm very new to Celeborn and have been going through the codebase to understand the data push and read flows. I have a question about buffer reuse in HashBasedShuffleWriter.
Here, https://github.com/apache/celeborn/blob/main/client-spark/spark-2/src/main/java/org/apache/spark/shuffle/celeborn/HashBasedShuffleWriter.java#L358, it seems we free the buffer after mergeData while closing the writer.
I was curious whether we'd get better buffer reusability if we didn't free the buffer here and simply returned the buffers to the pool as is. Since almost all partitions are likely to have some residual data to merge, we may be unnecessarily GC'ing many buffers that other writers could reuse as is. Or am I missing something that makes it necessary to clear the buffer here? Thank you!
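To make the reuse idea concrete, here is a minimal standalone sketch (not Celeborn's actual `SendBufferPool` or `HashBasedShuffleWriter`; class names `BufferPool` and `Writer` are hypothetical) of why returning a buffer to a pool without clearing it can be safe: each borrower tracks its own write position, so stale bytes left by a previous user are never read.

```java
import java.util.ArrayDeque;

// Hypothetical pool sketch: hands out byte[] buffers and takes them back
// without zeroing them. Correctness relies on each borrower only reading
// up to the number of bytes it has itself written.
class BufferPool {
    private final ArrayDeque<byte[]> pool = new ArrayDeque<>();
    private final int bufferSize;

    BufferPool(int bufferSize) { this.bufferSize = bufferSize; }

    synchronized byte[] acquire() {
        byte[] buf = pool.poll();
        return buf != null ? buf : new byte[bufferSize];
    }

    // Return the buffer as is -- no zero-fill needed, because the next
    // acquirer starts writing at offset 0 and never reads past its own
    // write position.
    synchronized void release(byte[] buf) {
        pool.offer(buf);
    }
}

// Hypothetical writer sketch: appends into a pooled buffer, flushes the
// valid prefix on close, and hands the buffer back uncleared.
class Writer {
    private final BufferPool pool;
    private final byte[] buffer;
    private int position = 0; // number of valid bytes in `buffer`

    Writer(BufferPool pool) {
        this.pool = pool;
        this.buffer = pool.acquire();
    }

    void write(byte[] data) {
        System.arraycopy(data, 0, buffer, position, data.length);
        position += data.length;
    }

    // Emit only the bytes this writer produced, then release the buffer
    // without clearing it.
    byte[] flushAndClose() {
        byte[] out = java.util.Arrays.copyOf(buffer, position);
        pool.release(buffer);
        return out;
    }
}
```

Under this scheme, a second writer that picks up the same (uncleared) buffer still produces correct output, which is the intuition behind the question: clearing before returning to the pool only matters if some consumer reads beyond the tracked write position.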