-
Notifications
You must be signed in to change notification settings - Fork 380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace HashSet with ConcurrentHashMap.newKeySet #3100
base: main
Are you sure you want to change the base?
Conversation
…etween handleGetReducerFileGroup & tryFinalCommit
@aidar-stripe Hi, PR #2986 "[CELEBORN-1769] Fix packed partition location cause GetReducerFileGroupResponse lose location" fixed the scenario you might have encountered. Is the PR in your distribution or can you provide your worker's distribution commit ID? |
@FMX thanks for the link! I think you are absolutely right here, we were running a version of Celeborn client (it's been 0.5.1 with some of our commits for integrity checks, which were disabled). I could confirm that PbGetReducerFileGroupResponse conversion code only takes primaries there:
This explains consistency of the failures that we've seen much better than the potential concurrency issue with the HashSet. I would still like to merge in the PR though, I think usage of ConcurrentHashSet still more appropriate there. |
@aidar-stripe You can refer to the code celeborn/client/src/main/scala/org/apache/celeborn/client/commit/ReducePartitionCommitHandler.scala Lines 154 to 166 in f094821
You will discover that only one thread will write to the partition location set at a time. Here is no concurrent issue. |
That's correct synchronization on
I think the closest analogy here would be this particular example: https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#pitfall-volatiles-wrong. I don't think that there's a huge risk of this happening - for it to happen I'm happy with leaving this as is, since we've added some additional integrity checks on our side. But it feels like changing HashSet to concurrent version should be relatively cheap, especially considering that all the rest of the structures are concurrent. Regardless of the decision - thanks for responding and reviewing! |
What changes were proposed in this pull request?
Replacing HashSet of PartitionLocations with concurrent version of it.
Why are the changes needed?
We are seeing some race conditions between
handleGetReducerFileGroup
&tryFinalCommit
, where reducers complete without processing partition, even though there's data.Problematic logs
On the driver side:
On the executor side:
How was this patch tested?
No additional tests for this: I've tried to reproduce it, but we've only seen this happen with high number of nodes and during long execution time range.
More explanation on why/how this happens
Since concurrency guarantees between read/write path are based on ConcurrentHashMap's volatile values there's no guarantee that content of a HashSet would be seen fully by the reader thread.