Reduce lock contention #5
base: master
Conversation
Force-pushed from 725cd5c to f841b5c
Force-pushed from f841b5c to 0eacfce
Keeping binaries as binary references is probably important.
Having each cache be its own Resource that we could track the reference to, and not having the NIF track the list of Resources, would also be fine, FWIW. We could track the name -> Resource mapping in the Erlang part of the NIF to retain API compat.
Yes, that would be ideal. EDIT: on second thought, I'm not sure if doing a
With the caveat that I don't know Rust... The concept behind the changes seems sound to me. My only concern was possible exclusive lock starvation, but it looks like the RwLock implementation favors write locks to avoid this (once a write lock is requested, no read locks are granted until after the write lock is granted).
native/lib.rs (outdated)
.expect("MinQ1Size/MaxSize ratio doesn't appear to make sense"); | ||
let cache = Arc::new(RwLock::new(TermCache::new(inner))); | ||
CACHES | ||
.write() |
Is this trying to get a write lock on CACHES without releasing the read lock?
This code looks like it's creating a new cache for put if one doesn't exist. The code here differs slightly from the code for create. Can the cache creation get moved to a utility function that both create and put use?
Is this trying to get a write lock on CACHES without releasing the read lock?
It may not be obvious, but the read lock acquired above is a temporary and is dropped at the end of line 65. Elsewhere in the code I explicitly drop read locks instead of waiting for them to drop at the end of scope:
drop(some_lock);
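A minimal sketch of that distinction (not the PR's code; the lock and its contents are placeholders):

```rust
use std::sync::RwLock;

fn example(lock: &RwLock<Vec<u8>>) {
    // Guard used only as a temporary: released at the end of this statement.
    let len = lock.read().unwrap().len();

    // Guard bound to a name: held until dropped (or until the scope ends),
    // so release it explicitly before asking for a write lock.
    let guard = lock.read().unwrap();
    let first = guard.first().copied();
    drop(guard);

    let mut writer = lock.write().unwrap();
    writer.push(len as u8);
    let _ = first;
}
```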
Can the cache creation get moved to a utility function that both create and put use?
Probably. Maybe there's a valid reason I didn't do that, but it's likely an oversight.
It may not be obvious, but the read lock acquired above is a temporary and is dropped at the end of line 65
Got it. The lock gets dropped when the value goes out of scope. This is my non-knowledge of Rust getting in the way...
Follow up: Marc's intuition is correct and there is in fact a deadlock here. It is fixed with a recent commit.
You could use persistent_term for the mapping and some code on the Erlang side to map the atom cache name to a NIF reference to pass to the Rust NIF. If the value is a NIF reference, then there's no global GC when the key is deleted (say, to delete the cache). This would require changing the Rust code to use a NIF ref to somehow point to a particular cache. That way, we don't need a hashmap in the Rust code to dereference the atom identifier for the cache.
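A rough sketch of that direction using rustler resources. This is hedged: the Erlang module name, the function, and CacheResource (standing in for a per-cache TermCache) are placeholders, and the macro spellings assume a reasonably recent rustler release, not code from this PR.

```rust
use rustler::{Env, ResourceArc, Term};
use std::sync::RwLock;

// Each cache becomes its own NIF resource; the Erlang side holds the
// reference (e.g. in persistent_term), so Rust needs no global
// name -> cache map at all.
struct CacheResource {
    inner: RwLock<Vec<u8>>, // stand-in for RwLock<TermCache>
}

#[rustler::nif]
fn create() -> ResourceArc<CacheResource> {
    ResourceArc::new(CacheResource {
        inner: RwLock::new(Vec::new()),
    })
}

fn load(env: Env, _info: Term) -> bool {
    // Register the resource type so ResourceArc<CacheResource> can cross
    // the NIF boundary as an opaque reference term.
    rustler::resource!(CacheResource, env);
    true
}

rustler::init!("e2qc_nif", [create], load = load);
```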
Drop live readlocks before creating writelocks.
let cache_name = HBin::from_atom(env, cache_name);
if caches.get(&cache_name).is_some() {
let caches = CACHES.read().unwrap();
I think grabbing the read lock, dropping the lock, and then acquiring the write lock introduces a race condition where two threads can try to insert the same cache at the same time. There's probably no real consequence to that for create. Maybe use an atomic read -> write upgradable lock, or start with a write lock.
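For the "start with a write lock" option, a hedged sketch (placeholder names and types, not the PR's actual code) of doing the check and the insert under one exclusive lock via the map's entry API:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type Cache = Arc<RwLock<Vec<u8>>>; // stand-in for Arc<RwLock<TermCache>>

fn get_or_create(caches: &RwLock<HashMap<String, Cache>>, name: &str) -> Cache {
    // One write lock covers both the existence check and the insert,
    // so two callers cannot race between them.
    caches
        .write()
        .unwrap()
        .entry(name.to_string())
        .or_insert_with(|| Arc::new(RwLock::new(Vec::new())))
        .clone()
}
```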
// deadlock when creating the upcoming write lock.
drop(caches);
let mut caches = CACHES.write().unwrap();
caches.insert(HBin::from_atom(env, cache_name), cache.clone());
I think there's the possibility of a race condition here where the put first discovers there is no cache under the read lock, drops the lock, and another thread in put or create adds the cache. The insert here that happens next erases the previous cache, possibly causing the loss of a cached item. Since the cache is an optimization, this isn't a correctness problem, but it could be a minor performance problem.
As in create, an upgradable lock or starting with a write lock might be better.
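And for the upgradable-lock option, a hedged sketch using the parking_lot crate (not a dependency of this PR; names and types are placeholders):

```rust
use parking_lot::{RwLock, RwLockUpgradableReadGuard};
use std::collections::HashMap;
use std::sync::Arc;

type Cache = Arc<Vec<u8>>; // placeholder cache type

fn get_or_create(caches: &RwLock<HashMap<String, Cache>>, name: &str) -> Cache {
    // Upgradable read: coexists with plain readers, and the upgrade to a
    // write lock is atomic, so no other thread can insert between the
    // check and the insert.
    let guard = caches.upgradable_read();
    if let Some(cache) = guard.get(name) {
        return cache.clone();
    }
    let mut guard = RwLockUpgradableReadGuard::upgrade(guard);
    let cache: Cache = Arc::new(Vec::new());
    guard.insert(name.to_string(), cache.clone());
    cache
}
```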
The PR improves concurrency with respect to the number of caches:
However, the branch's scalability with respect to workers per cache is poorer than
Benchmark test is in this branch
A prototype that uses a mutex but yields if the lock is taken scales pretty well to around four concurrent caches/workers. Prototype code is in
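A hedged sketch of the "yield if the lock is taken" idea (placeholder names and types, not the actual prototype code): the NIF never blocks a scheduler thread, and on contention it signals the Erlang caller to yield and retry.

```rust
use std::sync::{Mutex, TryLockError};

// Returns None when the lock is contended; the NIF wrapper would map that
// to a retry atom so the Erlang side can yield and call again.
fn with_cache<T>(cache: &Mutex<Vec<u8>>, f: impl FnOnce(&mut Vec<u8>) -> T) -> Option<T> {
    match cache.try_lock() {
        Ok(mut guard) => Some(f(&mut guard)),
        Err(TryLockError::WouldBlock) => None,
        Err(TryLockError::Poisoned(_)) => None, // a writer panicked; treat as a miss here
    }
}
```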
A prototype that uses a Mutex instead of an RwLock on the cache (not the cache map) on top of this PR improves the performance of the PR:
https://github.com/marcsugiyama accidentally closed the branch. Reopening.
Further prototypes to improve performance:
Even with these changes, it's not clear whether the scalability of e2qc will match cream under high load. cream has no locks for looking up the cache id and is based on the highly concurrent moka cache. Using an RwLock for the cache id lookup in e2qc likely eliminates blocking on this lookup, as write locks are only needed when a cache is created or destroyed, and caches are likely created during system initialization. An API change for e2qc could eliminate the cache id lookup lock, but we'd still be left with an exclusive lock on the cache under most circumstances. A yield if the cache is locked avoids blocking in Rust, but we suffer a round trip through Erlang scheduling.

Whether cream is better than e2qc depends on the use case. miner seems to have several different caches, which makes cream better than the version of e2qc with a single mutex for all caches. It's not clear how many workers there are per cache. Scalability of cream seems to be better than e2qc with respect to the number of workers per cache.

Depending on the usage pattern, 2Q may not be the right choice anyway. 2Q is designed for a specific relational database pattern that tries to balance caching small tables/index pages against churning through buffers in a table scan. e2qc needs an exclusive lock on most cache operations because the LRU chain is updated. cream uses moka, which implements synchronization within the library, so its behavior is not obvious. moka documentation suggests it behaves like a traditional LRU cache.
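For reference, a minimal sketch of the moka sync cache that cream builds on (illustrative only; the key/value types and capacity are arbitrary). Callers never take an explicit lock per operation because the synchronization lives inside the library.

```rust
use moka::sync::Cache;

fn demo() {
    // Internally concurrent; no user-visible lock per get/insert.
    let cache: Cache<String, Vec<u8>> = Cache::new(10_000);
    cache.insert("key".to_string(), vec![1, 2, 3]);
    assert_eq!(cache.get("key"), Some(vec![1, 2, 3]));
}
```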
I put no effort into optimizing the synchronization of multiple caches when first rewriting the native portion of this module. Currently, all individual caches are stored in a single global map synchronized with a mutex. That mutex is locked for as long as any NIF is accessing any single cache, thus blocking any other NIFs wanting to access any other unrelated cache.
This PR attempts to reduce global lock contention by switching to a read-write lock instead of a mutex. The idea behind this is that new caches will be created infrequently, and most lookups of individual caches in the global cache map will be read-only. Furthermore, each cache in the map is wrapped in an atomically reference-counted (Arc) wrapper and a read-write lock (RwLock). This allows us to hold on to the global lock only as long as needed to clone the target cache's Arc wrapper, and non-mutating access to each cache only requires a read lock, thus allowing other NIF calls concurrent read access to the same cache.
Outstanding work
This code will not compile as-is because the OwnedBinary values used for both the global map keys and cache keys are not Sync. I need to either A) get upstream rustler to implement Sync for OwnedBinary or B) switch to using Vec<u8> for key-value pairs. (B) may cause additional data-copy overhead, but I'm a little fuzzy on the overhead of converting between ErlNifBinary and Binary terms.