Issue 4608 - performance modify rate: prefetch target entry before holding backend lock #4610
base: main
Conversation
Just to be sure, with each call to this prefetcher, do we only do the prefetch once per search/lookup? That way we don't have contention on the lists being "fought" over by two threads alternating and continually trying to move an item to the head.
ldap/servers/slapd/back-ldbm/cache.c
Outdated
 * the same entry
 */
int
find_hash_deluxe(Hashtable *ht, const void *key, uint32_t keylen, void **entry)
Could this be "find_hash_prefetch" instead?
The patch calls prefetch only on MOD (though this could be extended to all write operations). For searches, I am not sure of the benefit of prefetch, as the backend is not locked.
I will rename it 'find_hash_move_front' (initially I had no idea how to name it :)). I have not instrumented the cache to measure whether this move to the front brings a significant benefit. IMHO the benefit should be low: if the entry cache is large (e.g. 1G, about 1M slots), the number of hash collisions (entries in the same slot) should be low. But the cost of moving an entry to the front is minor compared to the potential benefit.
If several threads are competing on the same slot, the benefit is that the target entry is not left in the middle of the list but moved to (or close to) the front.
I will do additional tests with high cache pressure.
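For illustration, a minimal sketch of the move-to-front lookup on one chained hash slot. The struct layout and names here are simplified assumptions, not the actual cache.c types:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Simplified chained-hash entry; the real cache.c types differ. */
struct mf_entry {
    struct mf_entry *next;
    const void *key;
    uint32_t keylen;
};

/* Look up key in one slot's chain; on a hit, relink the entry to the
 * head so the later fetch inside the critical section finds it after
 * a single comparison. */
static struct mf_entry *
find_move_front(struct mf_entry **slot, const void *key, uint32_t keylen)
{
    struct mf_entry *prev = NULL;
    for (struct mf_entry *e = *slot; e != NULL; prev = e, e = e->next) {
        if (e->keylen == keylen && memcmp(e->key, key, keylen) == 0) {
            if (prev != NULL) {
                prev->next = e->next; /* unlink from the middle */
                e->next = *slot;      /* reinsert at the head */
                *slot = e;
            }
            return e;
        }
    }
    return NULL; /* not cached: the caller must load it from the DB */
}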
The good news is that the benefit of the patch goes from +1% when there is no pressure on the entry cache (all entries fit in the cache) to +2% when there is pressure. I think that with a smaller entry cache size => the number of slots decreases => more entries per slot => more benefit from moving the fetched entries to the front.
The bad news is that I observed unexpected results regarding the cache size. With or without the patch, the throughput is 7%-8% higher when there is pressure on the entry cache!
If the entry cache fits 15% of the entries (pressure), throughput is 3700/s.
If the entry cache fits 100% of the entries (no pressure), throughput is 3400/s.
I have no explanation for this.
This is really interesting! I'm having a look now to try and understand, but I must admit I'm only reading the code.
I think the intent of your patch here is really good: keep 'hot' items at the head of their bucket slots. But perhaps there are some easier ways to achieve it.
The first is that, in new_hash, we appear to round the hashsize up to a prime number. However, this isn't actually going to help, because when we look up our hashes we always do so with modulo arithmetic. Since the size is moved to a prime rather than a power of two, some slots will tend to be loaded more than others. As a small example, imagine we set the size to "7".
[ 0 | 1 | 2 | 3 | 4 | 5 | 6 ]
>>> from collections import Counter
>>> stats = [x % 7 for x in range(0, 8192)]
>>> Counter(stats)
Counter({0: 1171, 1: 1171, 2: 1170, 3: 1170, 4: 1170, 5: 1170, 6: 1170})
>>> stats = [x % 8 for x in range(0, 8192)]
>>> Counter(stats)
Counter({0: 1024, 1: 1024, 2: 1024, 3: 1024, 4: 1024, 5: 1024, 6: 1024, 7: 1024})
It's very, very slight, but we can see that slots 0 and 1 have a slightly higher chance of being used. That said, with a power-of-two size the slots have an equal likelihood, and just by having "more of them" a collision is less likely. Most external hash map implementations determine the number of "buckets" as a power of two. This, for example, is how the buckets/slots are determined in the Rust hashmap, which is based on Google's swisstable:
https://github.com/Amanieu/hashbrown/blob/master/src/raw/mod.rs#L178
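As a hedged sketch of that sizing approach in C: the ~7/8 load factor below is borrowed from hashbrown's capacity logic and is an assumption for illustration, not something cache.c does today:

#include <stdint.h>

/* Round v up to the next power of two (v must be > 0). */
static uint64_t
next_pow2(uint64_t v)
{
    v--;
    v |= v >> 1;  v |= v >> 2;  v |= v >> 4;
    v |= v >> 8;  v |= v >> 16; v |= v >> 32;
    return v + 1;
}

/* Pick a slot count for n entries: keep slots at most ~7/8 full
 * (hashbrown-style), then round up to a power of two so that slot
 * selection becomes a mask instead of a modulo. */
static uint64_t
hash_size_for(uint64_t n)
{
    uint64_t target = n + n / 7; /* approximately n * 8/7 */
    return next_pow2(target);
}

/* slot = hash & (size - 1);   -- replaces: slot = hash % size */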
The really interesting question, though, is our hashing functions. It appears that in c_idtable
we actually use the _key as the "hash". This probably isn't the "worst", but it may not load the buckets equally, depending on other cache configuration elements.
It's dn_hash and uuid_hash that strike me, though, as they seem to be custom hash algorithms, so it's hard to know whether they actually have a proper distribution throughout the number range, and this could cause certain buckets to be longer.
It may be worth:
- Rounding up "size" to a power of two, and improving the "load factor" by potentially doing a calculation similar to what hashbrown does, given its proven performance at promoting small bucket sizes (see the sizing sketch above).
- Using siphash13 instead of the custom hash functions. An example of this is in the ndn cache. Note that siphash gains cache-attack resistance through its key, and is a 64-bit hash instead of 32-bit (a sketch follows this list).
- Testing whether using siphash13 on the id, rather than using the raw key, is more effective at keeping the hash equally distributed.
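A minimal sketch of that last idea: hash the numeric id with a keyed siphash13 and mask into a power-of-two slot count. The siphash13() prototype below is assumed for illustration; 389-ds carries a siphash implementation for the ndn cache, but this exact signature is not taken from it:

#include <stdint.h>
#include <stddef.h>

/* Assumed prototype, for illustration only. */
uint64_t siphash13(const void *src, size_t len, const uint8_t k[16]);

/* Map an id to a slot: keyed 64-bit hash, then mask. The mask is only
 * uniform if nslots_pow2 is a power of two. */
static uint64_t
id_to_slot(uint32_t id, const uint8_t k[16], uint64_t nslots_pow2)
{
    uint64_t h = siphash13(&id, sizeof(id), k);
    return h & (nslots_pow2 - 1);
}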
Thanks to the CI tests, this patch triggers crashes... need to rework it
Issue 4608 - performance modify rate: prefetch target entry before holding backend lock

Bug description:
During a MOD, the target entry is fetched from the entry cache
while holding the backend lock.
If the entry needs to be loaded from the DB, this increases
the duration of the critical section.

Fix description:
Before taking the backend lock, pre-fetch the entry into the
entry cache. In addition, the pre-fetch moves the entry
to the beginning of the hash slot, which reduces the
number of tests when the entry is fetched in the critical
section.
relates: #4608
Reviewed by: William Brown
Platforms tested: RHEL 8.3, F31