LMDB Cache: partially reuse GCed IDs so to save cycles on cleaning LMDB #17469

glyh · 2025-07-04T09:43:21Z

As the title says. This is a follow-up to using a hash set to track GCed IDs.

NOTE: some IDs still can't be reused if too many garbage entries (>= garbage_size_limit) appear at the same time.

glyh · 2025-07-04T09:46:24Z

BTW, I designed a similar structure in an earlier version of the Work Partitioner's ID generator. I don't know if there's value in factoring out the common patterns.

It's probably not needed for now. If we run into this multiple times, we'll consider it.

georgeee · 2025-07-04T12:21:15Z

src/lib/disk_cache/lmdb/disk_cache.ml

+      | Some reused_id ->
+          [%log debug] "Reusing LMDB key %d for a new KV pair" reused_id
+            ~metadata:[ ("index", `Int reused_id) ] ;
+          Hash_set.remove garbage reused_id ;


Suggested change

Hash_set.remove garbage reused_id ;

In the code below we overwrite the value anyway:

Rw.set ~env db idx x ;

I think being in the hash set and being cleared from the LMDB cache have different semantics. If we don't remove the ID from the garbage hash set here, it will be reassigned by another invocation of put, or erased when there is too much garbage.

Yes, I think you're right @glyh. This is Hash_set.remove here and not Rw.remove - the reused ID needs to be removed from the hash set tracking if we're going to reuse it.
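To illustrate the point of this thread, here is a minimal sketch of what goes wrong if the reused ID stays in the garbage set. It is not the PR's code: it uses Stdlib's Hashtbl as a stand-in for Core's Hash_set so that it has no dependencies, and the names pick_garbage, alloc and remove_on_reuse are hypothetical.

```ocaml
(* Stand-in for the garbage set: a Stdlib Hashtbl with unit values
   (assumption: the real code uses Core's Hash_set). *)
let garbage : (int, unit) Hashtbl.t = Hashtbl.create 16

let counter = ref 0

(* Return some ID from the garbage set, if there is one. *)
let pick_garbage () =
  Hashtbl.fold
    (fun id () acc -> match acc with Some _ -> acc | None -> Some id)
    garbage None

(* Hypothetical allocator: reuse a garbage ID when available,
   otherwise hand out a fresh one from the counter. *)
let alloc ~remove_on_reuse =
  match pick_garbage () with
  | None ->
      incr counter ; !counter - 1
  | Some id ->
      (* Without this removal the same ID stays "available". *)
      if remove_on_reuse then Hashtbl.remove garbage id ;
      id
```

With a single garbage ID in the set, two calls with ~remove_on_reuse:false return the same ID, so two live values would share one key; with ~remove_on_reuse:true the second call falls through to the counter.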

glyh · 2025-07-07T23:53:48Z

I'll force-push later to adapt to the earlier PR in the train, which was force-pushed.

glyh · 2025-07-08T03:04:43Z

rebased

dannywillems · 2025-07-09T06:37:10Z

src/lib/disk_cache/lmdb/disk_cache.ml

+      | None ->
+          incr counter ; !counter - 1
+      | Some reused_id ->
+          [%log spam] "Reusing LMDB key %d for a new KV pair" reused_id


Can we remove this log please?

dannywillems · 2025-07-09T07:07:39Z

!ci-build-me

dannywillems · 2025-07-09T07:07:57Z

!ci-bypass-changelog

glyh · 2025-07-09T07:12:00Z

there's likely a merge conflict. I'll rebase later

glyh · 2025-07-09T07:56:53Z

rebased so there's no merge conflicts

glyh · 2025-07-09T07:56:59Z

!ci-build-me

glyh · 2025-07-09T08:00:42Z

!ci-build-me

dannywillems · 2025-07-10T09:33:49Z

src/lib/disk_cache/lmdb/disk_cache.ml

-    let idx = !counter in
-    incr counter ;
+    let idx =
+      match Hash_set.find garbage ~f:(const true) with


I don't understand what you are trying to achieve by using ~f:(const true).
This will either return an arbitrary element (or the first; I don't remember whether find on Hashtbl is deterministic) when the set is non-empty, or, when the set is empty (which is the case until the GC has been executed at least once on a value), increase the counter.

We should ensure that we have reached the garbage size limit; otherwise, it will potentially reuse an index whose value has not been removed yet.

Selecting an arbitrary element from the hash set and using it as the new key is exactly the expected behavior. Any key in that set is abandoned.

That also seems correct to me. If an int is in the set then the associated id was judged unreachable and finalized by the GC, so there shouldn't be any live objects still using the id and we're safe to reuse the int.

I think the rationale for the PR is that whereas before we would go through a cycle where we accumulate unused indices until we hit garbage_size_limit and then do a full cleanup of the lmdb cache, now we avoid some of those cleanups by reusing cache slots.
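The cycle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual disk_cache.ml: a Stdlib Hashtbl stands in for both the LMDB database and the garbage hash set, the limit value is arbitrary, and on_finalize, put, store and cleanups are hypothetical names.

```ocaml
(* Arbitrary small limit for the sketch; the real value is not given here. *)
let garbage_size_limit = 3

(* Stand-ins (assumptions): an in-memory table instead of LMDB,
   and a unit-valued table instead of Core's Hash_set. *)
let store : (int, string) Hashtbl.t = Hashtbl.create 16

let garbage : (int, unit) Hashtbl.t = Hashtbl.create 16

let counter = ref 0

(* Counts full cleanups, only so the effect of reuse is observable. *)
let cleanups = ref 0

(* Called when the GC finalizes the handle for [id]. *)
let on_finalize id =
  Hashtbl.replace garbage id () ;
  if Hashtbl.length garbage >= garbage_size_limit then (
    (* Full cleanup: drop every abandoned slot from the store. *)
    Hashtbl.iter (fun id () -> Hashtbl.remove store id) garbage ;
    Hashtbl.reset garbage ;
    incr cleanups )

(* Store [value], preferring an abandoned slot over a fresh index. *)
let put value =
  let reusable =
    Hashtbl.fold
      (fun id () acc -> match acc with Some _ -> acc | None -> Some id)
      garbage None
  in
  let idx =
    match reusable with
    | None ->
        incr counter ; !counter - 1
    | Some id ->
        Hashtbl.remove garbage id ; id
  in
  (* Overwrites any stale value still stored under a reused index. *)
  Hashtbl.replace store idx value ;
  idx
```

In this sketch, every put that finds a reusable ID shrinks the garbage set by one, so the set reaches garbage_size_limit less often and fewer full cleanups run. A stale value under a reused index is never observed because put overwrites it before returning the index.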

glyh requested a review from a team as a code owner July 4, 2025 09:43

georgeee requested changes Jul 4, 2025

glyh added the enhancement Not big enough to be a feature, but is a smaller improvement label Jul 7, 2025

glyh force-pushed the lyh/fix-lmdb-deadlock branch from 00f7243 to 97a1a48 Compare July 7, 2025 12:26

glyh force-pushed the lyh/reuse-lmdb-keys branch from 97a341e to eaa1814 Compare July 7, 2025 13:06

dannywillems force-pushed the lyh/fix-lmdb-deadlock branch from c841ecc to 8a787ef Compare July 7, 2025 14:16

Base automatically changed from lyh/fix-lmdb-deadlock to compatible July 7, 2025 17:01

glyh force-pushed the lyh/reuse-lmdb-keys branch from ed6a7c7 to 5af9c08 Compare July 8, 2025 03:04

glyh closed this Jul 9, 2025

glyh reopened this Jul 9, 2025

glyh force-pushed the lyh/reuse-lmdb-keys branch from 5af9c08 to 4abc6bc Compare July 9, 2025 02:22

dannywillems reviewed Jul 9, 2025

glyh force-pushed the lyh/reuse-lmdb-keys branch from b4fb6eb to 3e5b7b4 Compare July 9, 2025 07:56

LMDB Cache: partially reuse GCed IDs so to save cycles on cleaning LMDB

5174b79

glyh force-pushed the lyh/reuse-lmdb-keys branch from 3e5b7b4 to 5174b79 Compare July 9, 2025 08:00

dannywillems reviewed Jul 10, 2025

glyh mentioned this pull request Jul 10, 2025

LMDB Cache: partially reuse GCed IDs so to save cycles on cleaning LMDB V2 #17514

Merged

glyh closed this Jul 10, 2025
