Improve numberOfObjectsSinceBitmap #150

Open
syntonyze opened this issue Mar 13, 2025 · 0 comments

Description

Change [1] introduced numberOfObjectsSinceBitmap to RepoStatistics.

This number counts the objects, stored in pack files or as loose objects, that were created since the latest bitmap generation.

Change [2] fixed its calculation by stopping the count as soon as the most recent bitmap is encountered.
However, I've identified a case where the "number of objects since bitmap" metric can be inaccurate or misleading after a GC.

The Problem

When Garbage Collection (GC) occurs, the non-heads packfile can be written slightly later than the heads packfile. This small timing difference can lead to incorrect accounting of objects.

Observed Behavior

In my local environment, I noticed the following:

Heads Packfile (with bitmap)

-r--r--r--@   1 syntonyze  staff  230648 Mar  7 21:08 pack-315d970458e6b4abf8f4e76373f2c88fbc225ca3.pack
-r--r--r--@   1 syntonyze  staff   12216 Mar  7 21:08 pack-315d970458e6b4abf8f4e76373f2c88fbc225ca3.idx
-r--r--r--@   1 syntonyze  staff    4862 Mar  7 21:08 pack-315d970458e6b4abf8f4e76373f2c88fbc225ca3.bitmap

Non-Heads Packfile (without bitmap)

-r--r--r--@   1 syntonyze  staff  814553 Mar  7 21:08 pack-0c4e73bf93d64177b8233d007c3ea29cf95ab5f2.pack
-r--r--r--@   1 syntonyze  staff   62728 Mar  7 21:08 pack-0c4e73bf93d64177b8233d007c3ea29cf95ab5f2.idx

At first glance, both packfiles appear to have the same timestamp (Mar 7, 21:08). However, at nanosecond resolution the modification times (mtime) show that the non-heads packfile was written about 111 ms after the heads packfile:

pack-0c4e73bf93d64177b8233d007c3ea29cf95ab5f2.pack (without bitmap) -> mtime: 2025-03-07T19:48:37.366075759Z
pack-315d970458e6b4abf8f4e76373f2c88fbc225ca3.pack (with bitmap) -> mtime: 2025-03-07T19:48:37.254571758Z

Since the non-heads packfile was written slightly later, it gets counted in the “number of objects since bitmap” metric.
This means the metric never resets to zero and always includes all objects from the non-heads packfile.
However, since both packfiles are effectively created at the same time, this behavior leads to inaccurate results.
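
To make the timing concrete, here is a minimal sketch that compares the two mtimes quoted above. It uses only `java.time` and does not call any JGit API. The class and method names are made up for illustration, and it assumes the metric treats any pack with an mtime later than the bitmap pack as "since bitmap".

```java
import java.time.Duration;
import java.time.Instant;

public class PackMtimeGap {
    // mtimes copied from the two packfiles listed above
    static final Instant BITMAP_PACK_MTIME =
            Instant.parse("2025-03-07T19:48:37.254571758Z");
    static final Instant NON_HEADS_PACK_MTIME =
            Instant.parse("2025-03-07T19:48:37.366075759Z");

    /** Time between writing the heads (bitmap) pack and the non-heads pack. */
    static Duration gap() {
        return Duration.between(BITMAP_PACK_MTIME, NON_HEADS_PACK_MTIME);
    }

    /** Assumption: a pack newer than the bitmap pack is counted as "since bitmap". */
    static boolean countedAsSinceBitmap() {
        return NON_HEADS_PACK_MTIME.isAfter(BITMAP_PACK_MTIME);
    }

    public static void main(String[] args) {
        System.out.println("gap = " + gap().toMillis() + " ms");
        System.out.println("counted since bitmap: " + countedAsSinceBitmap());
    }
}
```

Running this prints a gap of 111 ms and `counted since bitmap: true`, which is why the non-heads pack ends up in the metric even though it came from the same GC run.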

I believe the objects in the non-heads packfile should not contribute to the "number of objects since bitmap" metric, as they are part of the same GC cycle. This would provide a more accurate representation of newly added objects.

The proposed solution is to add a metric that counts only the objects created since the bitmap that are either loose or belong to "heads" packfiles.
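
A rough sketch of how the two metrics could differ, under stated assumptions. `PackInfo`, its `nonHeads` flag and `countSinceBitmap` are hypothetical names, not JGit types, and how a pack would actually be recognised as the GC's non-heads pack is left open here. The object counts in `examplePacks()` are illustrative, not measured.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SinceBitmapSketch {
    /** Hypothetical pack description; not a JGit type. */
    static final class PackInfo {
        final long objectCount;
        final Instant mtime;
        final boolean hasBitmap;
        final boolean nonHeads; // assumption: the caller can tell this apart

        PackInfo(long objectCount, Instant mtime, boolean hasBitmap, boolean nonHeads) {
            this.objectCount = objectCount;
            this.mtime = mtime;
            this.hasBitmap = hasBitmap;
            this.nonHeads = nonHeads;
        }
    }

    /**
     * Walks packs from newest to oldest and stops at the first pack that has a bitmap.
     * With skipNonHeads set, packs flagged as non-heads are left out of the count.
     */
    static long countSinceBitmap(List<PackInfo> packs, long looseObjects, boolean skipNonHeads) {
        List<PackInfo> newestFirst = new ArrayList<>(packs);
        newestFirst.sort(Comparator.comparing((PackInfo p) -> p.mtime).reversed());
        long count = looseObjects;
        for (PackInfo p : newestFirst) {
            if (p.hasBitmap) {
                break; // most recent bitmap reached, stop counting
            }
            if (skipNonHeads && p.nonHeads) {
                continue; // written by the same GC cycle as the bitmap
            }
            count += p.objectCount;
        }
        return count;
    }

    /** The two packs from this report, with illustrative object counts. */
    static List<PackInfo> examplePacks() {
        return Arrays.asList(
                new PackInfo(400, Instant.parse("2025-03-07T19:48:37.254571758Z"), true, false),
                new PackInfo(2000, Instant.parse("2025-03-07T19:48:37.366075759Z"), false, true));
    }

    public static void main(String[] args) {
        System.out.println("all packs:       " + countSinceBitmap(examplePacks(), 0, false));
        System.out.println("heads and loose: " + countSinceBitmap(examplePacks(), 0, true));
    }
}
```

With these inputs the first count includes every object of the non-heads pack right after GC, while the second one is zero until new loose objects or packs arrive.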

[1] https://eclipse.gerrithub.io/c/eclipse-jgit/jgit/+/1203398
[2] https://review.gerrithub.io/c/eclipse-jgit/jgit/+/1208601
