Improve numberOfObjectsSinceBitmap #150

Open
@syntonyze

Description

Change [1] introduced numberOfObjectsSinceBitmap to RepoStatistics.

This number counts the objects stored in pack files and as loose objects that were created since the latest bitmap generation.

Change [2] fixed its calculation so that counting stops as soon as the most recent bitmap is encountered.
However, I've identified a case where the "number of objects since bitmap" metric may be inaccurate or misleading when GC happens.

The Problem

When Garbage Collection (GC) occurs, the non-heads packfile can be written slightly later than the heads packfile. This small timing difference can lead to incorrect accounting of objects.

Observed Behavior

In my local environment, I noticed the following:

Heads Packfile (with bitmap)

-r--r--r--@   1 syntonyze  staff  230648 Mar  7 21:08 pack-315d970458e6b4abf8f4e76373f2c88fbc225ca3.pack
-r--r--r--@   1 syntonyze  staff   12216 Mar  7 21:08 pack-315d970458e6b4abf8f4e76373f2c88fbc225ca3.idx
-r--r--r--@   1 syntonyze  staff    4862 Mar  7 21:08 pack-315d970458e6b4abf8f4e76373f2c88fbc225ca3.bitmap

Non-Heads Packfile (without bitmap)

-r--r--r--@   1 syntonyze  staff  814553 Mar  7 21:08 pack-0c4e73bf93d64177b8233d007c3ea29cf95ab5f2.pack
-r--r--r--@   1 syntonyze  staff   62728 Mar  7 21:08 pack-0c4e73bf93d64177b8233d007c3ea29cf95ab5f2.idx

At first glance, both packfiles appear to have the same timestamp (Mar 7, 21:08). However, on closer inspection, the modification times (mtime) differ at the sub-second level, by about 111 ms:

pack-0c4e73bf93d64177b8233d007c3ea29cf95ab5f2.pack (without bitmap) -> mtime: 2025-03-07T19:48:37.366075759Z
pack-315d970458e6b4abf8f4e76373f2c88fbc225ca3.pack (with bitmap) -> mtime: 2025-03-07T19:48:37.254571758Z

Since the non-heads packfile was written slightly later, it gets counted in the “number of objects since bitmap” metric.
This means the metric never resets to zero and always includes all objects from the non-heads packfile.
However, since both packfiles are effectively created at the same time, this behavior leads to inaccurate results.
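To make the effect concrete, here is a minimal sketch in Java of the counting rule as I understand it. It assumes packs are visited newest first by mtime and that counting stops at the first pack with a bitmap. `PackInfo`, `objectsSinceBitmap` and the object counts are hypothetical and only for illustration; they are not JGit types or measured values. Only the pack names and mtimes come from the listing above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SinceBitmapSketch {
    // Hypothetical stand-in for a pack file; this is not a JGit type.
    static final class PackInfo {
        final String name;
        final Instant mtime;
        final long objectCount;
        final boolean hasBitmap;

        PackInfo(String name, Instant mtime, long objectCount, boolean hasBitmap) {
            this.name = name;
            this.mtime = mtime;
            this.objectCount = objectCount;
            this.hasBitmap = hasBitmap;
        }
    }

    static final Instant HEADS_MTIME = Instant.parse("2025-03-07T19:48:37.254571758Z");
    static final Instant NON_HEADS_MTIME = Instant.parse("2025-03-07T19:48:37.366075759Z");

    // Visit packs newest first and add up object counts until a pack
    // with a bitmap is reached (assumed behavior, see lead-in).
    static long objectsSinceBitmap(List<PackInfo> packs) {
        List<PackInfo> newestFirst = new ArrayList<>(packs);
        newestFirst.sort(Comparator.comparing((PackInfo p) -> p.mtime).reversed());
        long count = 0;
        for (PackInfo p : newestFirst) {
            if (p.hasBitmap) {
                break;
            }
            count += p.objectCount;
        }
        return count;
    }

    // The two packs observed after GC; the object counts are placeholders.
    static List<PackInfo> observedPacks(long headsObjects, long nonHeadsObjects) {
        List<PackInfo> packs = new ArrayList<>();
        packs.add(new PackInfo("pack-315d970458e6b4abf8f4e76373f2c88fbc225ca3",
                HEADS_MTIME, headsObjects, true));
        packs.add(new PackInfo("pack-0c4e73bf93d64177b8233d007c3ea29cf95ab5f2",
                NON_HEADS_MTIME, nonHeadsObjects, false));
        return packs;
    }

    // Milliseconds by which the non-heads pack is newer than the heads pack.
    static long gapMillis() {
        return Duration.between(HEADS_MTIME, NON_HEADS_MTIME).toMillis();
    }

    public static void main(String[] args) {
        System.out.println(gapMillis()); // 111
        // Right after GC nothing new was added, yet the result is the
        // full object count of the non-heads pack.
        System.out.println(objectsSinceBitmap(observedPacks(100, 400))); // 400
    }
}
```

With these placeholder counts, the metric reports 400 immediately after GC, even though no objects were added after the bitmap was written.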

I believe the objects in the non-heads packfile should not contribute to the "number of objects since bitmap" metric, as they are part of the same GC cycle. This would provide a more accurate representation of newly added objects.

The proposed solution is to add a separate metric that counts only the objects created since the bitmap that are either loose or belong to "heads" packfiles.
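A rough sketch of what such a metric could look like, in Java. This is not a proposed patch: `PackInfo`, its `nonHeads` flag and `headsObjectsSinceBitmap` are hypothetical names, and how JGit would actually tell a non-heads pack apart is left open here. The loose-object count is passed in as a parameter for the same reason, and the object counts are placeholders.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class HeadsOnlyMetricSketch {
    // Hypothetical stand-in for a pack file; not a JGit type. The nonHeads
    // flag is an assumption about how such packs could be marked.
    static final class PackInfo {
        final Instant mtime;
        final long objectCount;
        final boolean hasBitmap;
        final boolean nonHeads;

        PackInfo(Instant mtime, long objectCount, boolean hasBitmap, boolean nonHeads) {
            this.mtime = mtime;
            this.objectCount = objectCount;
            this.hasBitmap = hasBitmap;
            this.nonHeads = nonHeads;
        }
    }

    // Count loose objects plus objects in packs newer than the bitmap,
    // skipping packs flagged as non-heads.
    static long headsObjectsSinceBitmap(List<PackInfo> packs, long looseObjectsSinceBitmap) {
        List<PackInfo> newestFirst = new ArrayList<>(packs);
        newestFirst.sort(Comparator.comparing((PackInfo p) -> p.mtime).reversed());
        long count = looseObjectsSinceBitmap;
        for (PackInfo p : newestFirst) {
            if (p.hasBitmap) {
                break;
            }
            if (!p.nonHeads) {
                count += p.objectCount;
            }
        }
        return count;
    }

    // Builds the two packs observed after GC and, optionally, one later pack
    // (for example from a push). All object counts are placeholders.
    static long example(long laterPackObjects, long looseObjects) {
        Instant bitmapTime = Instant.parse("2025-03-07T19:48:37.254571758Z");
        List<PackInfo> packs = new ArrayList<>();
        packs.add(new PackInfo(bitmapTime, 100, true, false)); // heads pack with bitmap
        packs.add(new PackInfo(Instant.parse("2025-03-07T19:48:37.366075759Z"),
                400, false, true)); // non-heads pack from the same GC
        if (laterPackObjects > 0) {
            packs.add(new PackInfo(bitmapTime.plusSeconds(60), laterPackObjects, false, false));
        }
        return headsObjectsSinceBitmap(packs, looseObjects);
    }

    public static void main(String[] args) {
        System.out.println(example(0, 0)); // 0: metric resets right after GC
        System.out.println(example(7, 3)); // 10: 7 packed + 3 loose new objects
    }
}
```

Under these assumptions the metric drops to zero right after GC and grows only with objects that arrive afterwards.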

[1] https://eclipse.gerrithub.io/c/eclipse-jgit/jgit/+/1203398
[2] https://review.gerrithub.io/c/eclipse-jgit/jgit/+/1208601

Motivation

I believe the objects in the non-heads packfile should not contribute to the "number of objects since bitmap" metric, as they are part of the same GC cycle. This would provide a more accurate representation of newly added objects.

Alternatives considered

No response

Additional context

No response
