UPSTREAM PR #2547: Improve performance of status on windows #52

Open: loci-dev wants to merge 2 commits into main from loci/pr-2547-windows-status-performance

@loci-dev

Note

Source pull request: GitoxideLabs/gitoxide#2547

This creates a cache of file metadata that is prefilled by Windows API calls, so that status can walk the worktree per directory instead of per file. As a result, status is much faster.

The cache is designed to minimise the surface area of the change, and it is Windows-only, so other targets should be unaffected.

Testing status on the linux repo reduces the run time from ~1000ms to ~300ms, putting it roughly on par with libgit2. Faster speeds are possible but would require larger changes, so this is an initial pass that avoids doing too much.
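
To illustrate the idea, here is a minimal sketch of a per-directory prefill using only the standard library. It is not the code from this PR: `MetadataCache`, `prefill_dir`, `is_cached` and `symlink_metadata` are hypothetical names chosen for the example, and the real change calls Windows APIs directly. The sketch relies on the documented behaviour that `DirEntry::metadata` needs no extra system call on Windows, so one directory enumeration yields metadata for every child.

```rust
use std::collections::HashMap;
use std::fs::{self, Metadata};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical short-lived metadata cache, built for a single status run.
#[derive(Default)]
pub struct MetadataCache {
    entries: HashMap<PathBuf, Metadata>,
}

impl MetadataCache {
    /// Record metadata for every direct child of `dir` with one enumeration.
    /// Returns the number of entries added.
    pub fn prefill_dir(&mut self, dir: &Path) -> io::Result<usize> {
        let mut added = 0;
        for entry in fs::read_dir(dir)? {
            let entry = entry?;
            // Does not follow symlinks, like lstat. On Windows this reuses the
            // data returned by the directory enumeration itself.
            let meta = entry.metadata()?;
            // Keys keep their exact on-disk spelling (no case folding).
            self.entries.insert(entry.path(), meta);
            added += 1;
        }
        Ok(added)
    }

    /// True if `path` was seen during a prefill.
    pub fn is_cached(&self, path: &Path) -> bool {
        self.entries.contains_key(path)
    }

    /// Serve metadata from the cache, falling back to a per-file call on a miss.
    pub fn symlink_metadata(&self, path: &Path) -> io::Result<Metadata> {
        match self.entries.get(path) {
            Some(meta) => Ok(meta.clone()),
            None => fs::symlink_metadata(path),
        }
    }
}
```

A caller would prefill each directory once while walking the index and then ask the cache instead of the filesystem for each tracked path. Because a miss falls back to the per-file call, a missing or stale entry costs time rather than correctness in this sketch.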


Additional things to consider and discuss perhaps:

  1. This does have a little drift, I feel: the cache works, but perhaps it should not be called a cache, since it is thrown away after every git status, and invalidating it is often equivalent to rebuilding it. Using it as an actual cache across multiple git status runs is up to the caller, and that is pretty complex: the caller would need to know a lot to use it that way, for dubious benefit.
  2. I did leave some room for Linux speedups later, but I believe a different implementation would be needed there: lstat on Linux is fast, so a metadata cache wouldn't really speed things up. The only option is a directory-keyed cache that can check for untracked files, so that fewer lstats are needed overall, but that would be perhaps a 10-20% speedup, not a 300% (with 1000% possible) speedup like on Windows.
  3. For reference, see this custom implementation of git status: https://github.com/special-bread/tests-git-status - it can do a git status of the linux repo (the test case above) in ~70ms, but it is a complete rewrite and has some slightly different behaviour, which is fine for my purposes but not identical to git, e.g. how it handles case sensitivity, how it treats some states as clean if git index entries cancel out, and some other details. I think it is possible to reach and beat that time here, but that would require more invasive changes, which I thought would be fairly rough for a PR that touches a piece of core functionality.
  4. See the related issue: "gix status" is slow on Windows (GitoxideLabs/gitoxide#2296).

Given that this is a common piece of functionality, I would love for someone else to test this too. I have been embarrassingly busy recently, so this PR cooked for a while, and I may have missed some things while working on it on and off.

Follow-up to the git status performance improvement: this fixes an edge case where a case-sensitive entry in the cache gets lowercased and then matches a second case-sensitive entry in the tree, potentially resulting in incorrect git status entries. Skipping lowercasing entirely turns those cases into a cache miss instead, which makes the behaviour more transparent.
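
The edge case can be shown with a small sketch. It is only an illustration, not the PR's code: `build_lowercased` and `build_exact` are hypothetical helpers, and the `usize` value stands in for a file's metadata. With lowercased keys, two names that differ only in case collapse into one entry, so a lookup for one file can return data for the other. With exact keys, both entries survive and a differently cased lookup is simply a miss.

```rust
use std::collections::HashMap;

/// Cache keyed by lowercased name: names differing only in case collide,
/// and the later entry overwrites the earlier one.
pub fn build_lowercased(names: &[&str]) -> HashMap<String, usize> {
    names
        .iter()
        .enumerate()
        .map(|(index, name)| (name.to_lowercase(), index))
        .collect()
}

/// Cache keyed by the exact on-disk spelling: no collisions, and a lookup
/// with different casing is a miss rather than a wrong hit.
pub fn build_exact(names: &[&str]) -> HashMap<String, usize> {
    names
        .iter()
        .enumerate()
        .map(|(index, name)| (name.to_string(), index))
        .collect()
}
```

For `["README", "readme"]`, the lowercased map holds a single entry that points at the second file, while the exact map holds both. On a miss, the caller can fall back to a direct metadata call for that path.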