Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git: port remote management to gix #5553

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

emilazy
Copy link
Contributor

@emilazy emilazy commented Feb 2, 2025

(Plus one more change to remove the other remaining git_util::get_git_repo caller.)

I’m not wholly confident in the first commit here, but the rest should hopefully be good. It feels like there might be awkward race conditions in here but I think Git is just like that.

@Byron 👋 I thought you might be interested in some API UX feedback I came up with while working on this. If you’d like me to open issues/discussions on the gix side for any of these please let me know.

  • It’s awkward that gix::Remote::save_to handles the fussy details of removing the other sections for that remote from the config, but leaves you to reimplement it yourself if you need it in any other context. It would be nice if the removal logic was exposed separately, or perhaps if there was a specific way to tell gix that you’re renaming a remote and the old config should go away.

  • You can change the name of a gix::Remote, and you can set its push URL, but you can’t change its fetch URL. In set_remote_url I have to manually copy over all the data from the existing remote to achieve this. Not sure if this is a fundamental constraint of the model or just a missing API.

  • It’s a little surprising that there’s a rich representation of parsed refspecs but the remote API expects you to pass in a BStr and deal with potentially‐impossible parse errors.

  • The borrowing model of configuration and remotes interact awkwardly. In a previous version of this PR where I tried to persist the remote configuration changes to the gix::Repository, I had to clone the entire thing so that I could hold a gix::Remote and a gix::config::SnapshotMut at the same time (otherwise the immutable and mutable borrows conflicted). That felt awkward for a task as simple as adding a remote to the configuration. (In general when using the gix API I kind of wish there was less borrowing and more refcounting/cloning, though I realize that there’s a subjective element to the trade‐off here and that you can wrap a borrow‐happy API in a clone‐happy one but not vice versa.)

  • Relatedly, there’s this comment in the code:

    // TODO: make it possible to load snapshots with reloading via .config() and write mutated snapshots back to disk which should be the way
    //       to affect all instances of a repo, probably via `config_mut()` and `config_mut_at()`.

    That would be quite nice for us!

In general there’s more convoluted logic here in the move from git2 to gix, but thankfully most of that is just dealing with lower‐level APIs that don’t bundle configuration changes, ref renames, etc. into one porcelain call. Other than the things listed above, the process went quite smoothly; thanks again for all your work!

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

Copy link
Contributor

@bsdinis bsdinis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this. Amazing effort!
I'm still working through the patch, will pick it back up tomorrow

lib/src/git.rs Outdated Show resolved Hide resolved
@@ -195,8 +194,8 @@ fn configure_remote(
remote_name: &str,
source: &str,
) -> Result<WorkspaceCommandHelper, CommandError> {
let git_repo = get_git_repo(workspace_command.repo().store())?;
git::add_remote(&git_repo, remote_name, source)?;
let git_repo = git::get_git_repo(workspace_command.repo().store())?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the only point i see for getting instantiating the git repo in the cli is to use it directly.
If we are constructing it just to sink it back into a lib API, might just be best to send a store directly to the API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just drop the argument entirely and get the Git repository out of the MutableRepo. I wasn’t really thinking about the API design here, just matching the previous version, but it seems like a good idea. Will probably make that change tomorrow unless anyone objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's even better!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we’d have to include UnexpectedGitBackendError in the error enum to handle the case of calling it on a repository with another backend. Maybe it’s better to rule that out in the type like currently? Not sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we’d have to include UnexpectedGitBackendError in the error enum

That's more consistent with other git:: API, so I like the idea of passing repo/mut_repo/store instead of git_repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions now all take &Arc<ReadonlyRepo> or &mut MutableRepo, though sometimes only to get a gix::Repository out of them; let me know if this is what you had in mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be &dyn Repo for the former?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, &dyn Repo is more common.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve actually made them use &Store for consistency with get_all_remote_names.

cli/src/commands/git/remote/add.rs Outdated Show resolved Hide resolved
cli/src/commands/git/remote/remove.rs Outdated Show resolved Hide resolved
cli/src/commands/git/remote/rename.rs Outdated Show resolved Hide resolved
cli/src/commands/git/remote/set_url.rs Outdated Show resolved Hide resolved
lib/src/git.rs Outdated Show resolved Hide resolved
@@ -195,8 +194,8 @@ fn configure_remote(
remote_name: &str,
source: &str,
) -> Result<WorkspaceCommandHelper, CommandError> {
let git_repo = get_git_repo(workspace_command.repo().store())?;
git::add_remote(&git_repo, remote_name, source)?;
let git_repo = git::get_git_repo(workspace_command.repo().store())?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we’d have to include UnexpectedGitBackendError in the error enum

That's more consistent with other git:: API, so I like the idea of passing repo/mut_repo/store instead of git_repo.

lib/src/git.rs Outdated
Comment on lines 1350 to 1349
let mut temp_config_file = NamedTempFile::with_prefix_in("config", self.repo_path)?;
self.config
.write_to_filter(&mut temp_config_file, |section| {
section.meta() == &self.config_meta
})?;
temp_config_file
.persist(
self.config_meta
.path
.expect("Git repository to have a config file"),
)
.map_err(|err| err.error)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should use atomic rename. If the existing file was a symlink, the link would be replaced with a normal file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve moved this to regular File::create. Not sure if that’s what we want, or if we should worry about fsync, or what.

lib/src/git.rs Outdated
}
}

impl Deref for GitConfigEditor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: suppose GitConfigEditor is mainly designed for local use, the API ergonomics provided by Deref wouldn't be needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that it helped a bit with the functions that were already quite noisy from the error handling. My expectation was that we’d end up having cause to modify the Git configuration more in future and so a nice API would be helpful, but I don’t know how likely that is. I can drop it if it’s unwanted, although now the type is just a wrapper around gix::config::File (so it could maybe even be an extension trait instead or something).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or a couple of free functions? I don't feel strongly, but I tend to avoid using Deref unless the type is like smart pointers. Hopefully there will be a higher-level abstraction in gix in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I’ve replaced it with a couple free functions, since it kinda ended up withering to nothing. It used to be more elaborate.

lib/src/git.rs Outdated
.prefixed(prefix)?
.map(|result| result.map(&mut editor))
.flatten_ok()
.collect::<Result<Vec<_>, _>>()?,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might be better to add a function that returns Vec<RefEdit>? It's probably simpler, and caller can concatenate more edits if needed.

I think it's also good to inline this helper function to caller to make it clear that .strip_prefix() with the same prefix will never panic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what you mean by this? There are two callers of this function and only one of them uses .strip_prefix(), but they both have the same structure of “iterate through prefixed references → perform one or more edits (specifically, at least delete_ref plus a potential update for a rename) → commit reference edits”. Factoring it out made it a lot nicer (especially with the “iterator of Results” thing gix forces on us here).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I were to extract helper function, I would add:

fn build_prefixed_refs_edits<Edits>(
    git_repo: &gix::Repository,
    prefix: impl AsRef<std::path::Path>,
    mut editor: impl FnMut(gix::Reference) -> Edits,
) -> Result<Vec<gix::refs::transaction::RefEdit>, Box<dyn std::error::Error + Send + Sync>>
where
    Edits: IntoIterator<Item = gix::refs::transaction::RefEdit>,
{
    git_repo
        .references()?
        .prefixed(prefix)?
        .map_ok(&mut editor)
        .flatten_ok()
        .try_collect()
}

The reason is that callers can potentially add more RefEdits to the list and issue them all at once.

However, there are only two callers and of them doesn't need .flatten_ok(), so I would just duplicate this to callers. In order to map error types to Box<dyn ..>, I might extract the inlined code blocks to separate functions or closures (e.g. rename_remote_git_refs() and remove_remote_git_refs().)

I might add copy_ref(..) -> RefEdit function (like delete_ref()) because the construction of RefEdit is noisy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve inlined this into the callers. It feels a little more bureaucratic to me on the whole (having to name the dyn type multiple times isn’t so fun, mostly), but not as bad as I was worried. I abstracted away the RefEdit construction since as you said it’s painfully noisy.

lib/src/git.rs Outdated
.remote_at(new_remote_url)
.map_err(GitRemoteManagementError::from_git)?
.with_fetch_tags(remote.fetch_tags());
for direction in [gix::remote::Direction::Fetch, gix::remote::Direction::Push] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, the old git2 implementation wouldn't update the "push" URL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn’t URLs, it’s refspecs. This is just copying the data from the old remote wholesale (after creating a new one with the right URL), because for some reason the gix API doesn’t allow you to set the URL of an existing Remote, only create a new one. I assume libgit2 doesn’t have that limitation and updates the fetch URL in place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I didn't notice that. It might be easier to use the config API, but I'm not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve abstracted this out into a remote_fetch_url function to shim the missing gix API and added an explanatory comment for the data copying. Since @Byron says that this is just a missing API on the gix side, I think doing it this way is nicest, as once it grows the missing API the set_remote_url code will remain very simple. gix implements a fair amount of logic to handle existing config sections, so I think it’s best to use the higher‐level API when possible, although I don’t know if it’s specifically relevant in this place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since new functions aren't trivial wrappers, I think it's better to insert some snapshots of .git/config to CLI tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to add those; do you mean in e.g. cli/tests/test_git_remotes.rs? (Is this related to the changes in cli/src/commands/git/clone.rs specifically or just a more general remark?)

FWIW I think it is the refs stuff I am less confident about having gotten completely correct relative to the libgit2 behaviour; the configuration manipulation was relatively simple by comparison.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I meant test_git_remotes.rs and to test the refspec updates.

FWIW I think it is the refs stuff I am less confident about having gotten completely correct relative to the libgit2 behaviour; the configuration manipulation was relatively simple by comparison.

Yeah, it might also be good to add jj-lib tests for that, but I'm not sure what should be tested. I'll need to review libgit2 and/or git implementation.

Another totally different approach is to shell out git remote command instead of porting to gix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve added Git config snapshot tests to cli/tests/test_git_remotes.rs, and some testing of the ref removals/renames to lib/tests/test_git.rs. Let me know what you think!

I think it would be unfortunate to resort to shelling out here. gix doesn’t have push functionality, and fetching and pushing have a lot of complex customization points and interactions with the rest of the system. Remotes are a comparatively simple config and ref management job, so I hope we can avoid adding further uses of subprocessing in favour of using and extending gix.

@bsdinis Sorry for more tests that will need converting; I followed the convention of the existing tests and helpers in the file :) These should be easy to port over, but if your PR lands first I’m happy to rebase this and do the port of the new ones myself.

@emilazy emilazy force-pushed the push-ollqypvnszvt branch 2 times, most recently from ed8b4dd to d0bd11c Compare February 3, 2025 01:38
@Byron
Copy link

Byron commented Feb 9, 2025

Thanks for reeling me in, and sorry for the late response!

  • It’s awkward that gix::Remote::save_to handles the fussy details of removing the other sections for that remote from the config, but leaves you to reimplement it yourself if you need it in any other context. It would be nice if the removal logic was exposed separately, or perhaps if there was a specific way to tell gix that you’re renaming a remote and the old config should go away

Indeed, renaming isn't supported yet, and having some of the existing logic available would help implementing it.
Alternatively it would be nice if one could just rename a remote while handling the configuration automatically.

  • You can change the name of a gix::Remote, and you can set its push URL, but you can’t change its fetch URL. In set_remote_url I have to manually copy over all the data from the existing remote to achieve this. Not sure if this is a fundamental constraint of the model or just a missing API.

I don't think this is more than an incomplete API.

  • It’s a little surprising that there’s a rich representation of parsed refspecs but the remote API expects you to pass in a BStr and deal with potentially‐impossible parse errors.

I think this is me avoiding hassle. What it really wants to do is to allow passing various inputs, including strings, and parse it when needed. Right now there probably is only one API that does this (find_reference()), and that alone gave me quite some trouble already. It's probably still not working intuitively in all common cases.
With that said, other crates are able to pull it off, so I am sure there is a way.
In any case, that's the reason for resorting to AsRef<BStr> (and seeing this tells me that this also is a workaround as typically one would do Into<BStr> if one wants it to work for &str. Probably some tests found AsRef more convenient for their input).

  • The borrowing model of configuration and remotes interact awkwardly. In a previous version of this PR where I tried to persist the remote configuration changes to the gix::Repository, I had to clone the entire thing so that I could hold a gix::Remote and a gix::config::SnapshotMut at the same time (otherwise the immutable and mutable borrows conflicted). That felt awkward for a task as simple as adding a remote to the configuration. (In general when using the gix API I kind of wish there was less borrowing and more refcounting/cloning, though I realize that there’s a subjective element to the trade‐off here and that you can wrap a borrow‐happy API in a clone‐happy one but not vice versa.)

The clone-happy APIs in jj are really nice and makes the code so much more 'to the point', and simple-looking. I am loving it.
With that said, gix can still change and maybe it should, to fully embed an Rc/Arc gix::Repository instead of a reference, to fix the issue we see today with &mut and & interactions.
However, I'd hope there can be other ways of dealing with this, and usually I fix these issues once I encounter them myself. After all, the main 'components' of a repository are shared and ref-counted already, so certain types could embed them directly if that avoids some &mut hassle.

In general there’s more convoluted logic here in the move from git2 to gix, but thankfully most of that is just dealing with lower‐level APIs that don’t bundle configuration changes, ref renames, etc. into one porcelain call.

And I am really sorry for this being so half-done right now.
Strangely, these days I find myself working on application code of GitButler and have to resort myself to plumbing code to write Git configuration, for example, or to downright having to use git2 if gitoxide doesn't have the feature yet. One would think that I'd have the time to 'just' add the missing feature, but not really.

Probably for you it's very much the same, but if not, or if there is some room to breathe once the first version of a feature is delivered in jj, maybe there is a chance you could add the API you'd want to solve the problem nicely with gix.
Client code is a great driver for this, it just takes an extra step to get there.

I will try to make these steps for GitButler eventually (sprinkling even more TODO's there 😁), and encourage you to do the same.

@emilazy
Copy link
Contributor Author

emilazy commented Feb 10, 2025

Thanks for the detailed reply, @Byron!

Indeed, renaming isn't supported yet, and having some of the existing logic available would help implementing it. Alternatively it would be nice if one could just rename a remote while handling the configuration automatically.

Higher‐level porcelain APIs would definitely be welcome! But I understand that Git has many layers and there’s a lot of subtlety with higher‐level operations, so just having the lower‐level ones factored in a more reusable way would be great here.

I don't think this is more than an incomplete API.

That’s good to know; I’ll leave a comment to that effect in the code so that it can hopefully be cleaned up later.

I think this is me avoiding hassle. What it really wants to do is to allow passing various inputs, including strings, and parse it when needed. Right now there probably is only one API that does this (find_reference()), and that alone gave me quite some trouble already. It's probably still not working intuitively in all common cases. With that said, other crates are able to pull it off, so I am sure there is a way. In any case, that's the reason for resorting to AsRef<BStr> (and seeing this tells me that this also is a workaround as typically one would do Into<BStr> if one wants it to work for &str. Probably some tests found AsRef more convenient for their input).

Right. I guess you’d want a trait for things that can be converted to refspecs, or maybe you can use TryInto and we could rely on Infallible for the identity case or such…

That said, I think it’d be fine if the remote API just only took pre‐parsed refspecs and you had to explicitly .try_into() if you wanted to parse a string representation of them.

The clone-happy APIs in jj are really nice and makes the code so much more 'to the point', and simple-looking. I am loving it. With that said, gix can still change and maybe it should, to fully embed an Rc/Arc gix::Repository instead of a reference, to fix the issue we see today with &mut and & interactions. However, I'd hope there can be other ways of dealing with this, and usually I fix these issues once I encounter them myself. After all, the main 'components' of a repository are shared and ref-counted already, so certain types could embed them directly if that avoids some &mut hassle.

Right. In this case we get around it by not actually mutating the in‐memory repository configuration (since it isn’t reflected in the ThreadSafeRepository anyway), but otherwise it’s awkward to have a remote wanting a & reference to the repo but the mutable config snapshot needing a &mut. If the remotes could avoid depending on the repository’s lifetime, or if config editing used interior mutability (as I guess it might have to in order to write to a ThreadSafeRepository in the future anyway), then that conflict wouldn’t exist.

In general I think that a bit of refcounting/cloning is usually cheap and can make an API simpler and more general enough that it’s worth it; I would say that lifetimes and error types are probably the trickiest parts of using gitoxide in my experience. But I can also understand wanting to exploit Rust’s capabilities to the fullest and avoid any unnecessary copying.

And I am really sorry for this being so half-done right now. Strangely, these days I find myself working on application code of GitButler and have to resort myself to plumbing code to write Git configuration, for example, or to downright having to use git2 if gitoxide doesn't have the feature yet. One would think that I'd have the time to 'just' add the missing feature, but not really.

Probably for you it's very much the same, but if not, or if there is some room to breathe once the first version of a feature is delivered in jj, maybe there is a chance you could add the API you'd want to solve the problem nicely with gix. Client code is a great driver for this, it just takes an extra step to get there.

I will try to make these steps for GitButler eventually (sprinkling even more TODO's there 😁), and encourage you to do the same.

Yes, the temptation to just dive into the gitoxide code base is strong, but the more I let myself get carried away with such things the less likely I am to actually finish what I promised to do in the first place :) I’m also not too familiar with gitoxide’s codebase and haven’t fully wrapped my head around the code organization yet.

Some of these things like the missing API to set the fetch URL seem simple, though, so I’ll see if I find the time to send a PR for that. If not, maybe someone else will see the TODO comment and take up the mantle in future!

Thanks again for all your work on gitoxide and for your thoughts on my feedback; a Git reimplementation is a mammoth undertaking and I don’t think you have anything to be sorry for it not yet being perfectly polished! Now that we have code to shell out to git(1) for fetches and pushes, being able to drop our git2 dependency and rely on gitoxide (+ shelling out) alone is in sight. Hopefully that will help increase the synergy between Jujutsu and gitoxide development further :)

Copy link
Contributor

@yuja yuja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks.

[remote "bar"]
url = http://example.com/repo/foo
fetch = +refs/heads/*:refs/remotes/bar/*
"#);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add rename tests with [branch ".."] section and custom fetch refspecs?

jj doesn't use them, but colocated git repo may have configuration for git CLI. libgit2 updates them.

https://github.com/libgit2/libgit2/blob/b042e574064859c01cb902a3f4f418a3706eb78d/src/libgit2/remote.c#L2558

Comment on lines +1535 to +1538
.replace_refspecs(
[format!("+refs/heads/*:refs/remotes/{new_remote_name}/*").as_bytes()],
gix::remote::Direction::Fetch,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep the original refspec if it doesn't match the default refspec. There may be negative fetch spec for example.

Comment on lines +1580 to +1584
old_ref
.name()
.as_bstr()
.strip_prefix(old_prefix.as_bytes())
.expect("old ref name to have old prefix"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe it's okay to just remove the prefix by [old_prefix.len()..]?

/// Shim for the missing `gix::Remote::fetch_url` API.
///
/// **TODO:** Upstream an implementation of this to `gix`.
fn remote_fetch_url<Url, E>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to remote_with_fetch_url? remote_fetch_url sounds like returning a URL.

let git_repo = get_git_repo(repo);
empty_git_commit(&git_repo, "refs/remotes/foo/a", &[]);
empty_git_commit(&git_repo, "refs/remotes/foo/b", &[]);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add more refs to test edge case behavior. e.g. refs/remotes/foobar/a, refs/remotes/foo/x/y, etc.

(also applies to rename test)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants