git-lfs support #10153
base: master
Small complication: In other words, I don't currently see a way to:
Using git-lfs, when the flake copies the repo to the store (for purity), the 'virtual file' stored in git (containing the oid/size info of the object in LFS) is copied instead of the actual (large) file :/ ref: NixOS/nix#10153. I think it was working before because the file was in git temporarily at some point, and I only moved it to LFS after the system was built (or something like that 🤷).
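The 'virtual file' mentioned above is a Git LFS pointer file: a few lines of text recording the spec version, the object's SHA-256 oid and its size. The sketch below shows roughly what a clean filter stores in git and how a fetcher might cheaply recognise such a file. makeLfsPointer and startsLikeLfsPointer are hypothetical helpers, not functions from this PR, and the recognition check only looks at the first line.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: builds the text a clean filter stores in git in place of a
// large file. `sha256Hex` is assumed to be the SHA-256 of the full file contents,
// computed elsewhere.
std::string makeLfsPointer(const std::string & sha256Hex, std::uint64_t size) {
    return "version https://git-lfs.github.com/spec/v1\n"
           "oid sha256:" + sha256Hex + "\n"
           "size " + std::to_string(size) + "\n";
}

// Hypothetical helper: a cheap first check that a blob is a v1 pointer file rather
// than real content. It only inspects the first line; a full parser should also
// validate the oid and size lines.
bool startsLikeLfsPointer(const std::string & content) {
    return content.rfind("version https://git-lfs.github.com/spec/v1\n", 0) == 0;
}
```

For a 10000000-byte object the pointer is only 133 bytes, which is why ending up with the pointer instead of the object is easy to miss until something tries to use the file.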
What use case do you have in mind? Isn't LFS typically for large files that wouldn't usually affect evaluation anyway?
A FOD (fixed-output derivation) seems optimal here; in general, you shouldn't use builtins.fetchGit if you're only going to use the result at build time.
In general I agree, but (afaik) other fetchers can't use git credentials.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2024-03-11-nix-team-meeting-132/42960/1
@roberth have you had a chance to take a look at this issue? We have been staying on older versions of Nix as a workaround, but newer versions now have fixes for critical issues, so sticking with the old ones is no longer a good option.
Hi @b-camacho, thanks for the ping and sorry for the delay. This PR was assigned to me, but I hadn't prioritized it because it was a draft. Wrong assumption on my end, because I do think this is valuable, and I have some things to say :)
That's a good start, but we need to make sure that the smudging happens in a controlled manner; otherwise we risk adding impurities. Specifically, we should parse the attributes to check that the files are actually supposed to be smudged by LFS; if not, ignore the smudge rule. It seems you were already investigating how this could be implemented. Furthermore, we should validate the sha256, so that depending on a whole external program doesn't increase the potential for silent errors. The hash should be easy to parse from the pointer file, and while reading other programs' inputs is a little ad hoc, I don't expect any serious issues from this, as we won't cause users to accidentally rely on a bug this way.
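A minimal sketch of the parse-and-validate step suggested above, assuming the v1 pointer format (version, oid sha256:<hex>, size lines). LfsPointer, parseLfsPointer and matchesPointer are illustrative names, not the PR's parseLfsMetadata, and the SHA-256 of the downloaded object is assumed to be computed by the caller with a real hash implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Fields of a v1 Git LFS pointer file:
//   version https://git-lfs.github.com/spec/v1
//   oid sha256:<64 lowercase hex chars>
//   size <bytes>
struct LfsPointer {
    std::string sha256Hex;   // expected SHA-256 of the real object
    std::uint64_t size = 0;  // expected size of the real object in bytes
};

static bool isLowerHex(const std::string & s) {
    for (char c : s)
        if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'))) return false;
    return true;
}

// Returns std::nullopt unless `text` has well-formed version, oid and size lines.
// Simplification: key order is not enforced and unknown keys are ignored.
std::optional<LfsPointer> parseLfsPointer(const std::string & text) {
    std::istringstream in(text);
    std::string line;
    bool sawVersion = false, sawOid = false, sawSize = false;
    LfsPointer p;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        auto space = line.find(' ');
        if (space == std::string::npos) return std::nullopt;
        const std::string key = line.substr(0, space), value = line.substr(space + 1);
        if (key == "version") {
            if (value != "https://git-lfs.github.com/spec/v1") return std::nullopt;
            sawVersion = true;
        } else if (key == "oid") {
            if (value.rfind("sha256:", 0) != 0) return std::nullopt;
            p.sha256Hex = value.substr(7);
            if (p.sha256Hex.size() != 64 || !isLowerHex(p.sha256Hex)) return std::nullopt;
            sawOid = true;
        } else if (key == "size") {
            // At most 19 digits, so the value always fits in 64 bits.
            if (value.empty() || value.size() > 19 ||
                value.find_first_not_of("0123456789") != std::string::npos)
                return std::nullopt;
            p.size = std::stoull(value);
            sawSize = true;
        }
    }
    if (!sawVersion || !sawOid || !sawSize) return std::nullopt;
    return p;
}

// Accepts a downloaded object only if both its size and its SHA-256 match the pointer.
// `actualSha256Hex` must be computed by the caller from `content` (not done here).
bool matchesPointer(const LfsPointer & p, const std::string & content,
                    const std::string & actualSha256Hex) {
    return content.size() == p.size && actualSha256Hex == p.sha256Hex;
}
```

Refusing anything that fails these checks, instead of passing the downloaded bytes through, is what keeps a misbehaving server or external tool from silently producing the wrong file contents.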
This won't happen unnecessarily once either of these is implemented. If we need to backtrack on the removal of narHashes (#6530), we can also avoid re-locking transitive inputs whose lock has already been computed in the dependency's lock. So yes, this isn't efficient yet, but it will be.
A fixed output derivation works best when all you're using it for is as an input to another derivation (and it's publicly available, as mentioned). To summarize, this is worth implementing, I see no blocking issues, design or otherwise, and the following needs to be done:
What's the state of this PR? Unfortunately, it seems to have gone a bit stale given the delayed review. This issue has been plaguing us for a while, so I'm willing to pick up the torch here and try to get this out the door (I was actually starting to look into fixing this myself back in March, when I saw this PR and decided to wait and see what came of it).
@kip93 I think your question was directed at @b-camacho, but I'd like to add that we would welcome and support anyone who'd like to work on this. Feel free to ask questions here or in the meetings if you can make them. We generally have an agenda, but we also like to make time for contributors during or after, when we often hang out while we get some things done. The link to the video conference is in the scratchpad linked there. We also have a Matrix room, although personally I'm guilty of neglecting that one sometimes.
Thanks for the thorough writeup @roberth! Once I add some tests and integrate git-lfs-fetch-cpp here, we should be ready for another review! I'm still on vacation with not-great internet, but I'll be back in 6 days and will update you all on 7/31 regardless. Thanks for the feedback and sorry for the wait!
Oh, I don't think shelling out was such a big deal, because we can verify the correctness of the result, kind of like how fixed-output derivations are allowed to do "grossly impure" things because we can verify the output. I guess a library implementation of it is still nice for a consistent UX with a small closure size, though.
Hey! It's me again! I just want to ask if there's anything I can help with here. Maybe I could do some testing, or write a smaller version of this that uses the git-lfs CLI tools while the full implementation gets done? We have a lot of repos with LFS files that would greatly benefit from this, so I'm willing to do whatever work is needed, but I also don't want to add extra work for others where it's not wanted.
OK, I've reintroduced git_attr_get_ext and have it working. I'm currently testing that it doesn't import files into the store too early, as was the case the first time around. I also rewrote the tests to use subtests, as suggested in the code review.
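In the PR, the attribute lookup goes through libgit2's git_attr_get_ext. Purely to illustrate the rule that only files whose filter attribute is lfs get smudged, here is a deliberately simplified, self-contained stand-in that reads a single .gitattributes text. filterAttr and shouldSmudge are hypothetical names, and the sketch ignores macros, nested attribute files and most pattern subtleties that libgit2 handles.

```cpp
#include <cassert>
#include <fnmatch.h>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: the value of the `filter` attribute that a simplified reading
// of one .gitattributes file assigns to `path`, or std::nullopt if unset/unspecified.
// As in git, later matching lines override earlier ones.
std::optional<std::string> filterAttr(const std::string & gitattributes, const std::string & path) {
    std::optional<std::string> result;
    std::istringstream lines(gitattributes);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string pattern;
        if (!(fields >> pattern) || pattern[0] == '#') continue;
        const bool hasSlash = pattern.find('/') != std::string::npos;
        if (pattern[0] == '/') pattern.erase(0, 1);
        // Without a slash, match the basename (npos + 1 == 0 keeps the whole path).
        const std::string subject = hasSlash ? path : path.substr(path.rfind('/') + 1);
        if (fnmatch(pattern.c_str(), subject.c_str(), hasSlash ? FNM_PATHNAME : 0) != 0) continue;
        std::string attr;
        while (fields >> attr) {
            if (attr.rfind("filter=", 0) == 0) result = attr.substr(7);
            else if (attr == "-filter" || attr == "!filter") result = std::nullopt;
        }
    }
    return result;
}

// Only paths whose filter attribute is exactly "lfs" should be smudged.
bool shouldSmudge(const std::string & gitattributes, const std::string & path) {
    const auto filter = filterAttr(gitattributes, path);
    return filter && *filter == "lfs";
}
```

For example, with `*.bin filter=lfs diff=lfs merge=lfs -text` followed by `/docs/*.bin -filter`, data/model.bin is smudged while docs/readme.bin is left alone.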
With my quick-ish tests (on several large LFS repos that take a long time to clone), it all looks OK, so I think it's ready for another review. Given that the holidays are around the corner, I won't have any more time to work on this until after New Year's (of course, I'm also not expecting you to skip Christmas to review this; I'm fine if the review is delayed as well). If anything needs fixing, I can probably pick this back up around January 6.
Hey! I've been back at it for a couple of days now, addressing the code review. Only the rework of Fetch::fetchUrls is left, but I may be going down a rabbit hole, since neither FileTransfer::download nor FileTransfer::upload quite does what I need. I'll let you all know when it's done.
Plus, I switched from CURLOPT_PROGRESSFUNCTION to CURLOPT_XFERINFOFUNCTION, since the libcurl docs say the former is deprecated.
Been testing it, and it looks like it's working for me. I'd say this is ready for another review.
src/libfetchers/git-lfs-fetch.hh
// example resp here:
// {"objects":[{"oid":"f5e02aa71e67f41d79023a128ca35bad86cf7b6656967bfe0884b3a3c4325eaf","size":10000000,"actions":{"download":{"href":"https://gitlab.com/b-camacho/test-lfs.git/gitlab-lfs/objects/f5e02aa71e67f41d79023a128ca35bad86cf7b6656967bfe0884b3a3c4325eaf","header":{"Authorization":"Basic Yi1jYW1hY2hvOmV5SjBlWEFpT2lKS1YxUWlMQ0poYkdjaU9pSklVekkxTmlKOS5leUprWVhSaElqcDdJbUZqZEc5eUlqb2lZaTFqWVcxaFkyaHZJbjBzSW1wMGFTSTZJbUptTURZNFpXVTFMVEprWmpVdE5HWm1ZUzFpWWpRMExUSXpNVEV3WVRReU1qWmtaaUlzSW1saGRDSTZNVGN4TkRZeE16ZzBOU3dpYm1KbUlqb3hOekUwTmpFek9EUXdMQ0psZUhBaU9qRTNNVFEyTWpFd05EVjkuZk9yMDNkYjBWSTFXQzFZaTBKRmJUNnJTTHJPZlBwVW9lYllkT0NQZlJ4QQ=="}}},"authenticated":true}]}
This example may be suitable for a unit test.
Testing this function would require at least an HTTP server, and a full test would also need an SSH server. I tried to find an existing test I could use as a reference, but couldn't find one.
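One way around needing a live server, sketched here under the assumption that the HTTP call can be hidden behind a function object, is to inject the POST so a unit test can replay a canned batch response like the example above. PostFn, batchRequestBody, firstDownloadHref and resolveDownloadUrl are hypothetical names. The request body follows the Git LFS batch API's download operation, and the href extraction is a naive string search that a real implementation should replace with a proper JSON parser (Nix already uses nlohmann::json).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical seam: the HTTP POST is injected, so tests can return a canned
// batch-API response instead of talking to a real server.
using PostFn = std::function<std::string(const std::string & url, const std::string & body)>;

// Builds a Git LFS batch API download request for a single object.
std::string batchRequestBody(const std::string & oid, std::uint64_t size) {
    return R"({"operation":"download","objects":[{"oid":")" + oid +
           R"(","size":)" + std::to_string(size) + "}]}";
}

// Naively extracts the first "href" from a batch response (sketch only; use a JSON parser).
std::string firstDownloadHref(const std::string & response) {
    const std::string key = R"("href":")";
    auto start = response.find(key);
    if (start == std::string::npos) throw std::runtime_error("no href in batch response");
    start += key.size();
    auto end = response.find('"', start);
    if (end == std::string::npos) throw std::runtime_error("unterminated href");
    return response.substr(start, end - start);
}

// Asks the LFS endpoint (e.g. <repo>.git/info/lfs) where to download one object from.
std::string resolveDownloadUrl(const PostFn & post, const std::string & lfsEndpoint,
                               const std::string & oid, std::uint64_t size) {
    return firstDownloadHref(post(lfsEndpoint + "/objects/batch", batchRequestBody(oid, size)));
}
```

In a test, the fake PostFn can both record the request it received and return the example response, so the request format and the response handling are checked without any network access. The SSH part of the flow would need a similar seam.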
src/libfetchers/git-lfs-fetch.hh
const auto md = parseLfsMetadata(std::string(content), std::string(pointerFilePath));
if (md == std::nullopt) {
    debug("Skip git-lfs, invalid pointer file");
    warn("Encountered a file that should have been a pointer, but wasn't: %s", pointerFilePath);
I'd prefer for this and the above to be errors, so that a potentially outdated version of Nix won't succeed with the wrong result.
This is the one thing I did not do (yet?) since the current implementation replicates the behaviour of the git-lfs package. If we make this an error there will be a disparity that might confuse people.
Now, the ideal thing would be for people to get their shit together and not have half-baked repos with weird lfs-but-not-really-lfs files. But I've seen this in the wild (I actually ran into it when looking for example repos to test this implementation, so it's not as rare as one might like to think). And I think it's common with Nix to package other people's code, so we might not be able to fix the root cause.
So we either also throw a warning (risky for compatibility with multiple nix versions) or we diverge and fail in these cases (risky due to incompatibilities with some existing repos), but then we need to document that malformed repos will not work with our implementation.
I'm fine with either, and I AM somewhat inclined to make this a hard fail, but I don't think it's a choice that should be made lightly.
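The two options weighed above could be made explicit with a policy switch. MalformedPointerPolicy and onMalformedPointer are hypothetical names used only for illustration, and the warn-and-keep branch assumes, as described in this thread, that git-lfs warns and passes such files through unchanged.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical option making the choice discussed above explicit.
enum class MalformedPointerPolicy {
    WarnAndKeep, // assumed git-lfs behaviour: warn and use the file's content verbatim
    Error,       // fail, so an outdated Nix cannot succeed with the wrong result
};

// Hypothetical handler for a file marked filter=lfs whose content is not a valid pointer.
std::string onMalformedPointer(const std::string & path, const std::string & content,
                               MalformedPointerPolicy policy) {
    if (policy == MalformedPointerPolicy::Error)
        throw std::runtime_error("expected a git-lfs pointer file, but it wasn't: " + path);
    std::cerr << "warning: expected a git-lfs pointer file, but it wasn't: " << path << '\n';
    return content;
}
```

Keeping both behaviours behind one switch would let the default be revisited later without touching the fetch logic itself.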
Eventually this should probably become a struct of options.
Did some code review/cleanup, otherwise looks great! Thanks!
Nice, thanks for the support! Looks like with my subpar C++ skills I missed some details (: Now, from what I see, there are two final talking points.
Motivation
nix fetches git repos using libgit2, which does not run filters by default. This means LFS-enabled repos can be fetched, but LFS pointer files are not smudged. This change adds an lfs attribute to fetcher URLs. With lfs=1, when fetching LFS-enabled repos, nix will smudge all the files.
Context
See #10079.
Git Large File Storage lets you track large files directly in git, using git filters. A "clean" filter runs on your LFS-enrolled files when they are added to git, replacing large files with small "pointer files". Upon checkout, a "smudge" filter replaces pointer files with the full file contents. When this works correctly, it is invisible to users, which is nice.
Changes
- builtins.fetchGit has a new bool attribute, lfs
- with lfs=true, GitSourceAccessor will smudge any pointer files that have the lfs filter attribute
- tests/nixos/fetchgit (this is why lfs is now enabled on the test gitea instance)
Priorities and Process
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.