Skip to content

Conversation

blester125
Copy link
Collaborator

@blester125 blester125 commented Jun 26, 2023

Currently, when a pattern in .gitattributes matches the path in git-theta track, the theta attributes (filter, merge, diff) are added to that line. This can cause issues as it may result in tracking more files with git-theta than expected.

However, just adding a new line that is an exact match for the path to the end of the gitattribute file is not correct either. The last attribute line in the file is the one used by Git. This could result in git-theta track removing an some other attribute that is set for that file.

This PR updates the way that git attributes are set. A new line that is an exact match to the current path is added, the currently active attributes are copied over, and the new theta attributes are added.

When a line has an attribute set multiple times, the last one is used so we can add the theta filters at the end to override any previous ones.

Note that with this PR running git-theta track on the same file multiple times it will keep adding lines to .gitattributes.

This change is based on a comment in #214

This also adds a is_theta_tracked function. (I meant to make it in a different branch lol)

This change is based on a part of
#214 where the test for if a
file is tracked by Git-Theta is abstracted into a function.

However it is implemented differently as it now handles the subtleties of
what attribute line is active at a given time.

Only the final line that a path matches is used to set attributes and
only the last entry for some key is used. This is now respected.****

@blester125 blester125 requested a review from nkandpa2 June 26, 2023 20:00
@craffel
Copy link
Contributor

craffel commented Jun 27, 2023

Maybe this is a silly question, but

Note that with this PR running git-theta track on the same file multiple times it will keep adding lines to .gitattributes.

Why can't we just check if the file is already tracked (by inspecting the .gitattributes file) and throw a warning/don't track if it is?

@blester125
Copy link
Collaborator Author

It depends if we want git theta track to always add a specific line for our current model or not. We didn't want to add attributes to general filters like *.pt, because we could be tracking too much but it seems like we could skip adding if an active gitattribute line sets all the right attributes.

If we want to always have a specific line for our current model there were issue detecting if a match was general or specific which is why I didn't have a check like that.

Does this make sense:

When git track path is run:
If the active config line for path sets filter=theta, merge=theta, and diff=theta then we do nothing.
Otherwise we create a new config entry specifically for path
If the active config line sets attributes on path that are disjoint from filter, merge, diff then we copy them into the new config entry.
If the active config line sets non git-theta values for filter, merge, or diff we log a warning (or crash?)
We write the new config entry for path

@craffel
Copy link
Contributor

craffel commented Jun 28, 2023

Ah, I forgot about wildcard. Just to clarify,

If the active config line for path sets filter=theta, merge=theta, and diff=theta then we do nothing.

Does this include a config line that covers path via a wildcard?

If the active config line sets non git-theta values for filter, merge, or diff we log a warning (or crash?)

Seems like a crashable offense to me.

@blester125
Copy link
Collaborator Author

If the active config line for path sets filter=theta, merge=theta, and diff=theta then we do nothing.

Does this include a config line that covers path via a wildcard?

This would include a config line that covers a path via wild card if it was the last config line that covers that path

If my path was my-model.pt

*.pt filter=theta merge=theta diff=theta

would result in no-op, but

*.pt filter=theta merge=theta diff=theta
my-model.* filter=lfs

would result in an error

@craffel
Copy link
Contributor

craffel commented Jun 29, 2023

Cool, SGTM.

@blester125
Copy link
Collaborator Author

I updated this PR to the behavior we discussed.

Currently, when a pattern in `.gitattributes` matches the path in
`git-theta track`, the theta attributes (filter, merge, diff) are added
to that line. This can cause issues as it may result in tracking more
files with git-theta than expected.

However, just adding a new line that is an exact match for the path to
the end of the gitattribute file is not correct either. The *last*
attribute line in the file is the one used by Git. This could result in
`git-theta track` removing an some other attribute that is set for that
file.

This PR updates the way that git attributes are set. If there are
already git attributes set to non-theta values for the file that are used by git-theta
(filter, merge, diff) an error is raised. If these attributes are all
set to git-theta, no new entry is added, even when the entry is a
pattern match instead of an exact match. If the new file has no
attributes set, or attributes that don't overlap with git-theta, then a
new entry is added. Non-overlapping attributes are copied down if they
were set before.

When a line has an attribute set multiple times, the *last* one is used
so we can add the theta filters at the end to override any previous
ones.

Added a `is_theta_tracked` function similar to the one from
r-three#214 where the test for if a
file is tracked by Git-Theta is abstracted into a function.

However it is implemented differently as it now handles the subtleties of
what attribute line is active at a given time.

Only the final line that a path matches is used to set attributes and
only the last entry for some key is used. This is now respected.

Git attributes support the "unsetting" of attributes (e.g. "-diff"),
these attributes are correctly copied around, but they aren't currently logically
checked to see if they intersect with git-theta attributes.
@blester125 blester125 merged commit 9775781 into r-three:main Jan 24, 2024
@blester125 blester125 deleted the fix/gitattributes branch January 24, 2024 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants