Add GitIdentifiers helper #428
This change moves `gitBlob` and `gitTree` from `DigestUtils` into a separate utility class, to prepare for an enhancement of the provided API. The git tree identifier can be computed for many objects: the most natural is a directory in a filesystem, but we can also compute the identifier on an archive containing this directory. Additional usages will require expanding the API, beyond what can be reasonably contained in `DigestUtils`.
The `OpenOption` parameters are not very useful, since files are usually opened read-only.
This change adds a `GitIdentifiers.TreeIdBuilder` class to allow the computation of a SWHID identifier from an archive.
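As background for the identifiers discussed here, a Git blob identifier is the SHA-1 of the header `blob <size>\0` followed by the file content. The following is a minimal sketch of that computation using only the JDK; `BlobIdSketch` and its methods are hypothetical names for illustration and are not the API added by this PR.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not the GitIdentifiers implementation.
final class BlobIdSketch {

    // Hashes "blob <size>\0" followed by the content, as Git does for blobs.
    static byte[] blobId(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            return sha1.digest(content);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Lower-case hexadecimal rendering of a digest.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] content = "hello world\n".getBytes(StandardCharsets.UTF_8);
        // Same value as `printf 'hello world\n' | git hash-object --stdin`
        System.out.println(hex(blobId(content)));
    }
}
```

Because the header carries the content length, a streaming variant has to know the size before hashing starts.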
garydgregory
left a comment
Hi @ppkarwasz
Thank you for the follow up PR 😄
I made an initial pass over this PR. I have high-level and low-level comments. The low-level comments are in the PR. At the high level, I wonder if there is some YAGNI here with new APIs for all of `byte[]`, `Path`, and `InputStream`. At least there isn't `File` 😉
I am not sure this needs to be so general an API that it has to deal with all of `byte[]`, `Path`, and `InputStream`. I would either:
- Base general input processing on IO's `builder` package, or
- Pare down the API to only what the Maven plugin needs for build attestations.
WDYT?
 * A supplier of a blob identifier that may throw {@link IOException}.
 */
@FunctionalInterface
private interface BlobIdSupplier {
This could just be a Commons IO IOSupplier.
Yes, I know, but do we want to force a dependency on commons-io for something users might never use?
I would rather not add a dependency on IO at this point, but I would like to handle at least two use cases:
We can probably remove the methods with
OK, then let's reduce the public footprint to only what's needed for the Maven plugin.
I have reduced the public footprint slightly, by removing the methods with an What is left is potentially useful:
I think the two main points we should discuss, before releasing this API are:
If normalizing
I think we can go YAGNI here.
This class doesn't write anything to disk, nor does it follow symlinks. The The reason I don't want I allowed the presence of
Using the current version of this API I successfully verified that the Maven This should be enough for the Maven plugin.
OK, we are good to go then. Merged.
This PR adds a `GitIdentifiers` utility class to support computing SWHID identifiers for a wider range of sources.

The `DigestUtils.gitTree` method introduced in #427 was limited to directories on the local filesystem. `GitIdentifiers` replaces and extends it to also handle virtual directory structures such as archive contents.
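To illustrate why a tree identifier does not need a real directory, here is a rough sketch of how Git hashes a tree from an in-memory list of entries: each entry is serialized as `<mode> <name>\0` followed by the raw 20-byte identifier of the child, entries are sorted by name (directory names compared as if they ended in `/`), and the result is hashed with a `tree <size>\0` header. `TreeIdSketch` and `Entry` are hypothetical names, the sketch assumes SHA-1, and it is not the `TreeIdBuilder` from this PR.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the GitIdentifiers implementation.
final class TreeIdSketch {

    // One tree entry: Git mode ("100644", "100755", "120000", "40000"), name, raw child id.
    static final class Entry {
        final String mode;
        final String name;
        final byte[] id;

        Entry(String mode, String name, byte[] id) {
            this.mode = mode;
            this.name = name;
            this.id = id;
        }

        // Git sorts directory entries as if their name had a trailing '/'.
        byte[] sortKey() {
            String key = "40000".equals(mode) ? name + "/" : name;
            return key.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Hashes "<type> <size>\0" followed by the body, the generic Git object scheme.
    static byte[] objectId(String type, byte[] body) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update((type + " " + body.length + "\0").getBytes(StandardCharsets.US_ASCII));
            return sha1.digest(body);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Computes the tree id for the given entries, regardless of insertion order.
    static byte[] treeId(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a.sortKey(), b.sortKey()));
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Entry e : sorted) {
            byte[] header = (e.mode + " " + e.name + "\0").getBytes(StandardCharsets.UTF_8);
            body.write(header, 0, header.length);
            body.write(e.id, 0, e.id.length);
        }
        return objectId("tree", body.toByteArray());
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Entries could come from an archive listing instead of the filesystem.
        byte[] readme = objectId("blob", "hello world\n".getBytes(StandardCharsets.UTF_8));
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("100644", "README", readme));
        System.out.println(hex(treeId(entries)));
    }
}
```

A nested directory would be handled by computing the child tree's identifier first and adding it to the parent as a `40000` entry.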
New API
- `blobId`: computes a Git/SWHID blob identifier. Four overloads are provided:
  - `blobId(MessageDigest, byte[])`
  - `blobId(MessageDigest, InputStream)`
  - `blobId(MessageDigest, long, InputStream)`
  - `blobId(MessageDigest, Path)`
- `treeId(MessageDigest, Path)`: computes a Git/SWHID tree identifier for a directory on the filesystem.
- `treeIdBuilder(MessageDigest)`: returns a `TreeIdBuilder` for constructing a tree identifier from any source. The builder accumulates entries via:
  - `addFile(FileMode, String, …)`: three overloads matching those of `blobId` above; paths containing `/` automatically create intermediate subdirectories.
  - `addDirectory(String)`: creates a subdirectory node and returns its builder; accepts multi-level paths.
  - `build()`: computes and returns the tree identifier.
- `FileMode`: enum with values `REGULAR`, `EXECUTABLE`, `SYMBOLIC_LINK`, and `DIRECTORY`.

Example