Support for better Package URLs, better metadata capturing, and normalized a lot of functionality #53

Merged: 12 commits, Feb 4, 2025

Conversation

Contributor

@dpp dpp commented Feb 2, 2025

💻 Description of Change(s) (w/ context)

A substantial rewrite of Goat Rodeo.

Normalized traversal into containers. This means that filesystems, ISOs, JARs, TARs, etc. are all
treated the same way.

The result is that if an ISO contains a TAR file that in turn contains JAR/POM files, traversal will "just work".
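
A minimal sketch of what uniform traversal of nested containers could look like. Every name here (`Node`, `Leaf`, `Container`, `leaves`) is hypothetical and not taken from the Goat Rodeo codebase, and the model is in-memory rather than backed by real ISO/TAR/JAR readers:

```scala
// Hypothetical model: none of these names come from the Goat Rodeo codebase.
object TraversalSketch {
  sealed trait Node { def name: String }

  // A plain file with no children.
  final case class Leaf(name: String) extends Node

  // Anything that can hold other nodes: a directory, ISO, TAR, JAR, ...
  // `kind` is only a label; traversal never branches on it.
  final case class Container(name: String, kind: String, children: List[Node]) extends Node

  // Depth-first walk that returns the full path of every leaf.
  def leaves(node: Node, prefix: String = ""): List[String] = {
    val path = if (prefix.isEmpty) node.name else s"$prefix/${node.name}"
    node match {
      case Leaf(_)               => List(path)
      case Container(_, _, kids) => kids.flatMap(k => leaves(k, path))
    }
  }

  // An ISO holding a TAR holding a JAR and a POM.
  val example: Node =
    Container("disk.iso", "iso", List(
      Container("bundle.tar", "tar", List(
        Container("lib.jar", "jar", List(Leaf("Main.class"))),
        Leaf("lib.pom")
      ))
    ))
}
```

Because `leaves` never inspects `kind`, adding a new container type in this sketch only means producing another `Container` value.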

Error handling is at the traversal level so that if there's an error processing or opening an
archive, that error is isolated.
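
One way to picture error isolation at the traversal level is to wrap each archive open in `scala.util.Try`, so a failure is recorded and the walk continues. This is a sketch under assumptions: `open` and `walkAll` are hypothetical names, and the "bad" prefix is just a way to simulate a corrupt archive:

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical sketch: `open` stands in for whatever opens a real archive.
object IsolationSketch {
  // Pretend opener: archives whose name starts with "bad" fail to open.
  def open(archive: String): List[String] =
    if (archive.startsWith("bad")) throw new java.io.IOException(s"cannot open $archive")
    else List(s"$archive/a.txt", s"$archive/b.txt")

  // Walk every archive; collect the files from the ones that open and
  // record an error message for each one that does not.
  def walkAll(archives: List[String]): (List[String], List[String]) = {
    val results = archives.map(a => a -> Try(open(a)))
    val files   = results.collect { case (_, Success(fs)) => fs }.flatten
    val errors  = results.collect { case (a, Failure(e)) => s"$a: ${e.getMessage}" }
    (files, errors)
  }
}
```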

Package URL handling is normalized via the PackageURL package.
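
For readers unfamiliar with the format, a Package URL has the shape `pkg:type/namespace/name@version?qualifiers`. The sketch below hand-rolls that rendering purely for illustration; it is not the PackageURL package's API, the `Purl` case class is hypothetical, and percent-encoding is left out:

```scala
// Hypothetical illustration of the purl shape; not the PackageURL library.
object PurlSketch {
  final case class Purl(
      tpe: String,
      namespace: Option[String],
      name: String,
      version: Option[String],
      qualifiers: Map[String, String] = Map.empty
  ) {
    // Renders pkg:type/namespace/name@version?key=value&...
    // Percent-encoding is deliberately omitted to keep the sketch short.
    def render: String = {
      val ns  = namespace.map(_ + "/").getOrElse("")
      val ver = version.map("@" + _).getOrElse("")
      val quals =
        if (qualifiers.isEmpty) ""
        else qualifiers.toList.sortBy(_._1).map { case (k, v) => s"$k=$v" }.mkString("?", "&", "")
      s"pkg:$tpe/$ns$name$ver$quals"
    }
  }
}
```

With this sketch, a Maven artifact renders as `pkg:maven/org.apache.commons/commons-lang3@3.12.0`.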

Metadata extraction is handled by specialized handlers for Maven and Debian types.

The Maven and Debian strategies are managed by passing all the artifacts to each handler and allowing the handler to
choose which files to handle. Thus, mixing Debian, Maven, etc. in one TAR will "just work".
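
The handler approach described above can be sketched as follows. The `Handler` trait, the two handler objects, and the extension-based matching are all assumptions made for illustration; the real handlers may select files very differently:

```scala
// Hypothetical sketch of "offer everything, let each handler choose".
object HandlerSketch {
  // A handler inspects the whole batch and claims the files it understands.
  trait Handler {
    def name: String
    def claims(file: String): Boolean
  }

  object MavenHandler extends Handler {
    val name = "maven"
    def claims(file: String): Boolean = file.endsWith(".jar") || file.endsWith(".pom")
  }

  object DebianHandler extends Handler {
    val name = "debian"
    def claims(file: String): Boolean = file.endsWith(".deb")
  }

  // Offer every file to every handler; each handler keeps only what it claims.
  def dispatch(files: List[String], handlers: List[Handler]): Map[String, List[String]] =
    handlers.map(h => h.name -> files.filter(h.claims)).toMap
}
```

Since no handler is chosen up front for the container as a whole, a TAR with both `.deb` and `.jar` files is simply split between the two handlers in this sketch.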

🧠 Rationale Behind Change(s)

Goat Rodeo was a mess. This set of changes normalizes a lot of the functionality (choosing a per-artifact-type
strategy, with implementations for Maven and Debian) as well as recursive file traversal.

The changes will allow rapid addition of new container types (beyond ISO, TAR, JAR, etc.).

The changes will also allow support to be added for other artifact types (RPM, Python, etc.).

📝 Test Plan

Added a few tests and all existing tests pass.

📜 Documentation

A lot of ScalaDocs

💣 Quality Control

(All items must be checked before a PR is merged)
Did you…

  • Mention an issue number in the PR title?
  • Update the version # in the build file?
  • Create new and/or update relevant existing tests?
  • Create or update relevant documentation and/or diagrams?
  • Comment your code?
  • Fix any stray verbose logging (removing, or moving to debug / trace level)?

Before Merging…

  • Make sure the Quality Control boxes are all ticked
  • Make sure any open comments or conversations on the PR are resolved

dpp added 5 commits January 28, 2025 13:13
…software heritage id which is just a gitoid:sha1 so it was duplicate

Signed-off-by: David Pollak <[email protected]>
…rt for package URLs, pom files, and general metadata

Signed-off-by: David Pollak <[email protected]>
@dpp dpp requested review from aredridel and earldouglas February 2, 2025 01:08
Contributor

@aredridel aredridel left a comment

This all seems pretty sensible

myHash < thatHash
}

def fixReferences(store: Storage): Item = {
Contributor

"fix"? What's this actually doing?

Contributor

It might be worth separating the logic for creating an updated alias with modified connections from the side effect of writing to the database.

Contributor Author

Will do
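
The separation suggested in this thread could look roughly like the sketch below. `Item` and `Storage` here are simplified stand-ins, and `withConnection` and `persist` are hypothetical names; none of this is the actual `fixReferences` implementation:

```scala
// Hypothetical stand-ins for the real Item and Storage types.
object FixReferencesSketch {
  final case class Item(id: String, connections: Set[String])

  trait Storage {
    def write(item: Item): Unit
  }

  // Pure step: compute the updated item without touching storage.
  def withConnection(item: Item, target: String): Item =
    item.copy(connections = item.connections + target)

  // Effectful step: persist an item that was computed elsewhere.
  def persist(store: Storage, item: Item): Item = {
    store.write(item)
    item
  }
}
```

Keeping the pure step separate means it can be tested without a database, and the caller decides when the write happens.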

"text/x-java-properties",
"text/x-scala",
"text/x-java-source",
"application/x-sh",
Contributor

I regret to inform you shar.zip

Contributor Author

Sorry... what is the issue with shar.zip? I'm missing what I need to enhance.

Contributor

Oh, sorry. It's not like, super critical, but shell files can be archives (using sharutils), and there's one in that zip file. I fully expect we'd not support it but just archive formats are wild

.toDouble
val itemsPerMinute = itemsPerSecond * 60.0d
val left = totalItems.toDouble - updatedCnt.toDouble
f" Items/minute ${itemsPerMinute.round}, est remaining ${(left / itemsPerMinute).round} minutes"
Contributor

Nice.

…arentContext' to support better plugin functionality

Signed-off-by: David Pollak <[email protected]>
@dpp dpp requested review from aredridel and earldouglas February 3, 2025 20:24
Contributor

@earldouglas earldouglas left a comment

Nice refactor!

@dpp dpp merged commit 9f1f4c3 into main Feb 4, 2025
2 checks passed
@dpp dpp deleted the purls branch February 12, 2025 13:19