Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a way to use version ranges for package-configurations #9918

Open
MarcelBochtler opened this issue Feb 11, 2025 · 18 comments
Open

Provide a way to use version ranges for package-configurations #9918

MarcelBochtler opened this issue Feb 11, 2025 · 18 comments
Labels
configuration About configuration topics

Comments

@MarcelBochtler
Copy link
Member

MarcelBochtler commented Feb 11, 2025

When scanning ORT as a package with ORT, one finding always shows up:

"LicenseRef-scancode-agpl-generic-additional-terms" to SpdxConstants.NOASSERTION,

This line is being detected as AGPL-3.0-or-later by scancode.

As this is not a license declaration, I am removing the finding with a package-configuration.

package_configurations:
- id: "Maven:org.ossreviewtoolkit:model:51.0.0"
  source_artifact_url: "https://repo.maven.apache.org/maven2/org/ossreviewtoolkit/model/51.0.0/model-51.0.0-sources.jar"
  license_finding_curations:
  - path: "config/ScannerConfiguration.kt"
    start_lines: "64"
    line_count: 1
    detected_license: "AGPL-3.0-or-later"
    concluded_license: NONE
    reason: CODE
    comment: "..."

This works well, and I like that I can remove this single finding in exactly this version.

However, I have to do that weekly, because I want to update ORT as a dependency as soon as possible, and every update requires me to write a new package configuration.

The only workaround to remove this finding for good is to overwrite every detected license for Maven:org.ossreviewtoolkit:model:, by just setting a concluded license with a version range.
Using a concluded license has multiple unwanted side effects, which is why I don't want to use them.

What do you think about implementing a way to achieve the effect of a version range also for package-configurations?

@MarcelBochtler MarcelBochtler added the to triage Issues that need triaging label Feb 11, 2025
@fviernau
Copy link
Member

For what reason are you trying to eliminate the license? (e.g use cases)

@MarcelBochtler
Copy link
Member Author

MarcelBochtler commented Feb 11, 2025

For what reason are you trying to eliminate the license? (e.g use cases)

Our ruleset disallows any usage of packages that are licensed as AGPL-3.0-or-later.

But does this really matter?
The finding is false-positive, as the file is not licensed as AGPL-3.0-or-later, so IMO I should be able to remove the finding.

@tsteenbe
Copy link
Member

This has been an annoying issue since package configurations were introduces - remember at HERE we have tens of configurations for python's setup tools. Don't get me wrong this limitations was knowingly introduced. Would be in favor to introducing configurations that work across version but would only support it if can ensure we can keep the same level of trust in ORT. As configurations overwrite scan findings and I would like be opinionated and protect users against themselves in this case by:

  • Supporting version ranges use in package configurations id only if user specifies narrow matchers (path + start_lines + line_count) so there is higher chance it only matches what it suppose to match.
  • Prior to implementation of a solution to this issue the presence of package configurations is made clear in any reports

@MarcelBochtler
Copy link
Member Author

MarcelBochtler commented Feb 11, 2025

As configurations overwrite scan findings and I would like be opinionated and protect users against themselves in this case by:

* Supporting version ranges use in package configurations id only if user specifies narrow matchers (path + start_lines + line_count) so there is higher chance it only matches what it suppose to match.

* Prior to implementation of a solution to this issue the presence of package configurations is made clear in any reports

I'm not sure if this should be required.

For curations, this is not required, and at the moment I can just add the following to my .ort.yml (or a global curation), and be done with it (when using the LicenseView.CONCLUDED_OR_DECLARED_AND_DETECTED):

curations:
  packages:
    - id: "Maven:org.ossreviewtoolkit:model:"
      curations:
        concluded_license: Apache-2.0
        comment: "..." 

But as mentioned above this is bad, and quite dangerous to do.
Which is the whole reason why I want to have the possibility to have some kind of version ranges for package configurations.

@sschuberth
Copy link
Member

What do you think about implementing a way to achieve the effect of a version range also for package-configurations?

I wanted to have this for a long time as well, also for consistency with package curations, and I believe we've talked about the wish multiple times already. I do not see a reason to not implement it finally.

@sschuberth sschuberth added configuration About configuration topics and removed to triage Issues that need triaging labels Feb 12, 2025
@fviernau
Copy link
Member

fviernau commented Feb 13, 2025

I still believe, we should not implement this and instead require the user to create a separate dedicated package configuration for each package version. And instead spent the effort in creating a helper which auto-generates a package configuration based on an already existing one. This keeps file organization simpler, package configuration simpler, slef contained, easier to review and veryify. You can automatically remove non-matching entries and so forth. Export a list of bugs for false detection fixes.

@MarcelBochtler
Copy link
Member Author

MarcelBochtler commented Feb 14, 2025

I still believe, we should not implement this and instead require the user to create a separate dedicated package configuration for each package version.

My main issue with this approach is, that this is quite a bad user experience when using ORT via CI/CI (either via ORT server, or via a GH action directly).
We are using renovate bot to update versions, but we cannot merge the PR without touching the package configurations, making it a pain to use ORT.

Remember: There is no issue here, we are trying to fix a false-positive finding.

And instead spent the effort in creating a helper which auto-generates a package configuration based on an already existing one.

The issue is not that it is hard to create a package-configuration, the issue is, that a new package-configuration it is required at all in this case.

This keeps file organization simpler, package configuration simpler, slef contained, easier to review and veryify. You can automatically remove non-matching entries and so forth. Export a list of bugs for false detection fixes.

I don't agree with every of these points, but I think that these don't justify a worse user experience.

Edit: Additionally, we already have this functionality for curations, which is already used by users as a bad fallback for this missing feature for package-configurations.

@tsteenbe
Copy link
Member

@MarcelBochtler Don't get me wrong I want to support version ranges in package configurations but it has to be implemented in the right way else it may erode user trust in ORT.

The use of a concluded license curation in a ORT scan is currently easy to spot but the use of package configuration with license finding curations is not. It's not about whether something is already possible but whether ORT is transparent about the fact license findings have been altered. When license finding curations in package configurations were implemented we deliberately made the choice to cap it what was possible as we didn't have a good way to communicate their existence throughtout an entire ORT toolchain like we have for curations.

My point is about trust in ORT results - we should make clear to anyone reading an ORT report that scanner findings / facts have been altered so in case things go wrong its easy for stakeholders to trace back when and where it happened so they can hopefully ensure it doesn't happen again. Being able to trace down things is not only useful for us ORT devs but it's a essential if you are being sued.

@sschuberth
Copy link
Member

When license finding curations in package configurations were implemented we deliberately made the choice to cap it what was possible as we didn't have a good way to communicate their existence throughtout an entire ORT toolchain like we have for curations.

Isn't that remark obsolete with the ResolvedConfiguration (also see #7453), @mnonnenmacher?

@mnonnenmacher
Copy link
Member

When license finding curations in package configurations were implemented we deliberately made the choice to cap it what was possible as we didn't have a good way to communicate their existence throughtout an entire ORT toolchain like we have for curations.

Isn't that remark obsolete with the ResolvedConfiguration (also see #7453), @mnonnenmacher?

Yes, ORT result files contain all applied package configurations, and the Kotlin API also provides the necessary data.

My point is about trust in ORT results - we should make clear to anyone reading an ORT report that scanner findings / facts have been altered so in case things go wrong its easy for stakeholders to trace back when and where it happened so they can hopefully ensure it doesn't happen again. Being able to trace down things is not only useful for us ORT devs but it's a essential if you are being sued.

License finding curations are not part of the evaluated model, this would have to be changed to be able to show them in the web app report.

Prior to implementation of a solution to this issue the presence of package configurations is made clear in any reports

I would not make this a precondition.


One more requirement to make version ranges usable is that the artifact URLs in the matcher must become regular expressions as they usually contain the version number in the filename.

@tsteenbe
Copy link
Member

@mnonnenmacher @MarcelBochtler Adding this feature will cause additional work ORT admins in the traceability depart and we recently discussed that really none of use like adding feature flags so this classic "give and take" in my book - "take" Bosch wants a losing of matcher restrictions of package configurations, "give" Bosch implements package configurations in the EvaluatedModel to balance out traceability (I hereby commit to the JavaScript work needed to them visible in WebApp report). Win-win for all I would say, deal?

@mnonnenmacher
Copy link
Member

I would really prefer to treat this as two separate issues, because I don't agree that version ranges make the tracing of package configurations much more difficult. Matching the identifiers is the easy part, with or without version ranges. The hard part is matching the file patterns of path excludes and the paths and line numbers of license finding curations against the actual files in the scanned package.

I'm not saying that we would not be willing to help with the required changes to the EvaluatedModel, I just don't like coupling issues in this way.

As this is already on the agenda for the next developer meeting, let's discuss the details there.

@fviernau
Copy link
Member

fviernau commented Feb 28, 2025

I don't agree that version ranges make the tracing of package configurations much more difficult.

Previously, you could verify correctness for any given package configuration by checking out the source code
as specified in the header, and checking correctness against that single revision.

With that proposal the package configuration applies to a range of revisions.
To verify correctness and or to make changes, you have to verify against all revisions.

@mnonnenmacher Are you saying that this it not "much more difficult"?

@mnonnenmacher
Copy link
Member

Previously, you could verify correctness for any given package configuration by checking out the source code
as specified in the header, and checking correctness against that single revision.

Not always, because the revision of the VCS matcher is already optional.

With that proposal the package configuration applies to a range of revisions.
To verify correctness and or to make changes, you have to verify against all revisions.

@mnonnenmacher Are you saying that this it not "much more difficult"?

Thomas' comment was about tracing which package configurations were applied within a specific ORT result, that's what I responded to. Also, holding up this strictness for package configurations does not help when in reality users are falling back to concluded license curations because package configurations require too much maintenance.

@fviernau
Copy link
Member

fviernau commented Feb 28, 2025

Not always, because the revision of the VCS matcher is already optional.

But most of the time, because normally a package id is matched to a single revision.

too much maintenan

That's what I'm challenging. I believe relaxing things here will cause more maintenance in total, and decrease share-ability and correctness overall.

@mnonnenmacher
Copy link
Member

I think this really depends on the perspective, so maintenance for whom? Developers usually do not want to do compliance checks but have to do them, so they prefer simple solutions. But as I said:

As this is already on the agenda for the next developer meeting, let's discuss the details there.

@fviernau
Copy link
Member

fviernau commented Mar 4, 2025

Proposal

Currently the matching of package configuration consists of:

  1. id: package id without version range
  2. Exactly one of (for the provenance):
    • source_artifact_url
    • vcs (optionally without revision)

When we loosened the vcs by making revision optional, we already lost the capability to figure out
based on which provenance the "package configuration" has been created. If we loose that anyway,
then I believe there is not much value in specifying the vcs partially, without revision, but it would be
better to omit it. This makes package configurations also simpler to create by hand, which has been mentioned (in related discussions) to be a problem for some users.

To address this, together with implementing version ranges I propose to change the matcher to consist of:

  1. id: package id, can use version range in case neither source_artifact_url nor vcs is specified.
  2. At most one of (for the provenance):
    • source_artifact_url
    • vcs, with all fields mandatory again! (including the revision.)
    • source_code_origin (this can be either VCS or SOURCE_ARTIFACT, e.g. enum value)

The corresponding scenarios would look e.g.:

# Fixed source artifact provenance
package_configurations:
- id: "Maven:org.ossreviewtoolkit:model:51.0.0"
  source_artifact_url: "https://repo.maven.apache.org/maven2/org/ossreviewtoolkit/model/51.0.0/model-51.0.0-sources.jar"

# Fixed VCS provenance
package_configurations:
- id: "Maven:org.ossreviewtoolkit:model:51.0.0"
  vcs:
    type: "git"
    url: "https://github.com/oss-review-toolkit/ort.git"
    revision: "7243c75e4dac66eb7636959f878f20a25d652204"

# Arbitrary VCS provenance
package_configurations:
- id: "Maven:org.ossreviewtoolkit:model:[51.0.0,60.0.0]"
  source_code_origin: VCS

# Arbitrary source artifact provenance
package_configurations:
- id: "Maven:org.ossreviewtoolkit:model:[51.0.0,60.0.0]"
  source_code_origin: SOURCE_ARTIFACT

# Arbitrary provenance, no matter if VCS or source artifact.
package_configurations:
- id: "Maven:org.ossreviewtoolkit:model:[51.0.0,60.0.0]"

This approach:

  • Removes the need to make source_artifact_url use regexes (as proposed in different idea)
  • if provenance matcher is specified, we do know the exact provenance the entries have been created for.
  • Allows the invariant that: if id uses version range, and either vcs or source_artifact is specified, then fail early,

Note: If this was done, I believe we should only allow in the open source ort-config repo entries with either vcs or source_artifact set.

@MarcelBochtler
Copy link
Member Author

@fviernau, I quite like your proposal, it would really make creating package-configurations by hand less cumbersome.

Note: If this was done, I believe we should only allow in the open source ort-config repo entries with either vcs or source_artifact set.

I agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
configuration About configuration topics
Projects
None yet
Development

No branches or pull requests

5 participants