Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Cargo's virtual workspaces #8874

Closed
gmathiou4 opened this issue Jul 12, 2024 · 13 comments · Fixed by #9716
Closed

Add support for Cargo's virtual workspaces #8874

gmathiou4 opened this issue Jul 12, 2024 · 13 comments · Fixed by #9716
Labels
analyzer About the analyzer tool help wanted An issue where third-party help is wanted on

Comments

@gmathiou4
Copy link

Found 4 Cargo definition file(s) at:
       Cargo.toml
       common/Cargo.toml
       summarizer/Cargo.toml
       new-summarizer/Cargo.toml
Found in total 4 definition file(s) from the following 1 package manager(s):
        Cargo
[DefaultDispatcher-worker-1] ERROR org.ossreviewtoolkit.analyzer.PackageManager - Cargo failed to resolve dependencies for path 'Cargo.toml': IllegalArgumentException: Virtual workspaces are not supported.
@gmathiou4 gmathiou4 added new feature to triage Issues that need triaging labels Jul 12, 2024
@sschuberth sschuberth added help wanted An issue where third-party help is wanted on analyzer About the analyzer tool and removed to triage Issues that need triaging labels Jul 13, 2024
@sschuberth sschuberth changed the title Add support for virtual workspaces Add support for Cargo's virtual workspaces Jul 13, 2024
@rtzoeller
Copy link
Contributor

Is the scope of work needed to implement this known? I.e. is this expected to be a large change, or relatively easy for a new contributor?

@sschuberth
Copy link
Member

At least to me, the scope is not really clear. I guess it mostly depends on understanding what virtual workspaces are, and how they should semantically map to the ORT data model.

From what I've read, in essence virtual workspaces are a way to group projects to ease dependency management. Which would mean that in ORT-speak they are neither projects nor packages, and could simply be omitted (which would be easy).

Or, on the other hand, they could be seen as a "fake parent project" that "depends" on the projects it groups (which would be a bit more challenging to implement).

@rtzoeller
Copy link
Contributor

IMHO ORT can just view virtual workspaces as being used to ensure a consistent set of dependency versions are used across multiple crates.

An example file hierarchy might look like

/Cargo.toml                  # Lists version constraints for dependencies dep_A, dep_B, and dep_C
/Cargo.lock                  # Selects specific versions of dep_A, dep_B, and dep_C
/crate_0/Cargo.toml          # Lists a dependency on dep_A
/crate_1/Cargo.toml          # Lists a dependency on dep_A and dep_B
/crate_2/Cargo.toml          # Lists a dependency on dep_B and dep_C

With the result being that the version of dep_A used by crate_0 matches the one used by crate_1, and the version of dep_B used by crate_1 matches the one used by crate_2.

@sschuberth
Copy link
Member

sschuberth commented Jan 8, 2025

IMHO ORT can just view virtual workspaces as being used to ensure a consistent set of dependency versions are used across multiple crates.

If that's correct, then I believe ORT could simply ignore Cargo.toml files that only define a virtual workspace, as I would expected that dependencies of "real" Cargo.toml files would anyway be implicitly correct / aligned when querying them with cargo metadata.

Phrased differently, based on your example there should be no need / value in calling cargo metadata --manifest-path /Cargo.toml, as we'll get all information from calling cargo metadata --manifest-path /crate_<X>/Cargo.toml for each X.

Would you agree to that? Do you have a nice & small example project with a virtual workspace plus expected results (in terms of dependencies that should be found) that we could test with?

@rtzoeller
Copy link
Contributor

Do you have a nice & small example project with a virtual workspace plus expected results (in terms of dependencies that should be found) that we could test with?

https://github.com/hyperium/http-body is a small example of a virtual workspace that fails today, with just two crates and a relatively small dependency tree.

The output of cargo tree --no-dedupe (which should show all dependencies and their versions) shows

Expand output
http-body v1.0.1 (/<parent-directories>/http-body/http-body)
├── bytes v1.9.0
└── http v1.2.0
    ├── bytes v1.9.0
    ├── fnv v1.0.7
    └── itoa v1.0.14

http-body-util v0.1.2 (/<parent-directories>/http-body/http-body-util)
├── bytes v1.9.0
├── futures-core v0.3.31
├── http v1.2.0
│   ├── bytes v1.9.0
│   ├── fnv v1.0.7
│   └── itoa v1.0.14
├── http-body v1.0.1 (/<parent-directories>/http-body/http-body)
│   ├── bytes v1.9.0
│   └── http v1.2.0
│       ├── bytes v1.9.0
│       ├── fnv v1.0.7
│       └── itoa v1.0.14
└── pin-project-lite v0.2.16
[dev-dependencies]
├── futures-util v0.3.31
│   ├── futures-core v0.3.31
│   ├── futures-task v0.3.31
│   ├── pin-project-lite v0.2.16
│   └── pin-utils v0.1.0
└── tokio v1.43.0
    ├── pin-project-lite v0.2.16
    └── tokio-macros v2.5.0 (proc-macro)
        ├── proc-macro2 v1.0.92
        │   └── unicode-ident v1.0.14
        ├── quote v1.0.38
        │   └── proc-macro2 v1.0.92
        │       └── unicode-ident v1.0.14
        └── syn v2.0.95
            ├── proc-macro2 v1.0.92
            │   └── unicode-ident v1.0.14
            ├── quote v1.0.38
            │   └── proc-macro2 v1.0.92
            │       └── unicode-ident v1.0.14
            └── unicode-ident v1.0.14

Phrased differently, based on your example there should be no need / value in calling cargo metadata --manifest-path /Cargo.toml, as we'll get all information from calling cargo metadata --manifest-path /crate_/Cargo.toml for each X.

I'm not entirely sure, to be honest. It seems like cargo metadata --manifest-path is returning nearly identical output for /Cargo.toml, /http-body/Cargo.toml, and /http-body-util/Cargo.toml in the above example, and I don't know how much parsing is done of the output vs. assuming all listed packages are valid dependencies.


The more I look at this, the more skeptical I become that ORT is even doing the correct thing for normal crates today. Having modified the http-body crate to not use a virtual workspace, ORT remains very insistent that addr2line is a dependency, and puts it in a NOTICE_DEFAULT file when creating a report.

addr2line shows up in the output of cargo metadata, but isn't present in cargo tree for either crate. From manually parsing the output of cargo metadata it appears that the theoretical dependency chain is

http-body-util
    has a dev-dependency on
tokio
    has a dependency on
backtrace
    has a dependency on
addr2line

BUT tokio's only reference to backtrace is as an optional dependency as part of an unstable feature which is not enabled by http-body-util.

This means ORT is listing sub-dependencies which are demonstrably not actually included by cargo build.

@sschuberth
Copy link
Member

Thanks for the analysis!

BUT tokio's only reference to backtrace is as an optional dependency as part of an unstable feature which is not enabled by http-body-util.

That's interesting as cargo metadata has "optional": false for tokio's dependency on backtrace:

    {
      "name": "backtrace",
      "source": "registry+https://github.com/rust-lang/crates.io-index",
      "req": "^0.3.58",
      "kind": null,
      "rename": null,
      "optional": false,
      "uses_default_features": true,
      "features": [],
      "target": "cfg(tokio_taskdump)",
      "registry": null
    },

This means ORT is listing sub-dependencies which are demonstrably not actually included by cargo build.

Are you aware of a cargo sub-command that would get us the right data? Or how to filter the output of cargo metadata?

@sschuberth
Copy link
Member

It seems like cargo metadata --manifest-path is returning nearly identical output for /Cargo.toml, /http-body/Cargo.toml, and /http-body-util/Cargo.toml in the above example,

But isn't that a good sign? That sounds to me like an indication that we could omit cargo metadata --manifest-path /Cargo.toml without loosing information.

and I don't know how much parsing is done of the output vs. assuming all listed packages are valid dependencies.

Parsing on ORT side you mean? ORT does very little extra parsing / interpretation but more or less just maps all cargo metadata output to ORT's data models.

@rtzoeller
Copy link
Contributor

That's interesting as cargo metadata has "optional": false for tokio's dependency on backtrace

I believe this is a weird interaction of "optional" and "target". AIUI if cfg(tokio_taskdump) was true, backtrace would be a required dependency. Without cfg(tokio_taskdump), backtrace isn't a required or optional dependency - it doesn't exist in the dependency tree at all.

@rtzoeller
Copy link
Contributor

It seems like cargo metadata --manifest-path is returning nearly identical output for /Cargo.toml, /http-body/Cargo.toml, and /http-body-util/Cargo.toml in the above example,

But isn't that a good sign? That sounds to me like an indication that we could omit cargo metadata --manifest-path /Cargo.toml without loosing information.

In this case I wasn't expecting them to be identical. 😄

Are you aware of a cargo sub-command that would get us the right data? Or how to filter the output of cargo metadata?

I'm not, but it looks like there is a cargo cyclonedx third-party plugin which produces what I think is the expected output. It might be worth looking at what they do.

I don't know if cargo cyclonedx includes dev dependencies by default, but I copied the tokio dependency in http-body-util to be a regular dependency and cargo cyclonedex correctly included tokio and its own dependencies (as listed by cargo tree --no-dedupe) without including backtrace or addr2line.

@rtzoeller
Copy link
Contributor

Running cargo cyclonedx -vvv does show they're using cargo metadata as their source of truth, so they must be doing some additional processing to get usable data.

Expand output
$ cargo cyclonedx -vvv
[2025-01-09T21:49:26Z INFO  cargo_cyclonedx] Using Cargo.toml manifest located at: /home/rzoeller/repos/http-body/Cargo.toml
[2025-01-09T21:49:26Z DEBUG cargo_cyclonedx] Found the Cargo.toml file at /home/rzoeller/repos/http-body/Cargo.toml
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx] Running `cargo metadata` started
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx] Running `cargo metadata` finished
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx] SBOM generation started
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Processing the workspace /home/rzoeller/repos/http-body
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Processing the package path+file:///home/rzoeller/repos/http-body/http-body#1.0.1
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Adding all dependencies to SBOM
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z DEBUG cargo_cyclonedx::generator] Hash for package ID path+file:///home/rzoeller/repos/http-body/http-body#1.0.1 not found in Cargo.lock
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Processing the package path+file:///home/rzoeller/repos/http-body/http-body-util#0.1.2
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Adding all dependencies to SBOM
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z DEBUG cargo_cyclonedx::generator] Hash for package ID path+file:///home/rzoeller/repos/http-body/http-body#1.0.1 not found in Cargo.lock
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx::generator] Using license parser mode [Lax] for package [[email protected]]
[2025-01-09T21:49:26Z DEBUG cargo_cyclonedx::generator] Hash for package ID path+file:///home/rzoeller/repos/http-body/http-body-util#0.1.2 not found in Cargo.lock
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx] SBOM generation finished
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx] SBOM output started
[2025-01-09T21:49:26Z INFO  cargo_cyclonedx::generator] Outputting /home/rzoeller/repos/http-body/http-body/http-body.cdx.xml
[2025-01-09T21:49:26Z INFO  cargo_cyclonedx::generator] Outputting /home/rzoeller/repos/http-body/http-body-util/http-body-util.cdx.xml
[2025-01-09T21:49:26Z TRACE cargo_cyclonedx] SBOM output finished

@rtzoeller
Copy link
Contributor

But isn't that a good sign? That sounds to me like an indication that we could omit cargo metadata --manifest-path /Cargo.toml without loosing information.

I suppose for the sake of this bug, just ignoring the root /Cargo.toml would be sufficient for virtual workspaces.

Then there's the larger bug of "more crates than expected are included in Rust crate dependency graphs", which can likely be handled separately.

sschuberth added a commit that referenced this issue Jan 9, 2025
Virtual worspaces do not define anypackages and thus can simply be
filtered out without loosing any information.

Resolves #8874.

Signed-off-by: Sebastian Schuberth <[email protected]>
sschuberth added a commit that referenced this issue Jan 9, 2025
Virtual workspaces do not define any packages and thus can simply be
filtered out without loosing any information.

Resolves #8874.

Signed-off-by: Sebastian Schuberth <[email protected]>
@sschuberth
Copy link
Member

I suppose for the sake of this bug, just ignoring the root /Cargo.toml would be sufficient for virtual workspaces.

Agree, see #9716.

@sschuberth
Copy link
Member

Then there's the larger bug of "more crates than expected are included in Rust crate dependency graphs", which can likely be handled separately.

Agree as well, see #9717.

sschuberth added a commit that referenced this issue Jan 9, 2025
Virtual workspaces do not define any packages and thus can simply be
filtered out without loosing any information.

Resolves #8874.

Signed-off-by: Sebastian Schuberth <[email protected]>
sschuberth added a commit that referenced this issue Jan 10, 2025
Virtual workspaces do not define any packages and thus can simply be
filtered out without losing any information.

Resolves #8874.

Signed-off-by: Sebastian Schuberth <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analyzer About the analyzer tool help wanted An issue where third-party help is wanted on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants