Feature Request: Report files installed per package in OS extractors #584

Open
samuelvl opened this issue Mar 19, 2025 · 4 comments
Comments

@samuelvl

samuelvl commented Mar 19, 2025

Currently the OS extractors do not report the files installed by each OS package. This feature would be valuable for mapping file accesses to the package that installed them, which is useful for security and forensic analysis.

For example, the os/apk extractor does not include package files in locations, only the APK index path (lib/apk/db/installed):

Locations: []string{input.Path},

However, the list of files installed by each package can also be retrieved from the APK index: https://wiki.alpinelinux.org/wiki/Apk_spec
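As a rough illustration of the idea, here is a minimal Go sketch that collects file paths per package from text in the installed-database format. It assumes the record layout in which `P:` names the package, `F:` names a directory, `R:` names a file inside the most recent `F:` directory, and a blank line ends a record. The function name `parseInstalledFiles` and the sample data are hypothetical and are not part of SCALIBR.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// parseInstalledFiles is a hypothetical helper: it reads text in the APK
// installed-database format and returns, per package name, the absolute
// paths of the files listed for that package.
func parseInstalledFiles(db string) (map[string][]string, error) {
	files := make(map[string][]string)
	var pkg, dir string
	sc := bufio.NewScanner(strings.NewReader(db))
	for sc.Scan() {
		line := sc.Text()
		if line == "" { // a blank line ends the current package record
			pkg, dir = "", ""
			continue
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue // skip lines that are not "key:value"
		}
		switch key {
		case "P": // package name
			pkg = val
		case "F": // directory, relative to the filesystem root
			dir = val
		case "R": // file name inside the most recent F: directory
			if pkg != "" {
				files[pkg] = append(files[pkg], path.Join("/", dir, val))
			}
		}
	}
	return files, sc.Err()
}

func main() {
	// Made-up sample records, for illustration only.
	const sample = "P:busybox\nF:bin\nR:busybox\nF:etc\nR:securetty\n\n" +
		"P:zlib\nF:lib\nR:libz.so.1\n"
	files, err := parseInstalledFiles(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(files["busybox"]) // [/bin/busybox /etc/securetty]
}
```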

Similar tools like Trivy or Syft report this information.

@samuelvl
Author

I'm happy to contribute to implementing this feature. Let me know if you think this request makes sense.

@erikvarga
Collaborator

cc @vpasdf
We had some performance concerns about this before: if every OS package added a lot of new file path strings, we'd increase both the output size and the peak memory usage of the SCALIBR run, which could be problematic in some of the resource-constrained environments where we use the scanner.

I think it should be fine to add this feature behind a flag that's disabled by default though.

@erikvarga
Collaborator

Thanks for the interest in adding this feature! Feel free to send a PR our way.

@vpasdf
Collaborator

vpasdf commented Apr 2, 2025

Is the file information generated automatically, and does it contain all the files, or is it maintained by hand?

For DPKG that list of files is very long. It's usually more efficient to loop over those paths once, rather than storing all of them.

It's fine to add this feature behind a feature flag. It might cause significantly higher memory usage depending on the number of packages and files, which might also slow down scanning. I'd recommend not adding this to Locations but adding a new field in the metadata, because Locations should store where we found the package, not everything related to the package.
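One possible shape for this, sketched in Go under stated assumptions: the types `Config`, `Metadata`, and `Package` and the field `InstalledFiles` below are hypothetical stand-ins, not SCALIBR's actual types. The flag is off by default because it is the zero value of a `bool`, and `Locations` keeps only the path where the package was found.

```go
package main

import "fmt"

// Config is a hypothetical extractor configuration. The zero value keeps
// the feature disabled, so existing callers see no change in output size.
type Config struct {
	IncludeInstalledFiles bool
}

// Metadata is a hypothetical per-package metadata struct with an opt-in
// field for the files the package installed.
type Metadata struct {
	PackageName    string
	InstalledFiles []string
}

// Package is a simplified stand-in for an extracted package record.
type Package struct {
	Name      string
	Locations []string // where the package was found, e.g. the database path
	Metadata  *Metadata
}

// newPackage fills InstalledFiles only when the flag is enabled.
func newPackage(cfg Config, name, dbPath string, files []string) Package {
	md := &Metadata{PackageName: name}
	if cfg.IncludeInstalledFiles {
		md.InstalledFiles = files
	}
	return Package{Name: name, Locations: []string{dbPath}, Metadata: md}
}

func main() {
	files := []string{"/bin/busybox"}
	off := newPackage(Config{}, "busybox", "lib/apk/db/installed", files)
	on := newPackage(Config{IncludeInstalledFiles: true}, "busybox", "lib/apk/db/installed", files)
	fmt.Println(len(off.Metadata.InstalledFiles), len(on.Metadata.InstalledFiles)) // 0 1
	fmt.Println(on.Locations)                                                      // [lib/apk/db/installed]
}
```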
