Skip to content

Implement bst inspect subcommand #2035

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

kevinschoon
Copy link

This adds a new subcommand to bst called inspect which dumps structured data to stdout of a given project. The data sent to stdout can be either JSON or YAML with JSON being the default.

Having this command available will make writing external tools which need to inspect the state of a buildstream project more easy since JSON and YAML encoding are widely supported. This command may easily be extended in the future to support other inspectable elements related to bst project which isn't present in this first commit.



def _dump_project(self):
# TODO: What else do we want here?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take note of this, I'm unsure what other items we should include in the project output.

project = str(datafiles)
result = cli.run(project=project, silent=True, args=["inspect", "*.bst"])
result.assert_success()
json.loads(result.output)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would perform an assert between the result, and an 'expected' JSON result stored in the project directory.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into doing this but because the inspect format appears to contain some localized information to the caller, for example:

      "sources": [
        {
          "kind": "remote",
          "url": "file:///home/kevin/repos/external/github.com/apache/buildstream/tests/frontend/source-fetch/files/bananas",
          "medium": "remote-file",
          "version-type": "sha256",
          "version": "e49295702f7da8670778e9b95a281b72b41b31cb16afa376034b45f59a18ea3f"
        }
      ]

It would not be possible to store the entire expected output of the inspect command. Instead of doing that I instead just tested for the existence of a few keys. If you think we should still test more though I'm open to trying a different approach.

project = str(datafiles)
result = cli.run(project=project, silent=True, args=["inspect", "--state", "--deps", "all"])
result.assert_success()
json.loads(result.output)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

Comment on lines 41 to 45
assert(output["project"]["name"] == "test")
element = _element_by_name(output["elements"], "import-bin.bst")
source = element["sources"][0]
assert(source["kind"] == "local")
assert(source["url"] == "files/bin-files")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: assert should report the reason if the assert fails.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have cleaned up and parameterized the tests a bit in my last commit.

@harrysarson
Copy link
Contributor

This PR allows you to see the version of a nested junction element. Currently this is only possible by hacking around with .bst/staged-junctions.

With this PR you can do something like

bst inspect xxxx.bst:freedesktop-sdk.bst | jq '.elements[0].sources[0]["version-guess"]'

Which is really nice and will be very useful.

@abderrahim
Copy link
Contributor

This PR allows you to see the version of a nested junction element. Currently this is only possible by hacking around with .bst/staged-junctions.

With this PR you can do something like

bst inspect xxxx.bst:freedesktop-sdk.bst | jq '.elements[0].sources[0]["version-guess"]'

You should be able to get the source information already using bst show

bst show --format %{source-info} xxxx.bst:freedesktop-sdk.bst

Unless you're referring to a bug with the above command?

@harrysarson
Copy link
Contributor

Ah --format %{source-info} is new. And also works!

sources: list[dict[str, str]]

@dataclass
class _ProjectOutput:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of project related data missing here, such as which plugins have been loaded and their provenances, which cache servers were accessed, the provenance string of the project itself (i.e. which junction it was loaded from and where).

All of which might not be reported by default.

I think we should cover everything that is reported in LogLine.print_heading() - even if some of this data may be conditional (i.e. user config and access to caches may dictate which artifact/source remotes are active).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of project related data missing here,

such as which plugins have been loaded and their provenances,

Plugins are now exposed, example:

...
 "plugins": [
	{
	  "name": "stack",
	  "description": "core plugin",
	  "plugin_type": "element"
	},
	{
	  "name": "import",
	  "description": "core plugin",
	  "plugin_type": "element"
	},
  ...
]

which cache servers were accessed

All remote state including cache servers it not included in inspect output currently.

the provenance string of the project itself (i.e. which junction it was loaded from and where).

Each project loaded shows the junction from where it comes, example:

...
 "junction": "freedesktop-sdk.bst:plugins/buildstream-plugins.bst",
...

@@ -302,6 +304,9 @@ def initialized(self, *, session_name=None):
#
self.stream.set_project(self.project)

# Initialize the inspector
self.inspector = Inspector(self.stream, self.project, self.context)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed for anything outside of the bst inspect command, as such I'd rather this be created directly in _frontend/cli.py within the app.initialized() block.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been moved from app.py -> cli.py.

if _artifact.cached():
artifact = {
"files": artifact.get_files(),
"digest": artifact_files._get_digest(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks untested, the artifact_files variable appears to be undeclared here.

Also artifact.get_files() reports a CasBasedDirectory, perhaps this patch is expecting the __iter__ implementation to serialize into a list of paths ?

Not sure what's going on overall in this code block, but if we want to report files, we need more structured data here, such that we can report file types, symlink targets, file sizes, permissions, etc.

That said; I think it is acceptable to not support reporting artifact files for an initial version of bst inspect.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've eliminated all of the artifact inspection for this first version.

)
@click.argument("elements", nargs=-1, type=click.Path(readable=False))
@click.pass_obj
def inspect(app, elements, state, deps):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with bst show, we should also support support the --except options here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added support for --except in the same way bst show works.


state = self._read_state(element).value

# BUG: Due to the assersion within .get_artifact this will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the correct way to go about this is Element._load_artifact().

I think we can avoid pull for now and consider whether bst inspect pulls data from remotes as a possible followup - and strict comes from the Context object here which will be resolved by the toplevel options (defaults to True unless bst --no-strict inspect .... is specified).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm not sure I fully understand this. Are you saying we should avoid pulling the state entirely with bst inspect (for this PR anyway)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you are misunderstanding is that there is a difference between the local and remote caches.

loading artifacts from the local CAS is expensive, so is checking the cached state of artifacts. That is why we avoid loading cached state where possible.

In this comment, we’re talking about loading the artifact to check its files and CAS digest, we can do that if the artifact is in the local cache.

Some bst commands also allow pulling from remotes, in case the artifacts were built in CI or pushed by another user, it can be useful, but I don’t think having bst inspect support downloading artifacts from remotes is important for an initial implementation.

For this case, you should follow the code paths for bst artifact list-contents in order to see how artifact loading is done… and follow the bst show code path for displaying %{artifact-cas-digest} to observe how to get the digest.

It is acceptable to just omit the data for these in the case that the local artifact is not cached and is thus unavailable.

@gtristan
Copy link
Contributor

Overall this is coming along nicely... some things which come to mind:

  • We need to consider how this will eventually evolve to support project-less use
    • I.e. there is a certain amount of information which is serialized into artifact metadata, that is extensible and can grow, and bst inspect should (at least eventually) allow inspection of artifact dependency chains loaded from a remote cache, without needing to have a local checkout of the project
    • This means we need to consider whether we intend to support this directly in bst inspect, or if we intend to have that as a separate addition as bst artifact inspect... perhaps the latter seems more sensible
  • We still do not have anything reporting of public data here as far as I can see
    • Public data is special as it is mutable
    • We need to consider what is sensible to report here
    • Initially, my feeling is that we should support reporting both original public data and mutated public data separately
      • This might require some added features to the core
      • Displaying original public data is only possible if we have a local project (not loading from a remote artifact)
      • Displaying mutated public data is only possible if the element has a locally cached artifact
  • Looks like this is still missing the ability to show artifact build logs
    • Perhaps not necessary for an initial implementation but shouldn't be too hard to include
    • Definitely should not be included in output by default (only if specified on the command line --fields)

@gtristan
Copy link
Contributor

gtristan commented Jul 17, 2025

Suggestion: I think we should have a whole ”artifact”: { … } block which contains only things which are loaded from the artifact, and omit it entirely in the case that the artifact is not present.

This can include:

  • the public data as found in the artifact, which may not be the same as the public data loaded from the bst files before building
  • The cas digest(s)
  • The file metadata
  • The build logs
  • The failure state (whether it is a cached failed build)
  • whether the artifact has a cached build tree

I think this nested structure disambiguates things, solves the public data concern nicely and is all around nicely structured.

Also… I think that given we present a lot of data in the output json, I think maybe we should not do the %{state} thing at all, and leave that only to bst show. Rationale being that the “state” concept is really very user facing, and it can anyway be easily computed based on the json reported by bst inspect

This adds a new subcommand to bst called `inspect` which dumps structured data
to stdout of a given project. The data sent to stdout can be either JSON or
YAML with JSON being the default.

Having this command available will make writing external tools which need to
inspect the state of a buildstream project more easy since JSON and YAML
encoding are widely supported. This command may easily be extended in the
future to support other inspectable elements related to bst project which
isn't present in this first commit.
@kevinschoon kevinschoon force-pushed the ks/bst-inspect-command branch from 38e085a to a224502 Compare July 21, 2025 09:11
@kevinschoon
Copy link
Author

Also… I think that given we present a lot of data in the output json, I think maybe we should not do the %{state} thing at all, and leave that only to bst show. Rationale being that the “state” concept is really very user facing, and it can anyway be easily computed based on the json reported by bst inspect

I have eliminated state entirely from the output.

@kevinschoon kevinschoon force-pushed the ks/bst-inspect-command branch from 09e7542 to 8d32ba3 Compare July 21, 2025 12:27
@kevinschoon kevinschoon force-pushed the ks/bst-inspect-command branch from 4c6752d to c72aa10 Compare July 21, 2025 12:56
@kevinschoon
Copy link
Author

Here is an example of the current state of the inspect output against a test project.

Any insight into specifically what types of fields we would like to include in this output as well as general feedback would be appreciated.

{
  "project": [
    {
      "duplicates": [],
      "declarations": [],
      "config": {
        "name": "test",
        "directory": "/home/kevin/repos/external/github.com/apache/buildstream/tests/frontend/inspect",
        "aliases": {
          "example": "https://example.org/"
        },
        "element_overrides": {},
        "source_overrides": {},
        "plugins": [
          {
            "name": "stack",
            "description": "core plugin",
            "plugin_type": "element"
          },
          {
            "name": "import",
            "description": "core plugin",
            "plugin_type": "element"
          },
          {
            "name": "local",
            "description": "core plugin",
            "plugin_type": "source"
          },
          {
            "name": "remote",
            "description": "core plugin",
            "plugin_type": "source"
          },
          {
            "name": "tar",
            "description": "core plugin",
            "plugin_type": "source"
          },
          {
            "name": "local",
            "description": "core plugin",
            "plugin_type": "source-mirror"
          },
          {
            "name": "remote",
            "description": "core plugin",
            "plugin_type": "source-mirror"
          },
          {
            "name": "tar",
            "description": "core plugin",
            "plugin_type": "source-mirror"
          }
        ]
      }
    }
  ],
  "user_config": {
    "configuration": "/home/kevin/.config/buildstream.conf",
    "cache_directory": "/home/kevin/.cache/buildstream",
    "log_directory": "/home/kevin/.cache/buildstream/logs",
    "source_directory": "/home/kevin/.cache/buildstream/sources",
    "build_directory": "/home/kevin/.cache/buildstream/build",
    "source_mirrors": "/home/kevin/.cache/buildstream/sources",
    "build_area": "/home/kevin/.cache/buildstream/build",
    "strict_build_plan": "/home/kevin/.config/buildstream.conf",
    "maximum_fetch_tasks": 10,
    "maximum_build_tasks": 4,
    "maximum_push_tasks": 4,
    "maximum_network_retries": 2
  },
  "elements": [
    {
      "name": "import-local-files.bst",
      "description": "",
      "environment": {
        "PATH": "/usr/bin:/bin:/usr/sbin:/sbin",
        "SHELL": "/bin/sh",
        "TERM": "dumb",
        "USER": "tomjon",
        "USERNAME": "tomjon",
        "LOGNAME": "tomjon",
        "LC_ALL": "C",
        "HOME": "/tmp",
        "TZ": "UTC",
        "SOURCE_DATE_EPOCH": "1321009871"
      },
      "variables": {
        "prefix": "/usr",
        "exec_prefix": "/usr",
        "bindir": "/usr/bin",
        "sbindir": "/usr/sbin",
        "libexecdir": "/usr/libexec",
        "datadir": "/usr/share",
        "sysconfdir": "/etc",
        "sharedstatedir": "/usr/com",
        "localstatedir": "/var",
        "lib": "lib",
        "libdir": "/usr/lib",
        "debugdir": "/usr/lib/debug",
        "includedir": "/usr/include",
        "docdir": "/usr/share/doc",
        "infodir": "/usr/share/info",
        "mandir": "/usr/share/man",
        "build-root": "/buildstream/test/import-local-files.bst",
        "conf-root": ".",
        "install-root": "/buildstream-install",
        "strip-binaries": "",
        "schema": "https",
        "project-name": "test",
        "max-jobs": "8",
        "element-name": "import-local-files.bst"
      },
      "dependencies": [],
      "build_dependencies": [],
      "runtime_dependencies": [],
      "sources": [
        {
          "kind": "local",
          "url": "files",
          "medium": "local",
          "version-type": "cas-digest",
          "version": "d8c20623d7160ffe2fd69bd03b3ad7b24a6d1dfd7c96f3ed6c8deb3f268d2d64/85"
        }
      ]
    },
    {
      "name": "import-remote-files.bst",
      "description": "",
      "environment": {
        "PATH": "/usr/bin:/bin:/usr/sbin:/sbin",
        "SHELL": "/bin/sh",
        "TERM": "dumb",
        "USER": "tomjon",
        "USERNAME": "tomjon",
        "LOGNAME": "tomjon",
        "LC_ALL": "C",
        "HOME": "/tmp",
        "TZ": "UTC",
        "SOURCE_DATE_EPOCH": "1321009871"
      },
      "variables": {
        "prefix": "/usr",
        "exec_prefix": "/usr",
        "bindir": "/usr/bin",
        "sbindir": "/usr/sbin",
        "libexecdir": "/usr/libexec",
        "datadir": "/usr/share",
        "sysconfdir": "/etc",
        "sharedstatedir": "/usr/com",
        "localstatedir": "/var",
        "lib": "lib",
        "libdir": "/usr/lib",
        "debugdir": "/usr/lib/debug",
        "includedir": "/usr/include",
        "docdir": "/usr/share/doc",
        "infodir": "/usr/share/info",
        "mandir": "/usr/share/man",
        "build-root": "/buildstream/test/import-remote-files.bst",
        "conf-root": ".",
        "install-root": "/buildstream-install",
        "strip-binaries": "",
        "schema": "https",
        "project-name": "test",
        "max-jobs": "8",
        "element-name": "import-remote-files.bst"
      },
      "dependencies": [],
      "build_dependencies": [],
      "runtime_dependencies": [],
      "sources": [
        {
          "kind": "remote",
          "url": "https://example.org/foo.bar.bin",
          "medium": "remote-file",
          "version-type": "sha256",
          "version": "d1bc8d3ba4afc7e109612cb73acbdddac052c93025aa1f82942edabb7deb82a1"
        },
        {
          "kind": "tar",
          "url": "https://example.org/baz.qux.tar.gz",
          "medium": "remote-file",
          "version-type": "sha256",
          "version": "d1bc8d3ba4afc7e109612cb73acbdddac052c93025aa1f82942edabb7deb82a1"
        }
      ]
    },
    {
      "name": "target.bst",
      "description": "Main stack target for the bst build test",
      "environment": {
        "PATH": "/usr/bin:/bin:/usr/sbin:/sbin",
        "SHELL": "/bin/sh",
        "TERM": "dumb",
        "USER": "tomjon",
        "USERNAME": "tomjon",
        "LOGNAME": "tomjon",
        "LC_ALL": "C",
        "HOME": "/tmp",
        "TZ": "UTC",
        "SOURCE_DATE_EPOCH": "1321009871"
      },
      "variables": {
        "prefix": "/usr",
        "exec_prefix": "/usr",
        "bindir": "/usr/bin",
        "sbindir": "/usr/sbin",
        "libexecdir": "/usr/libexec",
        "datadir": "/usr/share",
        "sysconfdir": "/etc",
        "sharedstatedir": "/usr/com",
        "localstatedir": "/var",
        "lib": "lib",
        "libdir": "/usr/lib",
        "debugdir": "/usr/lib/debug",
        "includedir": "/usr/include",
        "docdir": "/usr/share/doc",
        "infodir": "/usr/share/info",
        "mandir": "/usr/share/man",
        "build-root": "/buildstream/test/target.bst",
        "conf-root": ".",
        "install-root": "/buildstream-install",
        "strip-binaries": "",
        "schema": "https",
        "project-name": "test",
        "max-jobs": "8",
        "element-name": "target.bst"
      },
      "dependencies": [
        "import-local-files.bst",
        "import-remote-files.bst"
      ],
      "build_dependencies": [
        "import-local-files.bst",
        "import-remote-files.bst"
      ],
      "runtime_dependencies": [
        "import-local-files.bst",
        "import-remote-files.bst"
      ],
      "sources": []
    }
  ]
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants