Skip to content

Add the digest-environment dependency configuration #2036

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

abderrahim
Copy link
Contributor

This allows setting an environment variable inside the sandbox to the CAS digest of one or more dependencies.

This us useful when communicating with the nested remote execution endpoint, but could also be used independently.

This PR replaced #1991.

def stage(self, sandbox):
# Setup environment
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved this here because buildstream wouldn't let me stage outside of stage().

It might be better to try to work around that limitation rather than doing this virtual staging here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is due to the overlap collector, btw. The staging functions will refuse to work if there is no active overlap collector.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting up the environment in stage() is a problem at least for bst shell --build --use-buildtree, which does not invoke Element.stage() but still requires environment variables to be set.

Draft fix for the overlap collector issue and moving environment setup back to configure_sandbox(): https://github.com/apache/buildstream/commits/jbilleter/digest-environment/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed the fix to this branch and it seems to work well for normal builds. However, it still doesn't work for bst shell --build --use-buildtree (see failing test case) as that skips the loading of dependency artifacts. Always requiring build dependencies to be available for bst shell --build --use-buildtree would fix the test case but I suspect it would break buildtree shells for artifacts. Any suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me that we do actually cache the environment variables in the artifact metadata, it seems to me that we should be relying on that in bst shell --build --use-buildtree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's part of the low diversity metadata. I'll have to check whether this would allow us to completely skip configure_sandbox() for the buildtree shell.

However, even if that works, there is still one potential issue, which is that this would not guarantee that the directory tree referenced by the digest environment variable is available in the local cache, as the artifact doesn't reference that digest (except via that environment variable but BuildStream core is not aware of that). Maybe we can add that to the Artifact proto as well, but this may require additional code and API in the core for use by BuildElement. Or do you have another suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, even if that works, there is still one potential issue, which is that this would not guarantee that the directory tree referenced by the digest environment variable is available in the local cache, as the artifact doesn't reference that digest (except via that environment variable but BuildStream core is not aware of that). Maybe we can add that to the Artifact proto as well, but this may require additional code and API in the core for use by BuildElement. Or do you have another suggestion?

We are interestingly dancing on the line of sanity here.

  • The assumption with bst shell --build --use-buildtree is that you have everything you need to reproduce the (probably failed) build locally in order to debug it.
  • The assumption with exporting the local casd socket to the build environment, and moreso, remoting out the builds to a different sandbox (whether it is on the same machine or not)... is that you have your hands on some sensitive things, you know what you are doing, and you risk corrupting your build and cache with some non-determinism

The above two are not necessarily at odds with eachother, but it could be fair to say that if you are doing this wild stuff, you are out of bounds for having support for debugging remotely failed builds.

I think that I would be satisfied with either of the following:

  • The officially supported and documented usage of recc is to use it locally in your sandbox, exercise the local compiler in the same sandbox, and use only the caching nature of recc to store/lookup objects in the CAS
    • This would "just work" given that we use the same environment variables from cached artifacts
  • The officially supported and documented usage of recc is to selectively stage the compiler, source code and required header files to an adjacent sandbox on the same buildhost via the local buildbox-casd feature, and the artifact proto is extended to capture a list of digests which might be used for weird features like this one

I.e., lets support something, and if you are doing something outside of what is supported, you might be on your own and your build shells for debugging remotely failed artifacts might not work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is to recommend the latter, i.e., use the remote execution protocol even for local execution, as the former (cache-only) is much easier to get wrong and get incorrect cache hits due to incomplete cache keys.

subsandbox = sandbox.create_subsandbox()
self.stage_dependency_artifacts(subsandbox, element_list)
digest = subsandbox.get_virtual_directory()._get_digest()
env[digest_variable] = "{}/{}".format(digest.hash, digest.size_bytes)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed that these environment variables don't end up in the build logs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed an improvement to build environment logging.

# Add virtual directories for subsandboxes
for subsandbox in self._get_subsandboxes():
vdir = subsandbox.get_virtual_directory()
root_digests.append(vdir._get_digest())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have an option to only upload these conditionally. It think it's not worth the trouble, but wanted to have a second opinion.

for digest_variable in sorted_envs:
element_list = [element for element, _ in self.__digest_environment[digest_variable]]
subsandbox = sandbox.create_subsandbox()
self.stage_dependency_artifacts(subsandbox, element_list)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gtristan was worried that the CAS digest might depend on the order the elements are declared in the .bst file

See #1991 (comment)

abderrahim and others added 8 commits July 25, 2025 10:35
This allows an element to use a secondary sandbox for manipulating artifacts
that doesn't affect the build
This allows setting an environment variable inside the sandbox to the CAS
digest of one or more dependencies.

Co-authored by: Adrien Plazas <[email protected]>
The subsandboxes can be used to extract a CAS digest that could be used for
nested remote execution, and thus need to be available in the remote execution
CAS.
With the introduction of subsandboxes, a single overlap collector is no
longer sufficient.
`bst shell --build --use-buildtree` does not invoke `Element.stage()`
but still requires environment variables to be set.
Elements can configure additional environment variables in
`configure_sandbox()`, which is used by `BuildElement` to support
`digest-environment`. Change the build environment logging to log the
complete environment configured in the sandbox.
@juergbi juergbi force-pushed the abderrahim/digest-environment branch from 4f716b7 to 3ed3ac8 Compare July 25, 2025 15:34
@juergbi juergbi force-pushed the abderrahim/digest-environment branch from 3ed3ac8 to 973ac1c Compare July 25, 2025 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants