-
Notifications
You must be signed in to change notification settings - Fork 35
Add the digest-environment dependency configuration #2036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ba403b8
to
4f716b7
Compare
def stage(self, sandbox): | ||
# Setup environment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved this here because buildstream wouldn't let me stage outside of stage()
.
It might be better to try to work around that limitation rather than doing this virtual staging here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to the overlap collector, btw. The staging functions will refuse to work if there is no active overlap collector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting up the environment in stage()
is a problem at least for bst shell --build --use-buildtree
, which does not invoke Element.stage()
but still requires environment variables to be set.
Draft fix for the overlap collector issue and moving environment setup back to configure_sandbox()
: https://github.com/apache/buildstream/commits/jbilleter/digest-environment/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed the fix to this branch and it seems to work well for normal builds. However, it still doesn't work for bst shell --build --use-buildtree
(see failing test case) as that skips the loading of dependency artifacts. Always requiring build dependencies to be available for bst shell --build --use-buildtree
would fix the test case but I suspect it would break buildtree shells for artifacts. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me that we do actually cache the environment variables in the artifact metadata, it seems to me that we should be relying on that in bst shell --build --use-buildtree
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it's part of the low diversity metadata. I'll have to check whether this would allow us to completely skip configure_sandbox()
for the buildtree shell.
However, even if that works, there is still one potential issue, which is that this would not guarantee that the directory tree referenced by the digest environment variable is available in the local cache, as the artifact doesn't reference that digest (except via that environment variable but BuildStream core is not aware of that). Maybe we can add that to the Artifact proto as well, but this may require additional code and API in the core for use by BuildElement
. Or do you have another suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, even if that works, there is still one potential issue, which is that this would not guarantee that the directory tree referenced by the digest environment variable is available in the local cache, as the artifact doesn't reference that digest (except via that environment variable but BuildStream core is not aware of that). Maybe we can add that to the Artifact proto as well, but this may require additional code and API in the core for use by BuildElement. Or do you have another suggestion?
We are interestingly dancing on the line of sanity here.
- The assumption with
bst shell --build --use-buildtree
is that you have everything you need to reproduce the (probably failed) build locally in order to debug it. - The assumption with exporting the local casd socket to the build environment, and moreso, remoting out the builds to a different sandbox (whether it is on the same machine or not)... is that you have your hands on some sensitive things, you know what you are doing, and you risk corrupting your build and cache with some non-determinism
The above two are not necessarily at odds with eachother, but it could be fair to say that if you are doing this wild stuff, you are out of bounds for having support for debugging remotely failed builds.
I think that I would be satisfied with either of the following:
- The officially supported and documented usage of
recc
is to use it locally in your sandbox, exercise the local compiler in the same sandbox, and use only the caching nature ofrecc
to store/lookup objects in the CAS- This would "just work" given that we use the same environment variables from cached artifacts
- The officially supported and documented usage of
recc
is to selectively stage the compiler, source code and required header files to an adjacent sandbox on the same buildhost via the localbuildbox-casd
feature, and the artifact proto is extended to capture a list of digests which might be used for weird features like this one
I.e., lets support something, and if you are doing something outside of what is supported, you might be on your own and your build shells for debugging remotely failed artifacts might not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention is to recommend the latter, i.e., use the remote execution protocol even for local execution, as the former (cache-only) is much easier to get wrong and get incorrect cache hits due to incomplete cache keys.
subsandbox = sandbox.create_subsandbox() | ||
self.stage_dependency_artifacts(subsandbox, element_list) | ||
digest = subsandbox.get_virtual_directory()._get_digest() | ||
env[digest_variable] = "{}/{}".format(digest.hash, digest.size_bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also noticed that these environment variables don't end up in the build logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed an improvement to build environment logging.
# Add virtual directories for subsandboxes | ||
for subsandbox in self._get_subsandboxes(): | ||
vdir = subsandbox.get_virtual_directory() | ||
root_digests.append(vdir._get_digest()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should have an option to only upload these conditionally. It think it's not worth the trouble, but wanted to have a second opinion.
src/buildstream/buildelement.py
Outdated
for digest_variable in sorted_envs: | ||
element_list = [element for element, _ in self.__digest_environment[digest_variable]] | ||
subsandbox = sandbox.create_subsandbox() | ||
self.stage_dependency_artifacts(subsandbox, element_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gtristan was worried that the CAS digest might depend on the order the elements are declared in the .bst file
See #1991 (comment)
This allows an element to use a secondary sandbox for manipulating artifacts that doesn't affect the build
This allows setting an environment variable inside the sandbox to the CAS digest of one or more dependencies. Co-authored by: Adrien Plazas <[email protected]>
The subsandboxes can be used to extract a CAS digest that could be used for nested remote execution, and thus need to be available in the remote execution CAS.
With the introduction of subsandboxes, a single overlap collector is no longer sufficient.
`bst shell --build --use-buildtree` does not invoke `Element.stage()` but still requires environment variables to be set.
Elements can configure additional environment variables in `configure_sandbox()`, which is used by `BuildElement` to support `digest-environment`. Change the build environment logging to log the complete environment configured in the sandbox.
4f716b7
to
3ed3ac8
Compare
3ed3ac8
to
973ac1c
Compare
This allows setting an environment variable inside the sandbox to the CAS digest of one or more dependencies.
This us useful when communicating with the nested remote execution endpoint, but could also be used independently.
This PR replaced #1991.