Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize copy_file #565

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

fmeum
Copy link
Contributor

@fmeum fmeum commented Mar 19, 2025

  • ctx.actions.symlink can be used on all platforms and falls back to a copy if a symlink is unsupported. It is also heavily optimized, avoiding the need to hash the output file again.
  • Bazel never guarantees that an input to an action is staged as a non-symlink, so whether an output is a symlink or a hard copy only matters for top-level outputs consumed outside Bazel and handled by tools that don't follow symlinks by default, which should be extremely rare and could be worked around by explicitly setting allow_symlink to False in user-controlled code.
  • For file copy actions that do not go through ctx.actions.symlink, caching is extremely cheap since the CAS entry of the input will be reused as the CAS entry of the output. Allowing remote execution and caching enables BwoB for the copy, which can avoid downloads of both the input and the output file.

The same changes are not applied to copy_directory as source directories are not officially supported by Bazel and any kind of change could cause subtle incorrectness.

Fixes #562

@fmeum
Copy link
Contributor Author

fmeum commented Mar 19, 2025

@ulfjack @tjgq Could you review this?

@fmeum fmeum force-pushed the drop-exec-requirements branch 3 times, most recently from 8e6d9e9 to 3127c3f Compare March 19, 2025 11:29
@fmeum
Copy link
Contributor Author

fmeum commented Mar 19, 2025

FYI @kormide, I think that most of the exec requirements are actually harmful for copy_file. Do you see any downside to dropping them as long as they are kept on copy_directory?

* `ctx.actions.symlink` can be used on all platforms and falls back to a copy if a symlink is unsupported. It is also heavily optimized, avoiding the need to hash the output file again.
* Bazel never guarantees that an input to an action is staged as a non-symlink, so whether an output is a symlink or a hard copy only matters for top-level outputs consumed outside Bazel and handled by tools that don't follow symlinks by default, which should be extremely rare and could be worked around by explicitly setting `allow_symlink` to `False` in user-controlled code.
* For file copy actions that do not go through `ctx.actions.symlink`, caching is extremely cheap since the CAS entry of the input will be reused as the CAS entry of the output. Allowing remote execution and caching enables BwoB for the copy, which can avoid downloads of both the input and the output file.

The same changes are not applied to `copy_directory` as source directories are not officially supported by Bazel and any kind of change could cause subtle incorrectness.
@fmeum fmeum force-pushed the drop-exec-requirements branch from 3127c3f to 0ebea4d Compare March 19, 2025 11:35
@ulfjack
Copy link

ulfjack commented Mar 20, 2025

I wonder if it would be helpful to have a mechanism to pass in execution_requirements manually?

@fmeum
Copy link
Contributor Author

fmeum commented Mar 20, 2025

I wonder if it would be helpful to have a mechanism to pass in execution_requirements manually?

You mean on a per-target level (per-mnemonic level is possible with --modify_execution_info)? I think that exec_properties can be used for this (with empty values).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

copy_file should *never* set no-remote
2 participants