-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Print manifest comment in single-file packages #36
Conversation
Once `au` goes public, there are going to be a _lot_ of files out in the wild called `au.hh` (and a bunch more called `au_noio.hh`). These files are all going to look very similar when people skim them, but they'll have myriad tiny variations. It's critically important to make these as easy as possible to distinguish! To nip this in the bud, we'll add a comment at the top of the file which provides a "manifest", clearly identifying both the version of code and the options with which the file was generated. We'll add a new `--version-id` argument to the script, which gives a human-readable, release-tag-based description. Right this moment, it gives: `0.2.0-1-g545a831-dirty` This means we are based on release `0.2.0`, with `1` extra commit, and the current unambiguous ID is a git (`g`) commit with abbreviated hash `545a831`. Since the worktree is dirty (after all, I'm in the _process_ of committing), it appends the word `dirty`. If we were exactly on a release, the commit hash would be omitted. Since we don't want to burden end users with generating this information, we will provide it if they omit it. Git version information is typically inaccessible to bazel. This makes it hard to include the manifest in our automatically generated single-file releases. To fix this, we use bazel's workspace status machinery [1], writing the ID to the `stable-status.txt` file. We then give the genrules access to this file. To do so, we need to set the `--workspace_status_command` for every build command in our `.bazelrc`. Typically, one would rather not do this for _every_ command. However, empirically, the burden was too small to be perceptible. Test Plan --------- - [x] Run `au-docs-serve`, and view the single-file releases. - [x] Run `make-single-file --units meters seconds | gview -`. [1] https://bazel.build/docs/user-manual#workspace-status
Output which showed up in `au-docs-serve`: ``` fatal: not a git repository (or any of the parent directories): .git ``` This doesn't matter because when this happens, we've provided an authoritative option anyway.
tools/bin/make-single-file
Outdated
@@ -21,7 +22,7 @@ def main(argv=None): | |||
""" | |||
args = parse_command_line_args(argv) | |||
files = parse_files(filenames=filenames(args=args)) | |||
print_unified_file(files) | |||
print_unified_file(files, args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As tempting as it may be, I've always regretted passing args
in my own code. Passing this opaque structure through makes the code harder to understand. The only real way to understand what the function is doing with args
is to read its implementation. It also makes unit testing really annoying because you have to construct such an instance instead of just passing in the actual values you care about.
I highly recommend not doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly feel the same way... generally. However, in this particular case, the point of manifest()
is literally "make sure we use all the args", so I think args
is best.
That said, I'm concerned that we could miss handling some args. So I wrote a small wrapper that makes it easy to check that we have handled all of them: dd3b803. This revealed that we were not in fact handling the main_files
arg (which is now rarely used), so I added a section for it: 744deec. Finally, and much to my embarrassment, I noticed that I was lazily passing args
without due cause, so I fixed that instance, while making sure to use named parameter passing in the instances where I do want that: e969663.
@@ -0,0 +1,3 @@ | |||
#!/bin/bash | |||
|
|||
echo STABLE_GIT_ID $(git describe --always --dirty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious to me that making this information as "stable" is beneficial. Perhaps in a small repo like this it doesn't matter. But when you get a larger repo, it's generally less useful to have this be stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal in doing this was to force the stamped targets (i.e., //:au_hh
and //:au_noio_hh
) to rebuild every time the git description changes. I had thought that if I let the git description be "volatile", then these would not get rebuilt. Am I understanding correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If your goal is exactly "rebuild any time anything at all changes", then what you have does work. It is, however, not super useful. E.g. every time you create a commit, everything gets "relinked/restamped". In a repo of non-trivial size, that time spent restamping/relinking adds up pretty quickly. In my experience, the more useful notion is "any time this target needs to get rebuilt, grab the latest git information". That means targets only get rebuilt when they actually need to be. You can then still manually force everything to get restamped with --embed_label
.
Even if you make this volatile, the information will never be wrong, but it might not mean exactly what you want it to. Basically, if you make it volatile, then the stamped git hash means "if you check out this hash, then you will be able to reproduce the corresponding artifact exactly." It does not mean "this artifact was built at exactly this hash".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way, not important for a project of this size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I would like to fix this up to get it right, but it's not a top priority since this doesn't block release, and we're very far from having a repo big enough to notice this (if we even ever do).
At lower priority, therefore, I had a clarification question. When you say "rebuild any time anything at all changes", were you referring to stamped targets (and their reverse deps), unstamped targets, or both?
My expectation/mental model was that the stamped targets (and anything that depends on them) would get rebuilt when the stable version changes, but anything else would not. In other words, I expected only the two stamp=True
targets to be directly affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you say "rebuild any time anything at all changes", were you referring to stamped targets (and their reverse deps), unstamped targets, or both?
stamped targets (and their reverse deps)
My expectation/mental model was that the stamped targets (and anything that depends on them) would get rebuilt when the stable version changes, but anything else would not. In other words, I expected only the two stamp=True targets to be directly affected.
That is correct.
The new check caught that we were ignoring this.
Only pass `args` when the contract is _literally "use all args"_, and in that case, _check_ that we do so. Otherwise, pass individual named elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have any more feedback, I'll address it in follow-on PR(s).
class CheckArgs(dict): | ||
def __init__(self, args): | ||
self.vars = vars(args) | ||
self.used = set() | ||
|
||
def are_all_used(self): | ||
return not set(self.vars.keys()).difference(self.used) | ||
|
||
def __getattr__(self, k): | ||
self.used.add(k) | ||
return self.vars[k] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we deal with #21, this would be a great candidate for a unit test. In fact, I used one to write it:
class TestUseAllArgs(unittest.TestCase):
def parse_args_a_b_c(self, argv):
parser = argparse.ArgumentParser()
parser.add_argument("--a", type=int)
parser.add_argument("--b", type=int)
parser.add_argument("--c", type=int)
return parser.parse_args(argv)
def test_detects_unused_arg(self):
args = CheckArgs(self.parse_args_a_b_c(argv=["--a", "1", "--c", "2"]))
self.assertFalse(args.are_all_used())
self.assertEqual(args.a, 1)
self.assertEqual(args.c, 2)
self.assertFalse(args.are_all_used())
self.assertIsNone(args.b)
self.assertTrue(args.are_all_used())
The reason I didn't land it was because I hadn't yet broken this out into (tested) python libraries, a python binary, and a thin wrapper. But I hope the existence of this test, even if only locally, adds some confidence that this thing does what it's supposed to. (I really do need to go ahead and just do #21...)
Once
au
goes public, there are going to be a lot of files out in thewild called
au.hh
(and a bunch more calledau_noio.hh
). These filesare all going to look very similar when people skim them, but they'll
have myriad tiny variations. It's critically important to make these as
easy as possible to distinguish!
To nip this in the bud, we'll add a comment at the top of the file
which provides a "manifest", clearly identifying both the version of
code and the options with which the file was generated. We'll add a new
--version-id
argument to the script, which gives a human-readable,release-tag-based description.
Right this moment, it gives:
0.2.0-1-g545a831-dirty
. This means weare based on release
0.2.0
, with1
extra commit, and the currentunambiguous ID is a git (
g
) commit with abbreviated hash545a831
.Since the worktree is dirty (after all, I'm in the process of
committing), it appends the word
dirty
. If we were exactly on arelease, the commit hash would be omitted. Since we don't want to
burden end users with generating this information, we will provide it if
they omit it.
Git version information is typically inaccessible to bazel. This makes
it hard to include the manifest in our automatically generated
single-file releases. To fix this, we use bazel's workspace status
machinery [1], writing the ID to the
stable-status.txt
file. We thengive the genrules access to this file.
To do so, we need to set the
--workspace_status_command
for everybuild command in our
.bazelrc
. Typically, one would rather not dothis for every command. However, empirically, the burden was too
small to be perceptible.
Fixes #30.
Test Plan
au-docs-serve
, and view the single-file releases.make-single-file --units meters seconds | gview -
.[1] https://bazel.build/docs/user-manual#workspace-status