Print manifest comment in single-file packages #36

chiphogg · 2022-12-11T19:58:57Z

Once au goes public, there are going to be a lot of files out in the
wild called au.hh (and a bunch more called au_noio.hh). These files
are all going to look very similar when people skim them, but they'll
have myriad tiny variations. It's critically important to make these as
easy as possible to distinguish!

To nip this in the bud, we'll add a comment at the top of the file
which provides a "manifest", clearly identifying both the version of
code and the options with which the file was generated. We'll add a new
--version-id argument to the script, which gives a human-readable,
release-tag-based description.

Right this moment, it gives: 0.2.0-1-g545a831-dirty. This means we
are based on release 0.2.0, with 1 extra commit, and the current
unambiguous ID is a git (g) commit with abbreviated hash 545a831.
Since the worktree is dirty (after all, I'm in the process of
committing), it appends the word dirty. If we were exactly on a
release, the commit hash would be omitted. Since we don't want to
burden end users with generating this information, we will provide it if
they omit it.

Git version information is typically inaccessible to bazel. This makes
it hard to include the manifest in our automatically generated
single-file releases. To fix this, we use bazel's workspace status
machinery [1], writing the ID to the stable-status.txt file. We then
give the genrules access to this file.

To do so, we need to set the --workspace_status_command for every
build command in our .bazelrc. Typically, one would rather not do
this for every command. However, empirically, the burden was too
small to be perceptible.

Fixes #30.

Test Plan

Run au-docs-serve, and view the single-file releases.
Run make-single-file --units meters seconds | gview -.

[1] https://bazel.build/docs/user-manual#workspace-status

Once `au` goes public, there are going to be a _lot_ of files out in the wild called `au.hh` (and a bunch more called `au_noio.hh`). These files are all going to look very similar when people skim them, but they'll have myriad tiny variations. It's critically important to make these as easy as possible to distinguish! To nip this in the bud, we'll add a comment at the top of the file which provides a "manifest", clearly identifying both the version of code and the options with which the file was generated. We'll add a new `--version-id` argument to the script, which gives a human-readable, release-tag-based description. Right this moment, it gives: `0.2.0-1-g545a831-dirty` This means we are based on release `0.2.0`, with `1` extra commit, and the current unambiguous ID is a git (`g`) commit with abbreviated hash `545a831`. Since the worktree is dirty (after all, I'm in the _process_ of committing), it appends the word `dirty`. If we were exactly on a release, the commit hash would be omitted. Since we don't want to burden end users with generating this information, we will provide it if they omit it. Git version information is typically inaccessible to bazel. This makes it hard to include the manifest in our automatically generated single-file releases. To fix this, we use bazel's workspace status machinery [1], writing the ID to the `stable-status.txt` file. We then give the genrules access to this file. To do so, we need to set the `--workspace_status_command` for every build command in our `.bazelrc`. Typically, one would rather not do this for _every_ command. However, empirically, the burden was too small to be perceptible. Test Plan --------- - [x] Run `au-docs-serve`, and view the single-file releases. - [x] Run `make-single-file --units meters seconds | gview -`. [1] https://bazel.build/docs/user-manual#workspace-status

Output which showed up in `au-docs-serve`: ``` fatal: not a git repository (or any of the parent directories): .git ``` This doesn't matter because when this happens, we've provided an authoritative option anyway.

tools/bin/make-single-file

philsc · 2022-12-12T19:42:02Z

tools/bin/make-single-file

@@ -21,7 +22,7 @@ def main(argv=None):
    """
    args = parse_command_line_args(argv)
    files = parse_files(filenames=filenames(args=args))
-    print_unified_file(files)
+    print_unified_file(files, args)


As tempting as it may be, I've always regretted passing args in my own code. Passing this opaque structure through makes the code harder to understand. The only real way to understand what the function is doing with args is to read its implementation. It also makes unit testing really annoying because you have to construct such an instance instead of just passing in the actual values you care about.

I highly recommend not doing this.

I strongly feel the same way... generally. However, in this particular case, the point of manifest() is literally "make sure we use all the args", so I think args is best.

That said, I'm concerned that we could miss handling some args. So I wrote a small wrapper that makes it easy to check that we have handled all of them: dd3b803. This revealed that we were not in fact handling the main_files arg (which is now rarely used), so I added a section for it: 744deec. Finally, and much to my embarrassment, I noticed that I was lazily passing args without due cause, so I fixed that instance, while making sure to use named parameter passing in the instances where I do want that: e969663.

philsc · 2022-12-12T19:45:56Z

tools/bin/workspace_status.sh

@@ -0,0 +1,3 @@
+#!/bin/bash
+
+echo STABLE_GIT_ID $(git describe --always --dirty)


It's not obvious to me that making this information as "stable" is beneficial. Perhaps in a small repo like this it doesn't matter. But when you get a larger repo, it's generally less useful to have this be stable.

My goal in doing this was to force the stamped targets (i.e., //:au_hh and //:au_noio_hh) to rebuild every time the git description changes. I had thought that if I let the git description be "volatile", then these would not get rebuilt. Am I understanding correctly?

If your goal is exactly "rebuild any time anything at all changes", then what you have does work. It is, however, not super useful. E.g. every time you create a commit, everything gets "relinked/restamped". In a repo of non-trivial size, that time spent restamping/relinking adds up pretty quickly. In my experience, the more useful notion is "any time this target needs to get rebuilt, grab the latest git information". That means targets only get rebuilt when they actually need to be. You can then still manually force everything to get restamped with --embed_label.

Even if you make this volatile, the information will never be wrong, but it might not mean exactly what you want it to. Basically, if you make it volatile, then the stamped git hash means "if you check out this hash, then you will be able to reproduce the corresponding artifact exactly." It does not mean "this artifact was built at exactly this hash".

Either way, not important for a project of this size.

Thanks! I would like to fix this up to get it right, but it's not a top priority since this doesn't block release, and we're very far from having a repo big enough to notice this (if we even ever do).

At lower priority, therefore, I had a clarification question. When you say "rebuild any time anything at all changes", were you referring to stamped targets (and their reverse deps), unstamped targets, or both?

My expectation/mental model was that the stamped targets (and anything that depends on them) would get rebuilt when the stable version changes, but anything else would not. In other words, I expected only the two stamp=True targets to be directly affected.

When you say "rebuild any time anything at all changes", were you referring to stamped targets (and their reverse deps), unstamped targets, or both?

stamped targets (and their reverse deps)

My expectation/mental model was that the stamped targets (and anything that depends on them) would get rebuilt when the stable version changes, but anything else would not. In other words, I expected only the two stamp=True targets to be directly affected.

That is correct.

The new check caught that we were ignoring this.

Only pass `args` when the contract is _literally "use all args"_, and in that case, _check_ that we do so. Otherwise, pass individual named elements.

chiphogg

If you have any more feedback, I'll address it in follow-on PR(s).

chiphogg · 2022-12-13T02:15:11Z

tools/bin/make-single-file

+class CheckArgs(dict):
+    def __init__(self, args):
+        self.vars = vars(args)
+        self.used = set()
+
+    def are_all_used(self):
+        return not set(self.vars.keys()).difference(self.used)
+
+    def __getattr__(self, k):
+        self.used.add(k)
+        return self.vars[k]


When we deal with #21, this would be a great candidate for a unit test. In fact, I used one to write it:

class TestUseAllArgs(unittest.TestCase): def parse_args_a_b_c(self, argv): parser = argparse.ArgumentParser() parser.add_argument("--a", type=int) parser.add_argument("--b", type=int) parser.add_argument("--c", type=int) return parser.parse_args(argv) def test_detects_unused_arg(self): args = CheckArgs(self.parse_args_a_b_c(argv=["--a", "1", "--c", "2"])) self.assertFalse(args.are_all_used()) self.assertEqual(args.a, 1) self.assertEqual(args.c, 2) self.assertFalse(args.are_all_used()) self.assertIsNone(args.b) self.assertTrue(args.are_all_used())

The reason I didn't land it was because I hadn't yet broken this out into (tested) python libraries, a python binary, and a thin wrapper. But I hope the existence of this test, even if only locally, adds some confidence that this thing does what it's supposed to. (I really do need to go ahead and just do #21...)

chiphogg added 2 commits December 11, 2022 14:47

Suppress irrelevant output

04ef345

Output which showed up in `au-docs-serve`: ``` fatal: not a git repository (or any of the parent directories): .git ``` This doesn't matter because when this happens, we've provided an authoritative option anyway.

chiphogg requested a review from philsc December 11, 2022 20:05

philsc approved these changes Dec 12, 2022

View reviewed changes

chiphogg added 5 commits December 12, 2022 19:40

Merge branch 'main' into chiphogg/manifest

74bb0cd

Fix docstring

d5146d6

Add check that we use all args

dd3b803

Use args.main_files

744deec

The new check caught that we were ignoring this.

Clean up args-passing

e969663

Only pass `args` when the contract is _literally "use all args"_, and in that case, _check_ that we do so. Otherwise, pass individual named elements.

chiphogg commented Dec 13, 2022

View reviewed changes

chiphogg merged commit 2bdf11d into main Dec 13, 2022

chiphogg deleted the chiphogg/manifest branch December 13, 2022 02:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Print manifest comment in single-file packages #36

Print manifest comment in single-file packages #36

chiphogg commented Dec 11, 2022 •

edited

Loading

philsc Dec 12, 2022

chiphogg Dec 13, 2022

philsc Dec 12, 2022

chiphogg Dec 13, 2022

philsc Dec 13, 2022

philsc Dec 13, 2022

chiphogg Dec 13, 2022

philsc Dec 13, 2022

chiphogg left a comment

chiphogg Dec 13, 2022

		@@ -0,0 +1,3 @@
		#!/bin/bash

		echo STABLE_GIT_ID $(git describe --always --dirty)

Print manifest comment in single-file packages #36

Print manifest comment in single-file packages #36

Conversation

chiphogg commented Dec 11, 2022 • edited Loading

Test Plan

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chiphogg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chiphogg commented Dec 11, 2022 •

edited

Loading