Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print manifest comment in single-file packages #36

Merged
merged 7 commits into from
Dec 13, 2022
Merged

Conversation

chiphogg
Copy link
Contributor

@chiphogg chiphogg commented Dec 11, 2022

Once au goes public, there are going to be a lot of files out in the
wild called au.hh (and a bunch more called au_noio.hh). These files
are all going to look very similar when people skim them, but they'll
have myriad tiny variations. It's critically important to make these as
easy as possible to distinguish!

To nip this in the bud, we'll add a comment at the top of the file
which provides a "manifest", clearly identifying both the version of
code and the options with which the file was generated. We'll add a new
--version-id argument to the script, which gives a human-readable,
release-tag-based description.

Right this moment, it gives: 0.2.0-1-g545a831-dirty. This means we
are based on release 0.2.0, with 1 extra commit, and the current
unambiguous ID is a git (g) commit with abbreviated hash 545a831.
Since the worktree is dirty (after all, I'm in the process of
committing), it appends the word dirty. If we were exactly on a
release, the commit hash would be omitted. Since we don't want to
burden end users with generating this information, we will provide it if
they omit it.

Git version information is typically inaccessible to bazel. This makes
it hard to include the manifest in our automatically generated
single-file releases. To fix this, we use bazel's workspace status
machinery [1], writing the ID to the stable-status.txt file. We then
give the genrules access to this file.

To do so, we need to set the --workspace_status_command for every
build command in our .bazelrc. Typically, one would rather not do
this for every command. However, empirically, the burden was too
small to be perceptible.

Fixes #30.

Test Plan

  • Run au-docs-serve, and view the single-file releases.
  • Run make-single-file --units meters seconds | gview -.

[1] https://bazel.build/docs/user-manual#workspace-status

Once `au` goes public, there are going to be a _lot_ of files out in the
wild called `au.hh` (and a bunch more called `au_noio.hh`).  These files
are all going to look very similar when people skim them, but they'll
have myriad tiny variations.  It's critically important to make these as
easy as possible to distinguish!

To nip this in the bud, we'll add a comment at the top of the file
which provides a "manifest", clearly identifying both the version of
code and the options with which the file was generated.  We'll add a new
`--version-id` argument to the script, which gives a human-readable,
release-tag-based description.  Right this moment, it gives:
`0.2.0-1-g545a831-dirty`

This means we are based on release `0.2.0`, with `1` extra commit, and
the current unambiguous ID is a git (`g`) commit with abbreviated hash
`545a831`.  Since the worktree is dirty (after all, I'm in the _process_
of committing), it appends the word `dirty`.  If we were exactly on a
release, the commit hash would be omitted.  Since we don't want to
burden end users with generating this information, we will provide it if
they omit it.

Git version information is typically inaccessible to bazel.  This makes
it hard to include the manifest in our automatically generated
single-file releases.  To fix this, we use bazel's workspace status
machinery [1], writing the ID to the `stable-status.txt` file.  We then
give the genrules access to this file.

To do so, we need to set the `--workspace_status_command` for every
build command in our `.bazelrc`.  Typically, one would rather not do
this for _every_ command.  However, empirically, the burden was too
small to be perceptible.

Test Plan
---------

- [x] Run `au-docs-serve`, and view the single-file releases.
- [x] Run `make-single-file --units meters seconds | gview -`.

[1] https://bazel.build/docs/user-manual#workspace-status
Output which showed up in `au-docs-serve`:

```
fatal: not a git repository (or any of the parent directories): .git
```

This doesn't matter because when this happens, we've provided an
authoritative option anyway.
@chiphogg chiphogg requested a review from philsc December 11, 2022 20:05
tools/bin/make-single-file Outdated Show resolved Hide resolved
@@ -21,7 +22,7 @@ def main(argv=None):
"""
args = parse_command_line_args(argv)
files = parse_files(filenames=filenames(args=args))
print_unified_file(files)
print_unified_file(files, args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As tempting as it may be, I've always regretted passing args in my own code. Passing this opaque structure through makes the code harder to understand. The only real way to understand what the function is doing with args is to read its implementation. It also makes unit testing really annoying because you have to construct such an instance instead of just passing in the actual values you care about.

I highly recommend not doing this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly feel the same way... generally. However, in this particular case, the point of manifest() is literally "make sure we use all the args", so I think args is best.

That said, I'm concerned that we could miss handling some args. So I wrote a small wrapper that makes it easy to check that we have handled all of them: dd3b803. This revealed that we were not in fact handling the main_files arg (which is now rarely used), so I added a section for it: 744deec. Finally, and much to my embarrassment, I noticed that I was lazily passing args without due cause, so I fixed that instance, while making sure to use named parameter passing in the instances where I do want that: e969663.

@@ -0,0 +1,3 @@
#!/bin/bash

echo STABLE_GIT_ID $(git describe --always --dirty)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious to me that making this information as "stable" is beneficial. Perhaps in a small repo like this it doesn't matter. But when you get a larger repo, it's generally less useful to have this be stable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal in doing this was to force the stamped targets (i.e., //:au_hh and //:au_noio_hh) to rebuild every time the git description changes. I had thought that if I let the git description be "volatile", then these would not get rebuilt. Am I understanding correctly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your goal is exactly "rebuild any time anything at all changes", then what you have does work. It is, however, not super useful. E.g. every time you create a commit, everything gets "relinked/restamped". In a repo of non-trivial size, that time spent restamping/relinking adds up pretty quickly. In my experience, the more useful notion is "any time this target needs to get rebuilt, grab the latest git information". That means targets only get rebuilt when they actually need to be. You can then still manually force everything to get restamped with --embed_label.

Even if you make this volatile, the information will never be wrong, but it might not mean exactly what you want it to. Basically, if you make it volatile, then the stamped git hash means "if you check out this hash, then you will be able to reproduce the corresponding artifact exactly." It does not mean "this artifact was built at exactly this hash".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, not important for a project of this size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I would like to fix this up to get it right, but it's not a top priority since this doesn't block release, and we're very far from having a repo big enough to notice this (if we even ever do).

At lower priority, therefore, I had a clarification question. When you say "rebuild any time anything at all changes", were you referring to stamped targets (and their reverse deps), unstamped targets, or both?

My expectation/mental model was that the stamped targets (and anything that depends on them) would get rebuilt when the stable version changes, but anything else would not. In other words, I expected only the two stamp=True targets to be directly affected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "rebuild any time anything at all changes", were you referring to stamped targets (and their reverse deps), unstamped targets, or both?

stamped targets (and their reverse deps)

My expectation/mental model was that the stamped targets (and anything that depends on them) would get rebuilt when the stable version changes, but anything else would not. In other words, I expected only the two stamp=True targets to be directly affected.

That is correct.

The new check caught that we were ignoring this.
Only pass `args` when the contract is _literally "use all args"_, and in
that case, _check_ that we do so.  Otherwise, pass individual named
elements.
Copy link
Contributor Author

@chiphogg chiphogg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have any more feedback, I'll address it in follow-on PR(s).

Comment on lines +209 to +219
class CheckArgs(dict):
def __init__(self, args):
self.vars = vars(args)
self.used = set()

def are_all_used(self):
return not set(self.vars.keys()).difference(self.used)

def __getattr__(self, k):
self.used.add(k)
return self.vars[k]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we deal with #21, this would be a great candidate for a unit test. In fact, I used one to write it:

class TestUseAllArgs(unittest.TestCase):
    def parse_args_a_b_c(self, argv):
        parser = argparse.ArgumentParser()
        parser.add_argument("--a", type=int)
        parser.add_argument("--b", type=int)
        parser.add_argument("--c", type=int)
        return parser.parse_args(argv)

    def test_detects_unused_arg(self):
        args = CheckArgs(self.parse_args_a_b_c(argv=["--a", "1", "--c", "2"]))

        self.assertFalse(args.are_all_used())

        self.assertEqual(args.a, 1)
        self.assertEqual(args.c, 2)
        self.assertFalse(args.are_all_used())

        self.assertIsNone(args.b)
        self.assertTrue(args.are_all_used())

The reason I didn't land it was because I hadn't yet broken this out into (tested) python libraries, a python binary, and a thin wrapper. But I hope the existence of this test, even if only locally, adds some confidence that this thing does what it's supposed to. (I really do need to go ahead and just do #21...)

@chiphogg chiphogg merged commit 2bdf11d into main Dec 13, 2022
@chiphogg chiphogg deleted the chiphogg/manifest branch December 13, 2022 02:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include metadata comments in generated single files
2 participants