Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some GitHub commit message guidance #110

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 41 additions & 1 deletion docs/github-merge-strategy.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,46 @@ This document describes the merge strategy used for Pull Requests within all rep

Every Pull Request that is made to an InstructLab repository should meet the below requirements - certain repositories such as [taxonomy](https://github.com/instruct-lab/taxonomy) may have additional requirements.

### Commit Messages and PR Description

PRs should be formatted as a series of logical commits that implement the
overall change proposed in the PR. Each commit should represent a logical step
in the progression of the change. The commit message for each should clearly
explain that step of the progression.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to be explicit here about squashing fixes and cleanups into those logical commits?

I look for high-level context about "what" a change is to frame my review, as well as some reason "why" the change is needed to understand prioritization and benefit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I have some older text at the bottom of this doc that could be moved up here as a start. I already moved up some of it. Here's what could be moved:

It is common that a PR author may need to do a final rebase to clean up their proposed commit series before a PR can be merged. It is also fine for a project maintainer to perform this step when the changes necessary are straight forward enough to do so. This includes doing a final rebase on main if necessary. The PR itself should NOT include any merge commits of main back into the developer's branch. We expect the proposed commit series to be a clean set of commits against main without conflicts or merge commit history. We only use a merge commit to record the PR's inclusion into main.


Here are a few examples of PRs that contained a series of commits:

- <https://github.com/instructlab/instructlab-bot/pull/125/commits>
- <https://github.com/instructlab/instructlab/pull/994/commits>
- <https://github.com/instructlab/instructlab/pull/951/commits>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there are even better examples out there. I'm happy to change this over time to represent some of the best examples people can find in our repos. These are a few I had dug up some weeks ago.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if this is a personal preference nit. Noting that many/most PRs are small in scope and (I believe) should usually only have 1 commit. I generally think of a commit as a unit of work that you might choose to un-commit. And breaking down work into revertable logic is a great practice. But stating multiple commits is the general case might lead to a lot of commits for work that doesn't need to be broken up. My ask would be to adjust the wording to not promote multiple commits quite so strongly but still indicate how to use them effectively. If there is a fundamental disagreement on approach, I'm happy to be a chameleon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're certainly not alone in this preference, and I think there's a bit of a difference between a camp of people who developed practices before github existed, and another camp who developed practices around github and maybe are less reliant on e.g. git log

Here's a concrete example of why I'm firmly in the "more is more" (within reason) on splitting up commits: https://github.com/instructlab/sdg/pull/74/commits

Totally trivial PR, but 4 distinctly logical changes, and being able to look at those changes (and their commit message) individually is actually really valuable - e.g. separating the reordering of the parameters from the deleting unused parameters makes it trivial to review. First you make sure the reordering didn't change anything, then you confirm that the 4 parameters being dropped aren't in use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a few thoughts, we might consider adding:

*Splitting commits is crucial because it provides "ease of revertability." If a specific commit causes an issue, reverting a single logical unit is much easier and less disruptive than reverting an entire feature.

  • Encouraging multiple PRs for large changes: contributors should be encouraged to split their work into multiple PRs if the total lines changed across commits exceed 1,000 lines. This rule of thumb helps keep changes manageable and reviews more focused.

  • The approach to splitting commits and PRs can depend on the review's velocity. High velocity in review and addressing feedback leads to rapid merges. Small PRs can be reviewed and merged quickly, benefiting both contributors, who see their changes integrated faster, and maintainers, who avoid reviewing massive changes. I wish the various projects could reach that state, we are not there yet.

This strategy helps maintain the project's health and efficiency by ensuring changes are easier to manage, review, and revert if necessary.


Here are some external resources that provide guidance on writing good commit messages:

- <https://www.berrange.com/tags/commit-message/>
- <https://docs.kernel.org/process/submitting-patches.html#separate-your-changes>
- <https://github.com/kubernetes/community/blob/master/contributors/guide/pull-requests.md#smaller-is-better-small-commits-small-pull-requests>

While not a requirement right now, we may want to consider following a spec like
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can also mention 50/72 rule, since Conventional Commits don't seem to cover it. E.g. this: https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html

[Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) in the
future. It's worth a look for inspiration, at least.

When your PR contains multiple commits, it is helpful to reuse the work you put
into crafting high-quality commit messages. Here is a script you can use to generate
a summary of the commits in your PR. It starts with a list of the commits one
line at a time and then follows that with the full commit messages. The format works
for pasting directly into GitHub.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GitHub CLI tools are also good for this. hub pull-request starts an editor with a template for creating the PR message, including all of the commit messages for commits in the branch. I'm sure the more modern gh does something similar, and I'd be surprised if there wasn't a VSCode somewhere that did something like it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I actually use gh pr create myself. When it gets to the PR description part, it launches my editor (vim) and I run :read!~/bin/ghprdesc.sh which is the script from above.

If it's just one commit, it already does what I want by default, which is include the full commit message as the PR title and description.

That reminds me of another point to make -- for a PR that is a single commit, the commit message should be equal to the PR description in most cases. Otherwise, you're probably leaving out info that should be in the commit message, as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if the PR description contained a hand crafted description of the overall fix. This way the user pushing the PR has to think about the overall fix or feature being pushed. It provides a good overview to the reader of the PR afterwards.

It should contain breakdown of the different parts if helpful or can be read in the commits.


```sh
#!/bin/bash

git log --reverse --oneline origin/main..HEAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the script is useful, it highly depends on how the remote is set up. I'd add a note about making sure the remote points to instructlab/instructlab and not your fork.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Is origin here the main repo (as it is for me) or your fork (which I call fork). :-)

echo
git log --reverse origin/main..HEAD
```

Here is an example PR that uses a PR description that was generated using this
script: <https://github.com/instructlab/sdg/pull/67>.

### CI checks

We should require that all CI checks pass on a Pull Request before it can be considered for merge. Every repository should have at mimimum the following checks:
Expand All @@ -28,7 +68,7 @@ There are [three different merge methods offered by GitHub](https://docs.github.

We use the default merge method of creating merge commits for PRs. This is to ensure we retain the full commit history as intentionally structured by the PR author while also retaining metadata about the PR itself in the merge commit.

This requires project maintainers to include commit messages and the overall structure of the commit series as part of their review. When multiple commits are present, they should represent a logical series of changes that implement the overall change proposed in the PR. The commit message for each should clearly explain that step of the progression.
This requires project maintainers to include commit messages and the overall structure of the commit series as part of their review.

It is common that a PR author may need to do a final rebase to clean up their proposed commit series before a PR can be merged. It is also fine for a project maintainer to perform this step when the changes necessary are straight forward enough to do so. This includes doing a final rebase on `main` if necessary. The PR itself should NOT include any merge commits of `main` back into the developer's branch. We expect the proposed commit series to be a clean set of commits against `main` without conflicts or merge commit history. We only use a merge commit to record the PR's inclusion into `main`.

Expand Down