1. 39
    1. 14

      Notionally, a large feature may be broken down into many small features, which depend on each other variously; if you think small or focused commits are a good idea, you might like to do this, and topologically sort the small features, making each a commit. But it is more important that a commit should be self-contained. Hence, you cannot break down a large feature all the way—in the limit, you obviously can’t break it down into a sequence of single-character changes—so the limit is a strongly-connected component.
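
      As a rough sketch of that idea (not anyone's actual tooling): group mutually dependent changes into strongly-connected components, then topologically sort the groups so that each commit only depends on earlier ones. The change names and the `DEPENDS_ON` graph are made up for illustration; `graphlib` is in the Python standard library from 3.9.

      ```python
      from graphlib import TopologicalSorter

      # Hypothetical change graph: each small change maps to the changes it depends on.
      # "parser" and "lexer" depend on each other, so neither can be committed alone.
      DEPENDS_ON = {
          "docs": {"cli"},
          "cli": {"parser"},
          "parser": {"lexer"},
          "lexer": {"parser", "tokens"},
          "tokens": set(),
      }


      def strongly_connected_components(graph):
          """Map each node to the frozenset of nodes in its component (Tarjan's algorithm)."""
          index = {}    # discovery order of each node
          lowlink = {}  # smallest discovery index reachable from the node
          stack, on_stack = [], set()
          component_of = {}

          def visit(node):
              index[node] = lowlink[node] = len(index)
              stack.append(node)
              on_stack.add(node)
              for dep in graph.get(node, ()):
                  if dep not in index:
                      visit(dep)
                      lowlink[node] = min(lowlink[node], lowlink[dep])
                  elif dep in on_stack:
                      lowlink[node] = min(lowlink[node], index[dep])
              if lowlink[node] == index[node]:
                  # node is the root of a component: pop its members off the stack
                  members = set()
                  while True:
                      member = stack.pop()
                      on_stack.discard(member)
                      members.add(member)
                      if member == node:
                          break
                  component = frozenset(members)
                  for member in members:
                      component_of[member] = component

          for node in graph:
              if node not in index:
                  visit(node)
          return component_of


      def commit_plan(graph):
          """Group mutually dependent changes, then order the groups dependencies-first."""
          component_of = strongly_connected_components(graph)
          condensed = {component: set() for component in set(component_of.values())}
          for node, deps in graph.items():
              for dep in deps:
                  if component_of[dep] != component_of[node]:
                      condensed[component_of[node]].add(component_of[dep])
          # TopologicalSorter takes node -> predecessors, which matches "depends on".
          order = TopologicalSorter(condensed).static_order()
          return [sorted(component) for component in order]


      print(commit_plan(DEPENDS_ON))
      # prints [['tokens'], ['lexer', 'parser'], ['cli'], ['docs']]
      ```

      Here `lexer` and `parser` end up in one commit because neither is self-contained without the other; everything else stays a single-change commit.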

      These strongly-connected components may be large and sprawling, however. This may be a sign that you have chosen the wrong abstractions; but often I find myself changing the abstractions for that very reason. This can have far-reaching effects.

      Ultimately, I don’t find it very useful to have a ‘commit methodology’ (notwithstanding my previous comment that a commit should be self-contained). I write new code, and try to make the existing code better, and use commits and branches to maintain a log of that.

      1. 5

        Very well put. The scale of a commit is proportional not only to the size of the feature being added but to the age and size of the existing code.

    2. 9

      > Every perfect commit should include a link to an issue thread that accompanies that change.

      But the link will eventually point to a 404. Sooner or later the org will change ticketing platforms, for good or bad reasons, and what was documented there will be lost. Even if the tickets are copied to the new platform (which is not always the case), the old links will still be dead. Yes, a link can be provided so that other people can comment later, but the commit message should be self-sufficient, as it’s the only source of information that will last forever.

      1. 6

        More often it’ll point to a 403 when a new team member joins and can’t get permission to the project board that nobody remembers as being important.

        I joined a 10-year-old project at a megacorp that was in maintenance mode and left after 8 months, still having never found the permissions I needed for even the architecture documentation.

      2. 4

        My last day at Mozilla was in mid-2015. They infamously use Bugzilla for everything. Since then, every company I’ve worked for has used Jira. There was a brief attempt by Airtable to break in, but Jira just absolutely completely owns the market for issue tracking at tech companies, as far as I can tell.

        So while there is value in having technical information in the commit message, the larger context can and should be offloaded to the issue tracker via a link to the Jira ticket ID. And the “don’t document it there because it will be thrown away in a migration” argument works just as well as an argument against documenting in commit messages – who says the company’s going to keep the commit history if and when they switch code hosts or version-control systems? So the only logical conclusion is not to document anything, anywhere, in any system, ever, since they all are equally susceptible to being thrown away.

        1. 2

          > who says the company’s going to keep the commit history if and when they switch code hosts or version-control systems?

          Code history is certainly the thing best kept, better than ticket history. It’s also the simplest to keep.

          1. 1

            It’s a good idea to keep all the documentation. But your premise was that companies won’t do that. My premise is that 1) companies tend to just buy Jira and stick to it, and 2) once you buy into your premise, there’s no reason to believe any particular form of documentation is more likely to be kept than the others, so it makes no sense to push for any one form of documentation above others.

      3. 2

        Enough people have reported that they have seen orgs with no respect at all for maintaining their institutional memory that I’m going to research ways to address this.

        I’m optimistic that the “git notes” mechanism can help here - either by copying issue threads into annotations in the repo itself or at least by adding links to archived issue tracker content.
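
        A minimal sketch of the first option, assuming the issue thread has already been exported to plain text. The function names and the `issues` notes ref are made up for illustration; `git notes --ref=<name> add -m <msg> <object>` is the underlying git command.

        ```python
        import subprocess


        def notes_add_command(commit, text, ref="issues"):
            """Build the argv for `git notes add`, storing the note under refs/notes/<ref>."""
            return ["git", "notes", f"--ref={ref}", "add", "-m", text, commit]


        def attach_issue_thread(commit, text, ref="issues"):
            """Attach `text` to `commit` as a note; must be run inside a git repository."""
            subprocess.run(notes_add_command(commit, text, ref), check=True)
        ```

        The note can then be read back with `git notes --ref=issues show <commit>` or `git log --notes=issues`. Notes refs are not pushed by default, so they would need an explicit `git push origin refs/notes/issues` to travel with the repo.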

        1. 1

          I tried some experiments with git notes in this repo: https://github.com/simonw/playing-with-git-notes

          My conclusion at this point is that a better path would be to mirror the issues into a separate branch in the repo itself: https://github.com/simonw/playing-with-git-notes/issues/2

    3. 5

      In terms of committing tests with the feature/fix, I’m keen on the idea that we write a “failing test” before the fix (to avoid the failure mode where you write the test for your fix afterwards, see it pass and say “that’s all ok”).

      What do people think about reflecting this in the commit history? Doing so makes a visible statement that you did this, which is good (and helps promote the practice on the team), but it also breaks CI until you push the fix. This might be a feature? (The codebase does actually have a bug; it just wasn’t surfaced until now.) But this might also be disruptive.

      1. 4

        A perfectly deployable master branch at every commit is overrated. It is perfectly acceptable to gate decisions based on tests or even human awareness.

        1. 3

          Having a perfectly deployable master branch when working with other people means that at any moment I can just start work on a new feature by taking master.

          Perhaps you use some other marking to determine this, but then it’s the same principle, just with a different mark.

          (Of course, in practice that means that sometimes things get merged into the main branch that are busted, and this doesn’t hold. But if it’s exceptional, it wastes way less time than if it’s the norm.)

        2. 2

          This. There are so many cases where you just don’t have a nice way of solving a problem, and sacrificing the actual solution at the altar of an arbitrary process is not worth it, imo.

        3. 2

          In addition, you rarely find solutions to current problems in git history. Sometimes, yes, for consultation, but rarely will you actually be deploying that ancient commit.

          1. 4

            Most often you want to know why the line of code was added, what problem it was fixing. Or you are doing a git bisect, and a failing build might interfere with that, but most often you really just need sufficient context for the commit in the commit.

            Anything else, perfect tests, perfect messages, perfect docs, is overkill, though can be nice.

      2. 3

        When I’ve done this workflow (commit the failing test first) I do it in a branch, then publish a PR to GitHub (where I can watch CI fail on the PR page), then squash commit that to main later along with the implementation.

        I just saw an interesting idea on Twitter: pytest offers an “xfail” mechanism which can mark a test as “expected to fail” such that it can fail while still leaving CI green. So you can use that to check in a known-to-fail test without breaking things like “git bisect” later on: https://twitter.com/maartenbreddels/status/1586609659464630273
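
        A minimal sketch of what that could look like. `slugify` and its bug are hypothetical, made up for illustration; `pytest.mark.xfail` and its `reason` and `strict` parameters are pytest's own.

        ```python
        import pytest


        def slugify(title):
            """Hypothetical function under test; the bug is that it keeps punctuation."""
            return title.lower().replace(" ", "-")


        # strict=True makes the run fail if this test unexpectedly passes, so the
        # marker has to be removed in the same commit that fixes the bug.
        @pytest.mark.xfail(reason="slugify does not strip punctuation yet", strict=True)
        def test_slugify_strips_punctuation():
            assert slugify("Hello, World!") == "hello-world"
        ```

        With the marker in place the test is reported as an expected failure rather than a failure, so CI stays green on the commit that only adds the test.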

      3. 3

        You can, even should, write the test first, but it should go in the same commit as the fix, at least for published commits.

    4. 2

      Not to get into the weeds, but magit is a lifesaver for this kind of flow. Things like being able to pick “auto fixup” in the commit submenu (which commits your changes, then applies them to an older commit via a rebase), easily amending and browsing changes, etc.
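
      For anyone not on magit, this is roughly the plain-git flow that such a fixup corresponds to, sketched as a script. It is an approximation under my own assumptions, not magit’s actual implementation; the function names are made up, while `git commit --fixup` and `git rebase --autosquash` are standard git.

      ```python
      import os
      import subprocess


      def fixup_commands(target):
          """Return the two git invocations behind a fixup-and-autosquash flow.

          `target` should be a commit hash: a HEAD-relative name such as HEAD~2
          would point at a different commit once the fixup commit is created.
          """
          return [
              ["git", "commit", f"--fixup={target}"],
              ["git", "rebase", "--interactive", "--autosquash", f"{target}~1"],
          ]


      def fixup(target):
          """Fold the staged changes into `target` (which must not be the root commit)."""
          # GIT_SEQUENCE_EDITOR=true accepts the generated todo list without opening an editor.
          env = {**os.environ, "GIT_SEQUENCE_EDITOR": "true"}
          for command in fixup_commands(target):
              subprocess.run(command, check=True, env=env)
      ```

      Usage would be something like `git add -p` followed by `fixup("abc1234")`, where `abc1234` stands in for the hash of the commit being corrected.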

      I’m convinced a lot of people dislike doing this kind of VCS work because, despite the fact that people “know” what they want to do, it’s either actually hard in the details or it’s hard to know what set of operations gets you to where you want easily. But stuff like magit (where you are selecting commits/branches instead of looking them up out of band and then typing them into commands) helps a lot.

    5. 1

      The “Problem: / Solution:” model for commit messages helps, in my mind, to add context to code. In larger orgs, you can gain access to the source, but not the exact issue tracker board used at the time of code authorship, and putting the problem being solved in the commit message seems to reduce the importance of the issue thread.