I tend to lean towards the “git-core” end of the spectrum, particularly for long-lived projects. Finding the place in which a bug occurs is one thing, but it really helps to understand the context in which the bug-producing code was written. Doing a blame/bisect and being able to get inside the author’s head at that moment (which might be younger me’s head!) might prevent me from introducing other bugs!
I also find good commit messages handy with reverts, particularly when we’re trying something new and it doesn’t work. At work we tried to fix an issue around database encodings with some SQLAlchemy casts a few years ago; a week after they were introduced, another developer reverted that change with no explanation in Git. That developer has moved on to another company, and when the encoding issue finally reared its ugly head again, we had to rediscover the reason for the revert in the first place!
This fucks bisect, defeating one of the biggest reasons version control provides value.
Furthermore, there are tools to easily take both approaches simultaneously. Just git merge —squash before you push, and all your work in progress diffs get smushed together into one final diff. And, for example, Phabricator even pulls down the revision (pull request equivalent) description, list of reviewers, tasks, etc, and uses that to create a squash commit of your current branch when you run arc land.
I’m surprised to hear so many people mention bisect. I’ve tried on a number of occasions to use git bisect and svn bisect before that, and I don’t think it actually helped me even once. Usually I run into the following problems:
I love the idea of git bisect but in practice it’s never been worth it for me.
Your second bullet point suggests to me bisect isn’t useful to you in part because you’re not taking good enough care of your history and have broken points in it.
I bisect things several times a month, and it routinely saves me hours when I do. By not keeping history clean as others have talked about, you ensure bisect is useless even for those developers who do find it useful. :(
Right: meaningful commit messages are important but a passing build for each commit is essential. A VCS has pretty limited value without that practice.
It does help that your commits be at clean points but isn’t really necessary - you don’t need to run your entire test suite. I usually will either bisect with a single spec or isolate the issue to a script that I can run against bisect. And as mentioned in other places you can just bisect manually.
You can run bisect in an entirely manual mode where git checks out the revision for you to tinker with and before marking the commit as good or bad.
There are places where it’s not so great, and there are places where it’s a life-saving tool. I work (okay, peripherally… mostly I watch people work) on the Perl 5 core. Language runtime, right? And compatibility is taken pretty seriously. We try not to break anyone’s running code unless we have a compelling reason for it and preferably they’ve been given two years' warning. Even if that code was written in 1994. And broken stuff is supposed to stay on branches, not go into master (which is actually named “blead”, but that’s another story. I think we might have been the ones who convinced github to allow a different default branch because having it fail to find “master” was kind of embarrassing).
So we have a pretty ideal situation, and it’s not surprising that there’s a good amount of tooling built up around it. If you see that some third-party module has started failing its test suite with the latest release, there’s a script that will build perl, install a given module and all of its dependencies, run all of their tests along the way, find a stable release where all of that did work, then bisect between there and HEAD to determine exactly what merge made it started failing. If you have a snippet of code and you want to see where it changed behavior, use bisect.pl -e. If you have a testcase that causes weird memory corruption, use bisect.pl --valgrind and it will tell you the first commit where perl, run with your sample code, causes valgrind to complain bitterly. I won’t say it works every time, but… maybe ¾ of the time? Enough to be very worth it.
No it doesn’t. Bisect doesn’t care what the commit message is. It does care that your commit works, but I don’t think the article is actually advocating checking in broken code (despite the title) - rather it’s advocating committing without regard to commit messages.
Just git merge —squash before you push, and all your work in progress diffs get smushed together into one final diff.
This, on the other hand, fucks bisect.
Do you know how bisect works? You are binary searching through your commit history, usually to find the exact commit that introduced a bug. The article advocates using a bunch of work in progress commits—very few of which will actually work because they’re work in progress—and then landing them all on the master branch. How exactly are you supposed to binary search through a ton of broken WIP commits to find a bug? 90% of your commits “have bugs” because they never worked to begin with, otherwise they wouldn’t be work in progress!
Squashing WIP commits when you land makes sure every commit on master is an atomic operation changing the code from one working state to another. Then when you bisect, you can actually find a test failure or other issue. Without squashing you’ll end up with a compilation failure or something from some jack off’s WIP commit. At least if you follow the author’s advice, that commit will say “fuck” or something equally useless, and whoever is bisecting can know to fire you and hire someone who knows what version control does.
Do you know how bisect works?
Does condescension help you feel better about yourself?
The article advocates using a bunch of work in progress commits—very few of which will actually work because they’re work in progress—and then landing them all on the master branch. How exactly are you supposed to binary search through a ton of broken WIP commits to find a bug? 90% of your commits “have bugs” because they never worked to begin with, otherwise they wouldn’t be work in progress!
I don’t read it that way. The article mainly advocates not worrying about commit messages, and also being willing to commit “experiments” that don’t pan out, particularly in the context of frontend design changes. That’s not the same as “not working” in the sense of e.g. not compiling.
It’s important that most commits be “working enough” that they won’t interfere with tracking down an orthogonal issue (which is what bisect is mostly for). In a compiled language that probably means they need to compile to a certain extent (perhaps with some workflow adjustments e.g. building with -fdefer-type-errors in your bisect script), but it doesn’t mean every test has to pass (you’ll presumably have a specific test in your bisect script, there’s no value in running all the tests every time).
Squashing WIP commits when you land makes sure every commit on master is an atomic operation changing the code from one working state to another.
Sure, but it also makes those changes much bigger. If your bisect ends up pointing to a 100-line diff then that’s not very helpful because you’ve still got to manually hunt through those changes to find the one that made the actual difference - at that point you’re not getting much benefit from having version control at all.
Github now has a “squash and merge” option on pull requests, so you can have it both ways: a clean master branch history and a messy PR commit history. Only problem is that your PR branch commits no longer cleanly map back onto master so git branch -d always complains, etc.
I’ve never understood when I would want to use squash and merge because it completely botches the commit message by just stitching all of the commit messages from the PR into one. That’s typically not what I want, especially if the PR commits are messily organized. It then becomes the responsibility of the person merging to write the commit message, which may not be the same person that submitted the PR. And you have to write the commit message in the Github web UI.
I do use the “rebase and merge” functionality though.
It then becomes the responsibility of the person merging to write the commit message, which may not be the same person that submitted the PR.
Yeah, I think this is only a good idea for teams where the author of the PR is also the one merging it, and they just use the PR for review and comments.
I’ve been very happy since this feature came out: I used to waste time crafting a commit timeline that communicated intent… now I just commit as I go, make sure no PR is too big, and then squash and merge (finally crafting a decent commit message at the end).
one might take a look at … https://www.mercurial-scm.org/wiki/ChangesetEvolution
Sometimes I’ll even deliberately push a commit and immediately revert it because I want a record of one possible experiment was.
Isn’t that part of the point of a branch?
The funny thing is that if Github allowed typical FTP then I suspect Zach Holman might just use that instead. I fail to see the point of a VCS if the commit history is completely mired in useless commit titles.
My first question for Zach would be if his problem isn’t with the Git process as an idea, but just the fact that committing and branching takes time? I suspect he enjoys reverting commits just because it’s a faster process (especially if you use a GUI app like Github’s Mac client).
Why would someone use VCS to store deliberately reverted commits? Because it’s more convenient than a zip file of patches? History is not the only feature provided by VCS.
A zip file of patches still hangs on the idea of a history. “finalFinal2.zip” and such like that. I’m talking about pushing and live-updating a dev server through FTP. It’s the closest thing to madness, but for the use-cases that Zach has; it might fit much better for him.
There are a lot of wonderful things about VCS and Git, but it seems Zach uses all of these as minimally as possible.
but just the fact that committing and branching takes time
I have a script that creates a new randomly-named branch if I’m on master, checks out that brach, stages all my changes, commits my changes, and runs some fast tests.
As I work, I might think of a good name for the feature branch and rename it. When I’m done, I’ll squash commits (as needed) and merge.
Without the little script, this would be 5 commands per iteration plus the overhead of thinking of a good branch name. That’s painful.
my answer is usually “I don’t care, I’m going to grab another slice of pizza, you do you”
And here we see a flippancy towards craft that all of us should probably give wide berth. :(
It’s easy to not give a shit about things until your entire team is doing it all differently. Jesus fuck.
This is grounds for immediate firing on the spot if he worked on my team. I can hardly think of worse offences that one can do.
Luckily for both of us, he’d never get hired in the first place.
And people ask why I don’t simply publish more code. Maybe I’m trying to avoid never getting hired.
Ruling bad employers out of hiring me is a pretty good reason to publish a lot of code.
There’s a certain irony in your comment as Zach Holman was fired by Github (albeit not for his commit history).
Agree with you 100% though. Heck, I even write proper commit messages for simple scripts I write - what’s the point of using version control if my commit message just contains profanity or emoji? A properly-written commit message is just as important as the diff itself.
Well, he turned it into a ranty conf talk :).
The talk in question
I found this way more enlightening than the bun fight over what to put in commit messages vs what to put in pull request logs.
I mean, while I’m working on it, my feature branch commit log will probably be pretty messy. I try to commit early and commit often (if nothing else, I’ll commit what I’m working on and push it up at the end of the day whether it compiles or not, just so the only copy of my day’s work isn’t sitting on my hard drive), and all those tiny commits definitely don’t always have meaningful commit messages. But when it comes time to merge that in, I’ll rebase it down to just one or two comprehensible, functional commits, so the history of any released commit is clean and amenable to bisection.
At work, we use mercurial, where this workflow is somewhere between difficult and impossible, which is my single biggest complaint against mercurial. The history I produce is definitely not as nice as I would like because I can’t clean it up after the fact, and I don’t commit as often as I would like, either.
Have you tried mercurial evolve? It will let you easily rewrite your local changes without worrying about possibly rewriting public changes. I’ve been happily using it for more than a year now.
Not in great detail. Almost all of those tools I’ve looked at won’t allow you to edit history after it’s been pushed, which eliminates most of the benefits of rewriting history at all. From glancing at docs for evolve and phases, it looks like it might be possible to set something up where feature branches are always draft and not marked public until merged into a release or default branch, but that’s nontrivial and not automatic, and would require getting buy-in from the rest of my team.
There’s also hg histedit and hg fold that work really well for this use case (rewriting history).
where do you work?
Github can consider the PR the fundamental unit of change if it wants to. There’s zero chance of Github ever switching away from Github.
OTOH, if you ever switch to another Git host, you’ll want your commit messages themselves to explain the changes.
If the PR description were added as the commit message for the merge commit, I might feel differently.
Git repositoriea are portable and ensuring you write a decent commit message encompassing the “why” of the changes means your repo will always have a solid history and a solid foundation. It’s so lazy to bash the keyboard instead of writing a sentence about why you’re making a change. Also it takes longer to find some clever Unicode characters (unless you’re using the MBP Touchbar I guess…thanks Obama).
An argument that occurred a lot at a place I used to work for why we should entirely trash our git-commit logs was that they wanted to preserve the work history of how they got to some implementation, and the various failed attempts.
This, however, ruins git-bisect and basically all other forms of “use history to find information about my codebase”.
Use the commit message for this information. Use your design document for this information.. fuck use the pull request, even though I find this a horrible horrible idea, it’s better than trash commits! You can have your cake an eat it too!
If you need to communicate with your future self or other developers, don’t ruin their VCS tool, communicate!
This lazyness excused as practicality is just lazyness.