1. 31
  1.  

  2. 12

    Counter-argument: Automatic code formatting tools and rules make fixes easier to backport, not harder. Without consensus on code formatting you’ll end up with trivial syntax differences between branches, which breaks git cherry-pick.

    The trick is to apply any code formatting rule to every supported release. If you’re backporting a patch from Linux 5.13 to 4.4 then you shouldn’t have any issues with syntax because 4.4 should already have the same code style as 5.13. Now the diff between the two branches represents the semantic differences, none of the syntactic litter you’ve picked up between release. (Most code formatters are non-deterministic, which isn’t ideal, but they’re a huge improvement over the Wild West of inconsistent syntax.)

    Do automated code formatted cause problems with git cherry-pick, or are large ‘rewrite every file’-type changes incompatible with email-based patch workflows? /semi-s

    1. 1

      I like this

    2. 8

      I once worked for a company where they managed to create about half a million subversion commits in just 2 or 3 years, with about 3 developers working on it. I’ll leave it as an exercise to guess how they managed to do that :-)

      1. 4

        By my calculation (500,000 / 3 years / 3 devs / 225 days worked per year) that’s about 250 commits per day, per developer. That’s one every 2 minutes from each developer. Am I doing the math right? That’s gotta be auto-commit-on-save or auto-commit-every-2-mins or similar? Do tell…

        1. 9

          Those numbers sound about right; and no, it wasn’t automatic, editor saves did create commits as @ptman mentioned, the reason being some rather unconventional use of SVN: commits were how you got your changes to your development environment. Any change you wanted to show up you had to commit.

          This comment turned in to an entire story, which I posted on my website; the most relevant bits:

          They had a server in the office which ran the SVN server and OpenVZ to give every developer their own container running Apache and PHP, and that’s what you would use for development. How do you get your code to that container? NFS? SMB? FTP? Nah, that’s so boring! SVN is a much better tool for this!

          The way this worked is that on every push the SVN server would run this PHP script to copy the changes to the right container based on the committer, the idea being that everyone only got their their own changes and not other people’s. You didn’t work off a martin branch – branches are for losers – you would always commit to the main trunk branch, which was the only branch people used. The script would look at the committer and copy all the files that commit touched to that person’s container. Every once in a while you manually updated your directory to get other people’s changes. Two people working on the same file at the same time was … unwise.

          That PHP script was indecipherable with umpteemt levels of nesting. No one dared to touch it because it “mostly worked”, some of the time anyway. If it was the third Tuesday of the month.

          Every little change you wanted to see you had to commit. Add a debug print? Commit. Improve that print? Commit. Found the bug and fixed it? Commit. Remove that print again? Commit. Fix up the comment? Commit. People had their editors set up to commit and push to SVN on save. You could easily rack up hundreds of commits on a single day.

          I don’t recall exactly how many people worked on this and for how long, I think it was about 3-4 developers over a timespan of 2-3 years before I joined, maybe even shorter. It was a pretty small company. I do distinctly remember reaching the half a million mark.

          It really was a subversion of subversion.

          I worked like this for a few days before I told them to give me access to the server so I could set up SMB because this was just unworkable for me. Aside from mucking up your SVN log, you need to run a command every time which is just annoying (I would stick that in a Vim autocmd now, but I didn’t know about those back then, and since the machine they gave me was Windows I didn’t really know how to do file watching either). The reason it took me that long was because this was my first real programming job, and was a bit too insecure to ask sooner 😅 It also made me doubt myself: “Am I not understanding SVN correct? Is this normal? Do all companies work like this?” Turns out I did understand it correctly, that it’s not, and that no one does.

          I migrated the entire shebang to Vagrant and mercurial about a year later. I didn’t bother retaining the subversion history. rm -rf .svn; hg init; hg ci -m 'import svn code'; hg push and I called it a day.

          1. 2

            Subversion supported webdav. Maybe they were using autosaving editors and saves created commits. So were there any commit messages at all?

          2. 1

            Never cleaning their history and auto-comitting changes every 10m?

          3. 6

            Or we could store and version these things using a structured storage format instead of plain text. Begging the question that it might not be so; Just because we think plain text is the best UI for the physical act of programming, doesn’t make it a good API for tooling, or data format to be managed by smart storage systems.

            1. 5

              I’ve seen the problem of code formatting commits messing up git histories play out across multiple codebases and languages. It makes Unison’s ideas of an append-only repos with separate definition files containing their pure syntax trees really compelling.

              1. 2

                It’s an interesting idea, but won’t really cover stuff like renaming Id to ID, or changing if foo to if !foo “for clarity”, and all sorts of other kind of pointless changes I’ve seen people make.

                1. 5

                  Unison’s definition files contain references to hashes of the syntax tree instead of names. This keeps the names of things separate from what they are. So the former case is covered, but the latter (where two syntax trees are logically identical but structurally different) is not.

                  1. 2

                    changing if foo to if !foo

                    Aren’t you inverting the logic on that one?

                    all sorts of other kind of pointless changes

                    Not sure if this is what you meant, but fwiw I don’t consider making changes to enforce consistent style pointless.

                    1. 3

                      It’s just a little example of reorganizing logic (i.e. inverting the branches or some such). I didn’t feel like typing out an entire code block so I hoped that would be clear, but I guess it wasn’t 😅

                      I don’t consider making changes to enforce consistent style pointless.

                      I don’t really agree; I mean, if it’s one change on some code you’re already touching: sure, why not, if it’s not too invasive. But I’ve seen people go through entire code bases for weeks with changes like this, resulting in hundreds of changes that improve very little, or can even regress things and introduce bugs.

                      I’m probably one of the more consistent programmers, or at least am among the people I’ve worked with over the years. I don’t understand why people can’t just be consistent; it’s not that hard people.

                      But at the same time, it’s really not all that important. There are about 100 things more important.

                      1. 5

                        Yeah, I’ll make the opposing case, but the specifics of how it’s done really matter. The costs of inconsistency became clear to me over the last few years working at my current company, where we have a code base shared among multiple teams.

                        First, re:

                        There are about 100 things more important. In the narrow, immediate sense, you are obviously right. The thing is that there are non-immediate social costs. People will argue about this stuff. There will be comments in PRs. The same discussions will be had over and over in different ways, among different people, and with every new hire.

                        why people can’t just be consistent; it’s not that hard people.

                        Some people just are and care, and some people don’t. Value judgements aside, that’s just how it is, empirically. And even among the consistent people, they have different preferences. This is the genius (and motivation) of gofmt: End all discussion, formatting decisions are automated.

                        It’s really, imo, the only solution for any language: agree on a style guide, and automate enforcement as much as possible in your CI pipeline. And then bring your current codebase up to date, with the kind of commits we’re talking about here. The long term benefits are really big, and they really do matter when social costs are factored in.

                        1. 5

                          gofmt still leaves plenty of room for discussion; it says nothing about naming, line length (mostly), shadowed variables, and a host of other things. It only resolves a few minor issues that no decent programmer really cares about IMHO (it really doesn’t matter where the braces are).

                          That was my point with my first comment: you can automate fairly simple things like brace placement and whitespace, but there are far more things you can’t really automate. All of what I described earlier was actually at a Go shop. There was an extensive discussion about whether there should be a blank line between if err != nil and the statement that assigned the error for example. And then we had the “panicIf()` guy.

                          I do love gofmt btw, but mostly so I can just write if foo==""{bar()}, save, and it just formats correct. It’s a good productivity booster.

                          The long term benefits are really big, and they really do matter when social costs are factored in.

                          I can’t say I really see the benefits, and certainly wouldn’t call them “really big”. Unless the formatting is a complete mess and/or crazy (e.g., indentation that’s just wildly off, 4 statements on a single line, etc.) which is a different matter, I can’t say I think it really matters all that much to be honest. I worked with plenty of inconsistent codebases over the years, and never really struggled with that aspect specifically (even when I was much more junior).

                          It’s mostly just making things look nice, which is nice, but in my experience the benefits are slim to non-existent. Certainly not “really big”.

                          I think there probably is a correlation between “consistent and well formatted code” and “good code in general” (i.e. good architecture, tests, sane logic, documented, etc.) and I think people are confusing these things. You can’t really measure or quantify any of these much more important things, but you can get a 100% score on a linter tool. A good example might be this that I happened to mention on HN yesterday:

                          Once they showed me the “best” code and it was a mess. Okay, that’s not a show-stopper as there could be perfectly valid reasons for that. Then they showed me the “worst” code and it was actually a bit better, but the guy said “yeah, it’s not good; we’re missing public/private on a lot of classes”.

                          1. 3

                            gofmt still leaves plenty of room for discussion; it says nothing about naming, line length (mostly), shadowed variables, and a host of other things.

                            Agreed, it would be nice if it did even more.

                            It’s mostly just making things look nice, which is nice, but in my experience the benefits are slim to non-existent. Certainly not “really big”. Again, the benefits are not that as much in the code itself as in avoiding the social bike shedding and all the associated time wasters. And I’d happily take what imo was sub-optimal choices just to have all such discussions done away with. “We’re doing stuff this way, now let’s move on”

                            I think there probably is a correlation between “consistent and well formatted code” and “good code in general”

                            I’m glad you brought this up. I meant to and then forgot. I think this a key point, because when I hear the argument you’re making (“there’s more important things than this”), it’s often (practically speaking) an excuse for being sloppy. I’m not accusing you of that, and you’ve already said you are yourself consistent/neat, but the argument can be a smokescreen.

                            Also, the argument seems to posit some mythical programmer whose priorities are so honed that they don’t waste time on consistent naming or “pretty” formatting, yet nevertheless spend all their energy on important decisions, like (say) good high-level architecture, excellent performance, and correctness. Their code might be messy, but in every way that matters it’s exemplary. I mean, I just don’t buy it. People don’t work like that. In rare cases, maybe? But I’ve literally never come across it. On the other hand, sloppiness in little things is ime a strong signal that code will have problems that do matter.

                            Ok, you might say, but getting people to care about formatting isn’t going to magically fix those important things. And that’s true. But working in a culture where quality at every level is valued is a signal people will pick up on, and respond to. It sets up incentives for good, careful work, whereas “just letting it go” sets up the reverse incentives.

                            1. 1

                              gofmt still leaves plenty of room for discussion; it says nothing about naming, line length (mostly), shadowed variables, and a host of other things.

                              Agreed, it would be nice if it did even more.

                              I’m not sure what more it could reasonably do without also introducing a lot of constraints that would be quite limiting?

                              Also, the argument seems to posit some mythical programmer whose priorities are so honed that they don’t waste time on consistent naming or “pretty” formatting, yet nevertheless spend all their energy on important decisions

                              Not at all; it’s just that obsessing over these things is a bit silly. And priorities do matter; resources are always finite, and it can make a huge difference how you spend those finite resources.

                              On the other hand, sloppiness in little things is ime a strong signal that code will have problems that do matter.

                              Yeah exactly: these things are correlated but there is no direct causal relationship. And even this correlation is far from absolute as I’ve also seen messy codebases that were actually pretty good overall.

                              But some people act like there is a causal connection, and this is not the case.

                              1. 1

                                It’s unclear to me if we disagree when the rubber meets the road. Practically, my position is:

                                1. You should have a style guide and follow it. Enforce everything you can in your CI. You can’t answer everything, but you document decisions about what people spend time talking or arguing about, so that doesn’t become a perpetual time-waster.
                                2. When new things come up, don’t shrug them off, but come to a consensus and document your decision. Everyone knows “the way we do things here.” You won’t be able to do this perfectly, but you can do it pretty well.

                                If your position is that none of those things are worth the bother, then yeah, we just disagree on that point. Otherwise, what is your alternative recommendation? If it’s “c’mon, enough talk, everybody just be reasonable” – then my experience says that does not work.

                                1. 1

                                  You should have a style guide and follow it. Enforce everything you can in your CI. You can’t answer everything, but you document decisions about what people spend time talking or arguing about, so that doesn’t become a perpetual time-waster.

                                  As long as you just stick to the basic stuff; I guess? If you style guide becomes more than a page: meh. I don’t think it’s super important. There are a few key issues (tabs vs. spaces, camelCase vs. snake_case) that are useful of important, but this is a very short list and beyond that 🤷

                                  And there are serious downsides too; before you know it you end up with silly stuff like:

                                  print("Some sentence the linter thinks is too long by one"
                                        "character")
                                  

                                  And/or # nolint comments all over the place. That’s absolutely not better.

                                  Never mind the hours of lost time just to make some linter in the CI happy. How does that compare to the lost time from slightly inconsistent formatting? Probably not in favour of the CI-lint approach.

                                  Does it improve the code base? I guess, a little. But at what cost? It’s a good example of the politician’s fallacy: “this is horrible, we must do something. This is something. Therefore, we must do this.” Sometimes the solution is worse than the problem.

                                  Certainly in a company setting the entire thing is a bit of a quick fix to reduce the power of a small number toxic assholes who will insistently try to hammer through stupid pointless style changes “because it’s not my favourite way!” This can be dealt with much easier by telling these people to stop being such assholes, or firing them if they don’t (because almost invariably these people will be difficult to work with in general).

                                  When new things come up, don’t shrug them off, but come to a consensus and document your decision. Everyone knows “the way we do things here.” You won’t be able to do this perfectly, but you can do it pretty well.

                                  95 out of 100 times you can just see “how things are done here” by looking at the existing code, certainly when it comes to style issues, and a wee bit of inconsistency really isn’t that big of a deal.

                                  If it’s “c’mon, enough talk, everybody just be reasonable” – then my experience says that does not work.

                                  If someone is trying to force through some minor inconsequential thing then this is very much a valid response IMHO. And if it doesn’t work then it says a lot about those people.

                2. 2

                  We need more qualified advice like this :)