I don’t know of a good C++ code formatter, but clang-format is not one. It is a thing I don’t like about C++.
1. Myths
First, I must dispel 4 oft-cited myths used to argue for uncritically slapping on a random coding standard.
Myth 1: Any coding standard is as good as another.
There is such a thing as practicality in coding standards. Before even contemplating controversial topics such as aesthetics, it’s not hard to think of aspects of code formatting that contribute to write amplification (how big a change becomes in the resulting diff) and that should be uncontroversial. Let’s get the basics right:
Ease of adding or removing an element at the end. This thing:
XXX XXX = XXX::XXXXXXXXXXXXXX { X,
                                Y }
Reflow ripple effects: This thing, after inserting “something”:
XXX XXX = XXX::XXXXXXXXXXXXXX { "something",
                                X, Y }
Dependent indentation: This thing, after you renamed it (and forgot to reformat):
XXX XXX = XXX::XXXX { X,
                                Y }
Indentation dependence: When changing the scope of a section of code causes the formatter to change strategy:
{
    XXX
        XXX =
            XXX::
                XXXXX
                { X, Y }
}
Human predictability: Does the formatter follow rules that a human with infinite experience can predict, or does it play chess with sums of weighted costs? In practical terms, must you run a formatter locally in order to write anything that CI will accept?
There exists a formatting that has none of these problems. I’m of course talking about the self-evident pythonic/rustic formatting (which probably has many more names):
XXX XXX = XXX::XXXXXXXXXXXXXX {
    X,
    Y,
}
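As a concrete illustration, here is a minimal C++ sketch of that layout. The function name and the strings are made up for this example; the point is only that appending an element adds exactly one line to the diff, and renaming the function touches exactly one line.

```cpp
#include <string>
#include <vector>

// Hypothetical example data; only the layout matters here.
// Each element sits on its own line and ends with a comma, so adding,
// removing, or reordering elements never touches a neighbouring line.
std::vector<std::string> make_targets() {
    return {
        "linux-x86_64",
        "linux-aarch64",
    };
}
```

Because the element indentation depends only on the nesting depth, not on the length of the name before the brace, none of the five problems listed above can occur.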
Myth 2: The most important thing is to have a coding standard and enforce it.
I remember a time before clang-format: I would say that professional developers did at least as good a job as clang-format to begin with. In fact, in some ways they did better than any autoformatter ever could, because the human knows best, for example which arguments belong together. Freedom of expression! This openness to creativity kept the conventions fluid, so that better ideas could gain a foothold.
In contrast, with clang-format, I see good developers being passive and indifferent to details like the trailing comma, which are not at all insignificant to what clang-format will do.
If the purpose of automatic formatting is to avoid style disputes in code review, it doesn’t work, because too few people know the importance clang-format gives to the trailing comma, so I have to nag people about it.
Myth 3: It is possible to configure clang-format to a pythonic/rustic style.
I have tried every config option. There is AlignAfterOpenBracket, but you have to do the rest yourself in terms of remembering trailing comma (which is only applicable in curly braces and not enforceable), forcing line breaks with line comments and liberal use of // clang-format off.
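For illustration, the manual workarounds listed above look roughly like this in practice. This is a sketch with made-up names, not a recommended setup: the trailing-comma behaviour is as the post describes it, and the off/on comments are the markers clang-format documents for disabling formatting over a region.

```cpp
#include <string>
#include <vector>

// Trailing comma after the last element: per the post, this is the hint that
// makes clang-format keep one element per line inside curly braces.
std::vector<std::string> make_flags() {
    return {
        "-Wall",
        "-Wextra",
    };
}

// Empty line comments force the line breaks: a formatter cannot join these
// lines without swallowing code into the comment.
int sum_of_three(int first,   //
                 int second,  //
                 int third) {
    return first + second + third;
}

// clang-format off
// Hand-formatted region; clang-format leaves it untouched until "on".
const int kIdentity[3][3] = {
    {1, 0, 0},
    {0, 1, 0},
    {0, 0, 1},
};
// clang-format on
```

All three tricks rely on every contributor remembering them, which is the complaint: nothing in the configuration enforces them.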
Myth 4: It is always convenient for everyone to run the formatter.
If you haven’t noticed the trend, everything is wrapped in impenetrable all-encompassing dockerized CI-scripts that can’t just check a small change quickly.</sarcasm>
It doesn’t actually matter how convenient it is, because I don’t necessarily approve of what it does to my code; I can’t run the formatter before I have committed my changes anyway. Then, I rewrite my code to comply if need be. If revising one’s commit stack isn’t hard enough as it is, doing it with style changes in the mix is the worst.
2. Properties of a good formatter
3. So what’s wrong with clang-format in particular?
All the above. If clang-format behaved like a python formatter or like rustfmt, you wouldn’t be reading this. Though I’m no fan of automatic formatting in general, other languages have it better.
Yes, and that style usually contributes to files being overly long and sparse. It is not the panacea and many developers (including myself) don’t prefer that style when having more things on the screen is more valuable.
Hahahahaha. No. I also remember the time when other developers checked things in with their customized tab-width and used spaces for alignment on top of tabs for indentation. I do not wish to go back to that time.
I’m happy that you haven’t met some people I’ve worked with.
What.
Instead of being glad that it’s the only thing you have to fight over, you choose to complain because clang-format doesn’t solve everything and anything.
I’ll give you point 3 in that it would be nice to give the option to have non-dependent syntax in clang-format for those who want it.
Sounds like a you issue. Our pre-commit linters at work finish in a couple seconds.
I find your rant to be opinionated and not considerate of the other side of the aisle. By all means, feel free to fork clang-format to adjust it to your needs, but to complain without considering the environment which spawned clang-format (many different C++ projects with wildly different standards had to be appeased for adoption) is ignorant.
I absolutely agree with the OP about clang-format being less than ideal; that being said their take is insanely self-centered, and practically nothing in their rant is actually about clang-format.
Some real issues I have with clang-format:
That being said, we’re on a brand new project at work now, and I’m very excited to have time to get clang-format working again. Our code reviews have gone to shit without it. Everyone wants to feel like they are contributing to reviews by pointing out /something/ if they can, but we’re missing real issues because the [legitimate] style issues are easy to spot, then those people feel accomplished, and move on. Not only are we frequently missing the forest for the trees, we’re also wasting time iterating reviews on style, not to mention the time put into styling before the reviews even go up.
I agree on point 1. Point 2 can largely be solved with CI tooling. With GitHub Actions, you can allow actions to push to the PR branch and so it’s fairly easy to set up a CI job that just applies clang-format and adds an extra commit if it generates any changes.
Point 3 is surprising though. I mostly use clang-format via ALE in Vim and it’s more or less instant, even on some of the 10 KLoC files in the LLVM repo on a decent machine. On a slow machine (low-power Pentium Silver core) it takes a couple of seconds on a small project, but that machine is a lot slower than my 10-year-old laptop.
Point 3 could be specific to Windows or VS2019 which is why I called them out so much :) We’re all running brand new Threadrippers, and it’s just weird that it’s often instant, and other times it takes 3-5+ seconds (for one file).
Any idea how you would solve Point 2 in conjunction with ALE in Vim? We’re primarily using Perforce, so we don’t quite have the modern luxuries of Git hooks and GitHub Actions. So what’s always been most important to me is that everyone is always set up to format-on-save (with the same version of clang-format!!), à la what you have with ALE in Vim; then we hardly even need validation on the PR side. But… even just including a .clang-format without dealing with Point 2 could cause people with different versions of clang-format to start tripping over Point 1 (it’s important to note that I really only care about Point 2 because of Point 1). It wouldn’t be fun to say “Can you install this older version of clang-format that we use on this project to use with ALE so you just have the formatting right in the first place?” It’s nice in other languages/ecosystems where you can clone, and just get to writing code with the same tooling as everyone else on the project is using.
I think my two issues with solving Point 2 as you noted are: A: Unnecessary formatting commits (unless you squash the PR on merge). B: Could be confusing to people if they have to fetch/pull after pushing up a PR (especially if they started committing more code locally and end up with a merge conflict). These are just my opinion though, and it’s definitely better than nothing!
I suspect this is the process start time on Windows. I wonder if clangd can be used instead there?
The only way of solving this is to move responsibility of solving this away from individual developers and to a central location. Locally, you can format the code however you want, but on push it gets formatted correctly.
In my ideal world, the solution to this would be to store an AST in your revision control system and have formatting the code be something that happened only locally (and in things like GitHub’s web UI). That way, anyone could have their own favourite presentation style for the code and not worry about how it looks on anyone else’s screen. Things like the colours for syntax highlighting are not stored in the repo, why is the number of non-significant whitespace characters?
Hah, very fair. I’m completely on board with that, and I too will slowly work towards that ideal world.
Thanks for sharing!
This seems like a misplaced rant against coworkers who don’t format things with trailing commas like the author would like, and weird tooling issues (I certainly don’t run clang-format via Docker and can’t imagine why I would), rather than much of a criticism of clang-format itself. Even the “self-evident” example is a format that clang-format can and does use! Eliminating the comma “trick” would just mean you have even less control over formatting, which contradicts the point.
The main reason people (myself included) like machine-dictated formatting is to reduce/eliminate time wasted on bikeshedding. If the main issue is that developers won’t place a comma to your liking, not using an automatic formatter certainly isn’t going to make anything better: that just means there are many, many more things that other developers can do that aren’t to your liking.
This sort of thing easily gets toxic even in corporate environments where developers are expected to deal with their coworker’s nitpicks (because they get paid at least in part to do so), and is practically a non-starter in an open source context where you simply are not going to get submissions in exactly the style you want. You can then either manually reformat it all yourself, or berate people for it, certainly just driving potential contributors away. That is the main point of automated formatting, not “it always makes the code as pretty as possible”. clang-format is popular because it’s more or less Good Enough for everyone to tolerate, not because it’s anyone’s ideal, and having a tool to do that job eliminates a ton of wasted time which can be spent on something actually important.
I think you’re trying to misunderstand. I’m trying to address an industry-wide problem, and I don’t like bikeshedding.
The trailing comma is just a detail that happens to work in curly braces (which I’m glad for), but doesn’t apply anywhere else, like function arguments, so it’s not like the “self-evident” format is supported everywhere.
To me, it sounds like you are saying: “I have strong opinions on the right way C++ code should be formatted, and I am unable to force clang-format to use it, so therefore, don’t use it.”
I have the same issue with rustfmt. It does what people imagine gofmt doing, but not what it actually does.
gofmt is an error-fixing formatter. It fixes things about formatting it knows are wrong (like braces, indentation), but leaves everything else more or less unchanged (like 1-line vs multi-line choice). There are many ways to format the same construct, and gofmt respects humans’ high-level choice about the “layout” of the code.
Most other formatters are destructive canonicalizers, which completely replace formatting of the input with their own heuristics. The difference between these approaches is very significant, because reformatting based on heuristics doesn’t leave room for common sense, and bulldozes over formatting exceptions.
Unfortunately, these unforgiving blunt tools are still used due to the fallacy of “we must do something; this is something; therefore we must do this”.
I thought gofmt was a destructive canonicalizer. So you are telling me that gofmt will not change:
into
The former isn’t an error, just a difference in opinion.
gofmt has an opinion on braces when they’re on multiple lines, but it will preserve the 1-liner foo {bar, baz} version as-is. This is unlike rustfmt, which will splat 1-liners as high as it wants, or re-wrap multi-line constructs into long spaghetti, depending on which heuristic they hit.
So unless spacing causes a compilation error, then I wouldn’t call gofmt an error-fixing formatter.
I mean an error in a general sense from perspective of formatting, like using 3 spaces to indent is an erroneous formatting.
No. If 3 spaces for indenting is an “error” then it should be enforced at the language level. Rob Pike wimped out in this regard.
I think you’re just arguing about the meaning of the word “error”, rather than what I’ve said about formatters?
I’ve used the word “error” in its non-technical English meaning of “deviation from what is correct”, and not the other meaning of “what Rob believes must stop compilation”. If that ambiguity bothers you, please read my original post with “error” replaced with “imprudent deviation”:
If the language allows white space between tokens, why is X spaces correct, while X-n or X+n spaces incorrect? X is an arbitrary value; it’s just an opinion being enforced. If you don’t want people to have opinions on the “correct formatting” for a language, make it impossible to have said opinions at the compiler level, not with some external tool.
It will change the former to the latter. And you’re right, that doesn’t represent a fixing of an error. But the whole point of the tool is to remove style choices like this from the set of things that programmers can have different opinions on! :) As the proverb goes, gofmt’s style is nobody’s favorite, but gofmt is everybody’s favorite.
Amen.
I mean that rustfmt has sane heuristics. But yes, it is unfortunately a destructive bulldozer too. I was unaware that gofmt was not. Good tip!
I feel the same way. I want to like formatters but at the end of the day I find that human logic is really the best. Formatters tend to focus on unimportant things like line-length. What I care about is line complexity, I want to keep each line having one, maybe two ideas. No formatter can understand that.
I don’t mind some basic formatting such as brace style and fixing indentation but after much more than that I find it is putting things onto one line when it shouldn’t or vice-versa. I think things like clang-format have tried to take on too much and really make all code look the same. But I’ve never been confused by code because it used a+b rather than a + b.
I used to have a lot of opinions about code formatting, but over the years they’ve been hollowed out to just one: All of the code in a particular language within the scope of a project or organization should be formatted consistently and automatically with the same tool, preferably at the editor level but definitely enforced at check-in. I think Black and gofmt beat any other opinions I might have out of me.
I do think that most of clang-format’s built-in styles are ugly in various ways, and once wasted a bunch of time trying to make a .clang-format that made it look more like Black. In retrospect, I wish I hadn’t, because the output still isn’t 100% to my particular taste. I have also possibly raised the barrier to entry to contributing to the project I did that on because now the code doesn’t look like any other extant project.
Formatters like Black and gofmt offer me zero decisions to make, so I just get down to coding and learn to love the formatter. clang-format offers me several choices, so I spend a bunch of time making (possibly incorrect) decisions about things that are ultimately inconsequential - “Let’s compare all of the built-in styles to see which one I like best! Oh I kind of like C, but I really like the way A formats foo better - let’s see if we can tweak it!” – 4 hours later, much work has been done, but nothing of value has been accomplished.
Side thought: C is probably a tougher language to come up with a “one true format” for because of its age and diversity of implementations - there are standards bodies, but there really isn’t a single cohesive “C community.” I think every book I read and professor I encountered in university formatted things differently. clang-format can never be authoritative in the way that gofmt is.
Mainframe languages might hold the high card here, with their requirements for various types of statements to start exactly in a particular column. (The high card in this case is an 80x12 punchcard)
I love code formatters, but this problem is my pet peeve. When I choose a coding standard for a new project, my top priority is choosing a flavor that avoids write amplification. A standardized format is great, but not if it interferes with code review. I’ve encountered too many bugs hiding inside a diff hunk that was 90% auto-formatting noise. (I’ve had no luck with diffs that hide white space changes.)
I agree that a conscientious developer can format code better than a machine, at the margin. However, I’ve not found the marginal improvement to be worth the marginal cost in developer time/attention. A machine can format code 90% as good as a human in 1% of the time. Something like reformatter for emacs automatically reformats the buffer when I save it, so I spend almost no time on layout at all. I enjoy the way it lets me stay in flow.
It’s also the case that not all programmers are human. Maybe it’s a fancy refactoring tool or maybe it’s a lazy Perl script to munge away a recently discovered anti-pattern. Either way, they’re terrible at formatting. Having a human reformat this code manually gets expensive. Of course, non-conscientious developers who format at random also border on non-human :-)
Have you tried difftastic? Seems like it’s designed to address exactly this problem.
I guess the argument is that “the coding standard” in the code formatting sense shouldn’t be something that’s different from project to project, it should be (more or less) a universal property of the language.
Then the “formatting” should be enforced at the compiler level. Why leave it up to another tool?
I agree!
I think the real mistake here is that it sounds like you’re reviewing diffs? That’s always the wrong move – you want to review the resulting file, not the minimal diff going into it.
All a diff can tell you is that someone inserted a new method into a file. If you just review the diff you might walk away thinking it’s a well-written change. If you review the resulting file after applying the diff, you might catch the fact that the added method is duplicative of other methods, that this has become a real issue in the file, and recommend the submitter abstract out the common logic and leave things better than they found them rather than trying to do the bare minimum and making the underlying mess worse.
Etc. The diff just does not contain enough information to do a proper code review. Worrying about whitespace in the diff is getting hung up on the wrong details.
A typical LLVM PR changes a hundred lines of code in 3-4 files, each of which is thousands of lines long. Telling people that they should review the entire file when reviewing a PR rather than the changes is the same as telling them that they should not bother doing code review: you’re advocating for something that is completely infeasible.
a) Most things aren’t LLVM (indeed, nothing in the person I was replying to’s bio or GitHub seems to indicate that they’re an LLVM dev – and believe it or not, I do not frame everything I write always in terms of you personally and the unique challenges you face, stranger). “This doesn’t work in the most extreme case, so it’s not a good idea in any case” is just letting the perfect be the enemy of the good.
b) In most projects, the solution to “I can’t review 4 ten thousand line-long files” is “don’t let your files become 10s of thousands of lines long in the first place”. That’s really a gigantic, phenomenally industrial strength C++ codebase problem, which isn’t most codebases. It shouldn’t be surprising that LLVM is a pretty extreme outlier with fairly atypical challenges! We’ve got linters that scream bloody murder when files hit 500 lines, and something in the 200-1000 line range is typical. “That this doesn’t work in the extremes means this is always a bad idea, to me” is, again, just letting the perfect be the enemy of the good.
c) There is such a thing as using your head. Don’t review just the diff, because it does not have enough context; also don’t exhaustively review 14,000 lines that aren’t changing in the 30 line diff. Skim it. Get the gist of the file or at least the things that are near it and which it touches. Try your best to find a reasonable balance. Code reviews are only as valuable as the effort you put into them.
Semantic diffing would obviate a lot of those objections. If we have fancy tools that understand language syntax to remove whitespace around, why shouldn’t the tools that show changes also understand syntax and weed out whitespace noise? I recall seeing links here to at least two such tools lately.
That said, minimizing diffs is not one of my top criteria for a coding style. Ease of reading is probably the first, primarily alignment. Then there’s not wasting too much vertical space, and a sensible line length (100 IMHO) because when an editor soft-wraps lines it really messes up readability.
True. Except git blame would still tell the same old story, and git rebase would still give you the same write amplification induced conflicts. I don’t know if that’s fixable.
Yeah, it would be nice to have smarter diffs integrated into VCS too.
It sounds like Git has really advanced the state of the art in file-level delta and merge algorithms (to say nothing of other systems like Darcs), but it’s still based on treating lines as opaque atomic units … just like the original Unix diff tool from, what, 1969. At the other extreme we have binary delta tools like xdelta that view files as bags of bytes. There’s a lot of room for improvement in between…
git blame already supports -w to ignore whitespace changes. I think technically nothing stops it from using semantic diff algorithm or formatting on the fly as a preprocessing step when computing diffs and blame.
If I understand, this is about the formatting style choice of having indentation on continuation lines align with similar elements on the preceding line. This is a common style and something I always avoid. A small change will often result in an entire block needing a change in indentation because the length of whatever was on the first line has changed. It also feels inconsistent because you get variable amounts of indentation (why indent a wrapped while condition more than a wrapped if). And often it pushes things far over to the right. And column alignment is only sometimes appropriate.
Where it is useful to be able to discern continuation lines from normal indentation, just use double the normal indentation amount. My vim setup applies this and I’m not especially aware of having ever configured it that way.
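To make the two styles concrete, here is a small C++ sketch. The function names are invented for this example, and the 8-space continuation assumes a normal indentation of 4.

```cpp
#include <string>

// Aligned continuation: the parameter column depends on the length of
// "std::string aligned_example(", so renaming the function or changing the
// return type re-indents every continuation line.
std::string aligned_example(const std::string& root,
                            const std::string& dir,
                            const std::string& file) {
    return root + "/" + dir + "/" + file;
}

// Fixed double indentation: continuation lines always sit at twice the normal
// indent, so they stay distinct from the body and a rename touches one line.
std::string fixed_example(
        const std::string& root,
        const std::string& dir,
        const std::string& file) {
    return root + "/" + dir + "/" + file;
}
```

Both functions behave identically; only the diff produced by a later rename differs.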
Another problem no one’s mentioned yet is performance. At work we use formatters on large generated files, and we’re having a lot of trouble with clang-format (https://fxbug.dev/78303). On my MacBook Pro, it can format 1 MB/s but the max RSS also scales linearly at 150x file size. So formatting a 1 MB file takes 150 MB of memory. The clang++ parser deals with these files without that memory blowup. My teammate @ianloic is trying to optimize some of clang-format’s data structures.
(Why format generated code? Because people can jump-to-definition and read it. Why generate huge files? We’re also working on splitting or shrinking, but these sizes aren’t unusual compared to e.g. protobuf, thrift.)
Rustfmt isn’t good on generated code either. The performance is in the same ballpark as what you quoted for clang-format: 2.8 MB/s. For formatting generated code I made https://github.com/dtolnay/prettyplease based on a simpler algorithm, which does 60 MB/s and fixes other shortcomings of rustfmt that tend to occur in generated code.
The same approach may be adaptable to C++, but I admit I’m not sure how it would accommodate preprocessor macros. Rust’s macros are much easier to format in comparison because the syntactic positions that they can be invoked in are strictly limited.
I hope to have some simple patches that can land easily and some others that might take a little more convincing. The peak memory reduction is only about 20% though, IIRC.
Copied from r/programming:
A few months ago I identified two categories of code format tools: at one end of the spectrum we have rule enforcers, and at the other end we have canon enforcers.
With rule enforcers, we have a set of rules the code must adhere to, and anything that breaks those rules is incorrect, but within the confines of those rules, you can do whatever you please. For instance, assuming lines are limited to 25 characters, the following would be incorrect:
On the other hand, there may be several correct ways to fix it:
With canon enforcers, there’s one way to format code. Anything different is basically incorrect, and ends up being formatted back to the One True Style.
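As a rough sketch of what the rule-enforcer end of that spectrum means in code: the checker below only reports violations of a single line-length rule and never rewrites anything, so every layout that satisfies the rule is accepted. The function name and the sample input are hypothetical and not taken from any real tool.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical rule enforcer: returns the 1-based numbers of all lines that
// are longer than `limit` characters. It reports; it does not reformat.
std::vector<std::size_t> lines_over_limit(const std::string& source,
                                          std::size_t limit) {
    std::vector<std::size_t> offenders;
    std::istringstream in(source);
    std::string line;
    std::size_t number = 0;
    while (std::getline(in, line)) {
        ++number;
        if (line.size() > limit) {
            offenders.push_back(number);
        }
    }
    return offenders;
}
```

A canon enforcer, by contrast, would take the same input and return rewritten text, discarding whatever line breaks the author chose.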
Now some of you may say that limiting lines to 25 columns is a tad restrictive. How about raising that limit to 80? With that, the first version I showed above becomes correct. It’s a win!
Well, it depends. I can feel like 80 columns is a bit large, and really, most of the time a limit of 25 is okay. We have to have a hard limit, but it’s nice to have a soft (unenforced) lower limit as well. Besides, what if variables are related in some logical way? In such a case, the most readable code might look like this:
Enter the canon enforcer. With those, if I choose to set the limit at 80 columns, they will prevent unneeded line breaks. But this completely breaks the spirit of a soft lower limit!! And I can kiss semantic groupings goodbye.
In a real project I’m working on, the architects wrote in the code style that we ought to observe an 80 column soft limit. But the formatting tool they gave us is configured company wide to 120 columns. And because clang-format is a canon enforcer, it means I cannot break lines that would take between 80 and 120 characters.
A similar problem occurs for function calls. Either the whole call fits in less than 120 columns, and it has to be a single line, or it does not, and the only accepted style is one argument per line. So instead of:
I was forced into this less readable:
Pretty infuriating.
Now if you don’t care about style, canon enforcers are great: they prevent your carelessness from polluting the code base too much. I, however, do care, and when the tool forces me into something that is clearly less readable than what I’m trying to achieve, within the confines of the official code guidelines, I die a little inside.
Having some rules is good. But don’t overdo it.
Good for you, clearly other people disagree with you, so it may be worth asking whether your practical experience is different from theirs. One of the most frustrating parts of many projects I’ve been involved with over the years was ensuring consistent formatting. clang-format was the first, and in my experience only, effective tool for all C++. The best anyone was otherwise able to do was style error check scripts, which were both necessary and also objectively annoying.
Most of what you list here I would consider a bad style, because style has significant subjective elements. Other aspects of the coding style are significantly impacted by the nature of the apis in the project.
Furthermore, clang-format supports a number of standard style guidelines (from the largest clang-using projects), but also lets you come up with your own arbitrary clunky rules.
Regardless, it does not appear that people say any standard is equally good, but that people have different subjective opinions as to what is “good”. You may disagree with me, but the simple fact that I and others in the replies disagree with you should hammer home that you are incorrect in your belief that there is a single “objectively” superior format.
Hahahhaha, no, no large scale project operated like that. They had manual, and eventually heavily scripted, style checks on all commits.
For large projects with significant numbers of contributors, having a consistent style is incredibly useful, as in the real world a large project will have different people working in different areas, and moving around between different areas. Having wildly diverging style from one area to the next makes it much more challenging than it needs to be for a person to hop from one area to another.
It is possible that you have simply not worked in large scale software development and engineering projects and so have not experienced the value.
This is nonsense. The freedom of expression comes from the code you write, not your formatting. Using random formatting rules serves only to make it difficult for other developers to read and maintain your code.
No the purpose of automatic formatting is so that you don’t have to go through manually correcting all the formatting errors yourself.
wtf are you talking about?
No comment as I have no idea what you’re talking about
It is a hell of a lot more convenient than multiple manual passes to find and correct style issues. For projects I work with where clang-format isn’t an option, the patch review system automatically runs style checkers to tell you where your errors are, and you have to fix them, or you manually run the script before posting to correct it then.
Not sure what you think you’re saving by doing it manually; I remember the annoyance of patch cycles when there were format errors.
clang-format has a very sensible set of default configurations: the formatting rules of a collection of the largest C++ projects in development: LLVM, GNU, Google, Chromium, Microsoft, Mozilla, WebKit
Those are reasonable defaults. If you want something different clang-format allows you to control near as I can tell every aspect, you can even derive from one of the above.
I do not understand wtf you are talking about with this “freedom of expression” bullshit. The reason projects have style rules is specifically to halt the problems caused by randomized code styling. clang-format does a far better job than any of the preceding tools, most of which were at best scripts that used regexes to find and report errors.
It sounds like you simply have not worked in actual large scale software development, based on how much value you think you are adding by using your own randomized coding style, where every experienced developer here will almost certainly tell you that pretty much any consistent style is superior to inconsistency. Even LLVM’s wretched style guidelines are better than a variable style.
Believe me, I have tried. If someone could point me at the equivalent setting for what rustfmt calls indent_style, they would be my hero.
This is an odd strawman to start with. I’ve very rarely heard people argue this. More generally, the ordering from best to worst is:
Without clang-format, you generally end up in category 5. Some projects have style rules that they manually enforce, but they’re generally ambiguous (the amount of my life I’ve wasted with people bikeshedding about different interpretations of FreeBSD’s style(9) is huge). LLVM has an objectively bad style (there are a number of rules in it that make it easy to introduce specific bug categories) but it’s now consistently applied by clang-format and, now that I’m familiar with it, I know the things I need to pay more attention to in code reviews. This is better than no style and is better than a good style that I’m not familiar with and need to train my eyes to follow.