Faithful to the author’s intent: if code has been written such that it spans multiple lines, that decision is preserved.
Yes, yes! Thank you! I really wish that more formatters got this! I often like to break longer expressions over multiple lines in a way that highlights the structure of the parallel clauses. For example, if I wanted a range check to see if a point is inside a 3D box, I might write something like:
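A minimal sketch of that kind of layout, assuming a hypothetical Vec3 type and box corners named lo and hi (none of these names come from the comment itself):

```cpp
// Hypothetical point type, for illustration only.
struct Vec3 {
    float x, y, z;
};

// True when p lies inside (or on the boundary of) the axis-aligned box
// spanned by the corners lo and hi. One axis per line, two comparisons
// each, so the three parallel clauses line up vertically.
bool insideBox(const Vec3& p, const Vec3& lo, const Vec3& hi) {
    return lo.x <= p.x && p.x <= hi.x &&
           lo.y <= p.y && p.y <= hi.y &&
           lo.z <= p.z && p.z <= hi.z;
}
```

Every line here is short enough that a formatter which only cares about line width would be free to join them.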
Often this results in a larger number of shorter lines. So many formatters seem to think that since there’s still plenty of space on each line, they should remove all my line breaks and cram everything onto as few lines as possible. Sure, I can maybe add some extra parentheses, add empty trailing comments, or do other little tricks to try to cajole the formatter into leaving it the way I want. But fighting just to appease the formatter always feels like an aggravating waste of time when it really wants to make the code less readable.
(Note that I have no problems with formatters tidying up indentation, removing trailing spaces, normalizing spaces between tokens, etc. But if the lines aren’t too long, then leave my darn line breaks alone!)
IMO this is the price to pay for consistently formatted code everywhere.
In this specific case you think it would be better to group conditions in groups of 2 (and I tend to agree with you), but it’s very subjective and other devs might prefer it otherwise.
The point of formatters is to remove subjective decisions, and each degree of freedom they allow works against this goal. I’d rather give up my freedom on specific examples such as this one than give everyone the freedom to insert line breaks where I would find them very jarring.
I buy that to an extent. But so much of what we do when we write code involves making subjective decisions. Starting with “What do I name this variable?” and going all the way up to “How should I design and architect this system?” Other devs might disagree with my decisions there. And some of the decisions they make might not be to my taste either, but de gustibus… It just seems a little silly to want to completely remove judgement and taste on this point.
I suppose that maybe it’s a matter of domain. I work in graphics and have always enjoyed being close to the metal and having the control that comes from that. (And these days, I’m now designing pieces of the metal.)
To be honest, if there ever existed a tool forcing good variable names I would 100% use it!
But you make very good points, and I whole-heartedly agree.
I guess personal preferences come from experience. Mine has been to use Black, the Python formatter, which gives very minimal freedom to devs, and even though I strongly disagree with some of the formatting choices it makes (formatting a language where indentation is significant is no easy task), letting go has been liberating!
I tend to feel that when I am “arguing” with a formatter, it’s an indication that the code should be refactored.
In your example, if I was reviewing your code, I would be asking you to refactor it regardless of your formatting or the formatter’s formatting:
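A sketch of what such a refactor might look like, using the withinX, withinY, and withinZ names that come up in the reply; the Vec3 type and the lo/hi corners are assumptions made for illustration:

```cpp
// Hypothetical point type, for illustration only.
struct Vec3 {
    float x, y, z;
};

// Same box test as an inline condition would express, but with each
// per-axis range check pulled out into a named boolean first.
bool insideBoxNamed(const Vec3& p, const Vec3& lo, const Vec3& hi) {
    const bool withinX = lo.x <= p.x && p.x <= hi.x;
    const bool withinY = lo.y <= p.y && p.y <= hi.y;
    const bool withinZ = lo.z <= p.z && p.z <= hi.z;
    return withinX && withinY && withinZ;
}
```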
Which is something a formatter can’t suggest (yet).
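Sure, that was just an example off the top of my head. And I will sometimes do transformations like that to appease the formatter. But the downsides to something like that are:
It now requires me to spend extra time coming up with good names for the new identifiers. I’m pretty picky about names and like to get them just right, so that can be a surprising amount of work sometimes.
Now those identifiers are outside the scope of the if-statement instead of inlined expressions, and on reading that I’d have to wonder if they might be reused later on. If something is inlined, I know right away that it has minimal scope and can’t leak anywhere.
Adding new identifiers like that makes it more verbose and increases the chance of typos. There’s now a chance that I might accidentally typo if (withinX && withinX && withinZ) or similar. (Granted, some compilers are pretty good now about warning on redundant clauses in boolean expressions.)
It now requires me to spend extra time coming up with good names for the new identifiers. I’m pretty picky about names and like to get them just right, so that can be a surprising amount of work sometimes.
I don’t necessarily think that is a bad thing! I am also pedantic about names; after all, they matter so much to readability. But the alternative (inlined conditions) means no name at all, and that seems worse to me.
Now those identifiers are outside the scope of the if-statement instead of inlined expressions, and on reading that I’d have to wonder if they might be reused later on. If something is inlined, I know right away that it has minimal scope and can’t leak anywhere.
True enough! Hitting “find references” or the equivalent is pretty easy when I need to know though.
Adding new identifiers like that makes it more verbose and increases the chance of typos. There’s now a chance that I might accidentally typo if (withinX && withinX && withinZ) or similar. (Granted, some compilers are pretty good now about warning on redundant clauses in boolean expressions.)
Agreed, that is a risk. I think in this example, it is more likely to happen due to only a single letter changing (and it took me a couple of reads of your sentence to see the mistake!). In some cases, the editor (LSP, compiler, etc.) will tell you that a variable is unused (in Go’s case, the code will fail to compile), but that doesn’t happen if it’s also used later in the function. Perhaps automated tests would cover this?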
Naming is hard (it’s one of the two hardest things in computer science, along with cache invalidation and off-by-one errors), so why introduce a name when it’s not needed? The first example reads fine for me.
I wonder if a compiler will convert that logic to the initial version internally, to take advantage of short-circuiting to avoid executing the later booleans. Probably not a big difference in this specific case, but I often use similar code to avoid expensive calls.
It seems like it would be something trivial for a compiler to inline, but as to whether they do or not (or rather, which compilers do, and which don’t)…no idea
Good question! Let’s Godbolt it!
Here’s x86-64 Clang 15.0 with -Ofast
Here’s x86-64 GCC 12.2 with -Ofast
In this particular case, it looks like Clang generates exactly the same thing both ways, but GCC generates slightly more verbose code in the version with the extra variables.
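For plain comparisons like these the question is only about code generation, but the two forms stop being interchangeable once the operands are calls with side effects or real cost, since a compiler may only skip a call when nobody could observe the difference. A small sketch of that, where expensiveCheck and the call counter are made-up stand-ins:

```cpp
// Counts how often the pretend-expensive check runs.
static int g_calls = 0;

// Stand-in for an expensive predicate; hypothetical, for illustration only.
bool expensiveCheck(bool result) {
    ++g_calls;
    return result;
}

// Inline form: && stops evaluating at the first false operand,
// so only the first call runs here.
bool inlineForm() {
    return expensiveCheck(false) &&
           expensiveCheck(true)  &&
           expensiveCheck(true);
}

// Named form: all three initializers run before the && chain is
// evaluated, so all three calls happen even though the first is false.
bool namedForm() {
    const bool first  = expensiveCheck(false);
    const bool second = expensiveCheck(true);
    const bool third  = expensiveCheck(true);
    return first && second && third;
}
```

Both functions return false, but the inline form makes one call and the named form makes three.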
That’s a challenging constraint!
On the one hand: for tabular code like you wrote, Boojum, wrapping the lines is about the worst thing a formatter can do. Formatters must leave such code alone.
On the other hand: for non-tabular long expressions, wondering how to improve readability by adding enough-but-not-too-many linebreaks … has been a bit of a productivity trap for me. All the more so whenever I’ve had colleagues whose philosophy was “place linebreaks wherever, I know what I mean when I write it”, thus providing a steady supply of productivity traps. Here, having an autoformatter is a godsend, because it provides an obvious choice even when there is no obviously optimal choice.
On the gripping hand, distinguishing intended linebreaks from thoughtless linebreaks doesn’t have to be the formatter’s job. I currently mark tabular code as ‘don’t touch this’ like below, and I’m perfectly happy with that.
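One way such marking can look, assuming clang-format (the comment does not say which formatter is in use): clang-format leaves everything between its off and on comments untouched. The table itself is a made-up example.

```cpp
// Everything between the two marker comments is skipped by clang-format,
// so the hand-aligned columns survive a formatting pass.
// clang-format off
constexpr float kIdentity[4][4] = {
    { 1.0f, 0.0f, 0.0f, 0.0f },
    { 0.0f, 1.0f, 0.0f, 0.0f },
    { 0.0f, 0.0f, 1.0f, 0.0f },
    { 0.0f, 0.0f, 0.0f, 1.0f },
};
// clang-format on
```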
I’m a big fan of code formatters, but I haven’t been able to get them to pull off custom alignment for things like groups of vertices, e.g.
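A sketch of the kind of hand-aligned vertex data this refers to; the array name and the values are made up for illustration:

```cpp
#include <cstddef>

// Two triangles covering one face of a unit cube (hypothetical data).
// Positive values get an extra leading space so that every column lines
// up under the negative values, one vertex (x, y, z) per row.
constexpr float kFrontFace[] = {
    -0.5f, -0.5f,  0.5f,
     0.5f,  0.5f,  0.5f,
    -0.5f,  0.5f,  0.5f,

    -0.5f, -0.5f,  0.5f,
     0.5f, -0.5f,  0.5f,
     0.5f,  0.5f,  0.5f,
};

// Number of floats in the table (6 vertices times 3 components).
constexpr std::size_t kFrontFaceFloats =
    sizeof(kFrontFace) / sizeof(kFrontFace[0]);
```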
This is something I’d love to be able to teach a code formatter to recognize and do automatically. Right now I format them manually, and either mark the regions with code annotations that tell the formatter to ignore them (which feels a little gross) or just avoid using a formatter altogether.
If this is something Topiary could handle easily I would be very interested!
P.S. If anyone has an elisp snippet that can achieve that formatting, I’d love to see it :)
My reaction against formatters comes from exactly this sort of code. Maybe graphics code is just more antithetical to sledgehammer formatting?
The idea that I’ve long had for formatters to detect this kind of thing is to compare character-by-character on each pair of lines and see if there’s an indentation for the second line of the pair that makes more than a certain percentage of the characters on the lines match exactly. If so, just set the indentation of the second line to that and do nothing else. And if that requires less indentation than the lines above it in the block, go back and add that extra bit of indentation so that the left-most statement has the correct indentation.
So in your example, between these lines:
-0.5f, -0.5f, 0.5f,
0.5f, 0.5f, 0.5f,
it would see that by indenting the second line by one space more, it could match each 0 for the 0 above it, each . for the . above it, etc. In this case 22 out of 24 characters (~92%), including whitespace, would match the character above them with that offset.
Sadly, I don’t have any elisp for this.
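The pairwise comparison described above can be sketched roughly as follows. This is only an illustration of the idea, not an existing formatter feature: the names bestAlignment and Alignment, the 0.85 threshold, and the maximum indentation of 16 are all assumptions, and the follow-up step of re-indenting earlier lines in the block is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Outcome of trying to line up a line under the one above it.
struct Alignment {
    bool aligned;         // true if the best candidate met the threshold
    std::size_t indent;   // spaces to put in front of the second line
    std::size_t matches;  // characters equal to the one directly above
    std::size_t total;    // characters compared (length of the longer line)
};

// Drop leading spaces so every candidate indentation starts from the same text.
std::string stripLeadingSpaces(const std::string& s) {
    const std::size_t first = s.find_first_not_of(' ');
    return first == std::string::npos ? std::string() : s.substr(first);
}

// Try every indentation from 0 to maxIndent for `next` and keep the one
// where the largest share of characters equals the character directly
// above it in `prev`. Positions where only one line has a character
// count as mismatches.
Alignment bestAlignment(const std::string& prev, const std::string& next,
                        std::size_t maxIndent = 16, double threshold = 0.85) {
    const std::string body = stripLeadingSpaces(next);
    Alignment best{false, 0, 0, 1};
    if (prev.empty() || body.empty()) return best;

    for (std::size_t indent = 0; indent <= maxIndent; ++indent) {
        const std::string candidate = std::string(indent, ' ') + body;
        const std::size_t total = std::max(prev.size(), candidate.size());
        const std::size_t overlap = std::min(prev.size(), candidate.size());
        std::size_t matches = 0;
        for (std::size_t i = 0; i < overlap; ++i) {
            if (prev[i] == candidate[i]) ++matches;
        }
        // Compare match ratios by cross-multiplying to stay in integers.
        if (matches * best.total > best.matches * total) {
            best.indent = indent;
            best.matches = matches;
            best.total = total;
        }
    }
    const double ratio =
        static_cast<double>(best.matches) / static_cast<double>(best.total);
    best.aligned = ratio >= threshold;
    return best;
}
```

Called with "    -0.5f, -0.5f,  0.5f," as the first line and "    0.5f,  0.5f,  0.5f," as the second, this picks an indentation of 5, one more than the original 4, with 22 of 24 characters matching. Those sample strings assume four spaces of block indentation and two-space gaps before the positive values.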