Threads for anordal

  1. 23

    Tabs for indentation, spaces for alignment.

    1. 7

      Exactly. Variable-width characters at the start of a line are great. Variable-width characters in the middle of a line are annoying because they won’t line up with other fixed-width things. Recent versions of clang-format now support this style, so there’s no reason to use spaces anymore.

      1. 4

        I have to suffer through clang-format at work, I can tell you it’s pretty bad. The worst aspect so far is that it does not let me choose where to put line breaks. It’s not enough to stay below the limit, we have to avoid unnecessary line breaks (where “unnecessary” is defined by clang-format).

        Now to enforce line lengths, clang-format has to assume a tab width. At my workplace it assumes 4 columns.

        Our coding style (Linux) explicitly assumes 8.

        1. 2

          You can tell clang-format how wide it should assume for tabs. If people choose to use a wider tabstop value, then it’s up to them to ensure that their editor window is wider. That remains their personal choice.
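As a rough illustration of the settings being discussed, a `.clang-format` fragment might look like the sketch below. The values (8 columns, an 80-column limit, the LLVM base style) are assumptions chosen for the example, not anyone's actual project configuration, and `UseTab: AlignWithSpaces` needs a reasonably recent clang-format release.

```yaml
# .clang-format (illustrative sketch; all values are assumptions)
BasedOnStyle: LLVM
UseTab: AlignWithSpaces   # tabs for indentation, spaces for alignment
TabWidth: 8               # columns one tab is assumed to occupy
IndentWidth: 8            # columns per indentation level
ColumnLimit: 80           # line length is measured using TabWidth
```

Whoever views the file with a different tabstop will see different line lengths than the formatter assumed, which is the trade-off raised above.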

          1. 1

            I’ve found out that clang-format respects line comments:

            void f( //
                void *aPtr, size_t aLen, //
                void *bPtr, size_t bLen //
            );
            
        2. 7

          I think when people say this they imagine tabs will only occur at the start of the line. But what about code samples in comments? This is common for giving examples of how to use a function or for doc-tests. It’s much harder to maintain tab discipline there because your formatter would have to parse Markdown (or whatever markup you use) to know whether to use tabs in the comment. And depending on the number of leading comment characters, the indentation can look strange due to rounding to the next tabstop. Same thing goes for commented out sections of code.

          1. 3

            Go uses tabs for indentation and spaces for alignment. It works pretty well in practice. I can’t say that I’ve ever noticed anything misaligned because of it.

            1. 4

              If you wrote some example code in a // comment, would you indent with spaces or tabs? If tabs, would you write //<space><tab> since the rest of the comment has a single space, or just //<tab>? gofmt can’t fix it for you, so in a large Go codebase I expect you’ll end up with a mix of both styles. With spaces for indentation it’s a lot easier to be consistent: the tab character must not appear in source files at all.

              1. 1

                I can’t say that I’ve ever written code in a comment, because I just write a ton of ExampleFunctions, which Go automatically adds to the documentation and tests. Those are just normal functions in a test file. I think what’s interesting about Go is that they don’t add all the features but the ones they do add tend to reinforce each other, like go fmt, go doc, and go test.

              2. 3

                Personally, I think it would have annoyed me if go fmt didn’t exist. Aligning code with spaces is annoying, and remembering to switch between them even more so.

                1. 1

                  Yes, it’s only practical if a machine does it automatically.

            2. 1

              I said this elsewhere in the thread, but it’s worth reiterating here: I’d be with you 100% if it weren’t for Lisp, which simply can’t be idiomatically indented with tabs (short of elastic tabs) because it doesn’t align indentation with any regular tab-stops.

            1. 6

              troff authors usually started each sentence on a new line — a practice that made it easier to wrangle text on ancient paper terminals with ed

              Interesting perspective. I think this practice is underappreciated in prose: I find that when I write prose line by line, like source code, or like a poem, it’s both easier to read and wrangle as I wrangle my thoughts.

              It also produces less conflicting diffs for collaboration purposes. You would think this would make sense on Wikipedia, but too often, someone that doesn’t get it comes along and “cleans” it up.

              1. 5

                https://sembr.org/ is the link I usually send to people who ask me “is this essay supposed to be a poem? Why is it broken up like that?”

                1. 2

                  Thanks, TIL.

                  After having seen Aaron’s light HTML, I was actually surprised when I saw HTML missing on the list of «light markup languages that support semantic line breaks» – it does support semantic line breaks, so it’s not that… Then, I saw the “light”: HTML can be a light markup language, you just need to see it first.

                  1. 1

                    TIL, this is great. I’ve advocated for one-line-per-sentence for years but hadn’t a great name for it. Semantic Line Breaks is awesome.

                  2. 2

                    It also produces less conflicting diffs for collaboration purposes

                    This is the main reason that I write like this. The other is that it’s much easier to move sentences around if it’s just cut line, paste line, rather than cut-range-in-line, paste-at-point-in-line.
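A minimal sketch of why one-sentence-per-line keeps diffs small, assuming a naive sentence splitter (split after `.`, `!` or `?`) and Python's standard `difflib`. The helper names and sample sentences are made up for the illustration.

```python
import difflib
import re


def one_sentence_per_line(paragraph: str) -> list[str]:
    """Split prose after sentence-ending punctuation (a naive heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def removed_lines(old: list[str], new: list[str]) -> list[str]:
    """Return the lines of `old` that a line-based diff reports as removed."""
    return [d[2:] for d in difflib.ndiff(old, new) if d.startswith("- ")]


before = "Tabs are for indentation. Spaces are for alignment. Editors disagree."
after = "Tabs are for indentation. Spaces are only for alignment. Editors disagree."

# Paragraph stored as one long line: the whole paragraph is reported as changed.
wrapped_removed = removed_lines([before], [after])

# One sentence per line: only the edited sentence is reported.
sentence_removed = removed_lines(
    one_sentence_per_line(before), one_sentence_per_line(after)
)
```

Here `wrapped_removed` holds the entire paragraph, while `sentence_removed` holds only the one sentence that was edited, which is what a reviewer or a merge tool then has to deal with.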

                    1. 2

                      In one of my talks, about a Pandoc-centric document workflow, I implore writers to use one-line-per-sentence because it makes review in GitHub, GitLab, Gerrit, etc. much easier.

                    1. 1

                      I don’t know of a good C++ code formatter, but clang-format is not one. It is a thing I don’t like about C++.

                      Good for you, clearly other people disagree with you, so it may be worth asking whether your practical experience is different from theirs. One of the most frustrating parts of many projects I’ve been involved with over the years was ensuring consistent formatting. clang-format was the first, and in my experience the only, effective formatter for all C++. The best anyone was otherwise able to do was style-checking scripts, which were both necessary and also objectively annoying.

                      Myth 1: Any coding standard is as good as another.

                      There is such a thing as practicality in coding standards. Before even contemplating controversial topics – aesthetics, it’s not hard to think of aspects of code formatting that contribute to write amplification – how big a change becomes in the resulting diff – that should be uncontroversial. Let’s get the basics right:

                      Most of what you list here I would consider a bad style, because style has significant subjective elements. Other aspects of the coding style are significantly impacted by the nature of the apis in the project.

                      Furthermore, clang-format supports a number of standard (from the largest clang using projects) style guidelines, but also lets you come up with your own arbitrary clunky rules.

                      Regardless it does not appear people say any is equivalently good, but that people have different subjective opinions as to what is “good”. You may disagree with me, but the simple fact that I and others in the replies disagree with you should hammer home that you are incorrect in your belief that there is a single “objectively” superior format.

                      Myth 2: The most important thing is to have a coding standard and enforce it. I remember a time before clang-format: I would say that professional developers did at least as good of a job as clang-format to begin with. In fact, in some ways better than any autoformatter could ever come up with, because the human knows best, such as which arguments are associated.

                      Hahahhaha, no, no large scale project operated like that. They had manual, and eventually heavily scripted, style checks on all commits.

                      For large projects with significant numbers of contributors, having a consistent style is incredibly useful, as in the real world large projects will have different people working in different areas, and moving around between different areas. Having wildly diverging style from one area to the next makes it much more challenging than it needs to be for a person to hop from one area to another.

                      It is possible that you have simply not worked in large scale software development and engineering projects, and so have not experienced the value.

                      Freedom of expression! This openness to creativity made the conventions fluid, so that better ideas had a foothold.

                      This is nonsense. The freedom of expression comes from the code you write, not your formatting. Using random formatting rules serves only to make it difficult for other developers to read and maintain your code.

                      If the purpose of automatic formatting is to avoid style disputes in code review

                      No the purpose of automatic formatting is so that you don’t have to go through manually correcting all the formatting errors yourself.

                      it doesn’t work

                      It objectively does. When I can use clang-format I don’t have to go through manually looking for formatting errors

                      because too few people know the importance it gives to trailing comma – I have to nag people about it.

                      wtf are you talking about?

                      Myth 3: It is possible to configure clang-format to a pythonic/rustic style.

                      No comment as I have no idea what you’re talking about

                      Myth 4: It is always convenient for everyone to run the formatter.

                      It doesn’t actually matter how convenient it is, because I don’t necessarily approve of what it does to my code – I can’t run the formatter before I have committed my changes anyway. Then, I rewrite my code to comply if need be. If revising one’s commit stack isn’t hard enough as it is, doing it with style changes into the mix is the worst.

                      It is a hell of a lot more convenient than multiple manual passes to find and correct style issues. For projects I work with where clang-format isn’t an option the patch review system automatically runs style checkers to tell you where your errors are, and you have to fix them, or you manually run the script before posting to correct it then.

                      Not sure what you think you’re saving by doing it manually, I remember the annoyance of patch cycles when there were format errors.

                      1. Properties of a good formatter

                      Sensible by default, or a sensible configuration (after the criteria above) must exist in its configuration space.

                      clang-format has a very sensible set of default configurations: the formatting rules of a collection of the largest C++ projects in development: LLVM, GNU, Google, Chromium, Microsoft, Mozilla, WebKit

                      Those are reasonable defaults. If you want something different clang-format allows you to control near as I can tell every aspect, you can even derive from one of the above.

                      The formatter’s job is not to take this freedom of expression away. Its job is to satisfy a disjoint set of requirements. Therefore, it must allow more than one way to lay out the same code.

                      I do not understand wtf you are talking about with this “freedom of expression” bullshit. The reason projects have style rules is specifically to halt the problems caused by randomized code styling. clang-format does a far better job than any of the preceding tools, most of which were at best scripts that used regexes to find and report errors.

                      Humanly predictable. 3. So what’s wrong with clang-format in particular? All the above. If clang-format behaved like a python formatter or like rustfmt, you wouldn’t be reading this. Though I’m no fan of automatic formatting in general, other languages have it better.

                      It sounds like you simply have not worked in actual large scale software development based on how much value you think you are adding by using your own randomized coding style, where every experienced developer here will almost certainly tell you that pretty much any consistent style is superior to inconsistency. Even LLVM’s wretched style guidelines are better than variable.

                      1. 2

                        If you want something different, clang-format allows you to control near as I can tell every aspect

                        Believe me, I have tried. If someone could point me at the equivalent setting for what rustfmt calls indent_style, they would be my hero.

                      1. 4

                        Semantic diffing would obviate a lot of those objections. If we have fancy tools that understand language syntax to remove whitespace around, why shouldn’t the tools that show changes also understand syntax and weed out whitespace noise? I recall seeing links here to at least two such tools lately.

                        That said, minimizing diffs is not one of my top criteria for a coding style. Ease of reading is probably the first, primarily alignment. Then there’s not wasting too much vertical space, and a sensible line length (100 IMHO) because when an editor soft-wraps lines it really messes up readability.

                        1. 2

                          True. Except git blame would still tell the same old story, and git rebase would still give you the same write amplification induced conflicts. I don’t know if that’s fixable.

                          1. 2

                            Yeah, it would be nice to have smarter diffs integrated into VCS too.

                            It sounds like Git has really advanced the state of the art in file-level delta and merge algorithms (to say nothing of other systems like Darcs), but it’s still based on treating lines as opaque atomic units … just like the original Unix diff tool from, what, 1969. At the other extreme we have binary delta tools like xdelta that view files as bags of bytes. There’s a lot of room for improvement in between…

                            1. 1

                              git blame already supports -w to ignore whitespace changes. I think technically nothing stops it from using semantic diff algorithm or formatting on the fly as a preprocessing step when computing diffs and blame.
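A rough sketch of the "normalize as a preprocessing step" idea, assuming whitespace is the only noise to filter. It compares whitespace-collapsed keys but reports the original lines. This is only in the spirit of `-w`, not a reimplementation of what Git does, and the function names are invented for the example.

```python
import difflib


def normalize(line: str) -> str:
    """Collapse every run of whitespace so only the tokens stay significant."""
    return " ".join(line.split())


def added_lines(old: list[str], new: list[str], ignore_whitespace: bool) -> list[str]:
    """Return the lines of `new` that a line-based comparison reports as new."""
    key = normalize if ignore_whitespace else (lambda line: line)
    matcher = difflib.SequenceMatcher(
        a=[key(line) for line in old],
        b=[key(line) for line in new],
        autojunk=False,
    )
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])
    return added


old = ["x = compute()", "print(x)"]
new = ["if ready:", "    x = compute()", "    print(x)"]

plain = added_lines(old, new, ignore_whitespace=False)    # every line looks new
filtered = added_lines(old, new, ignore_whitespace=True)  # only the real change
```

The same shape would work with a formatter or a syntax-aware normalizer in place of `normalize`, at the cost of running it on every revision being compared.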

                          1. 9

                            This seems like a misplaced rant against coworkers who don’t format things with trailing commas like the author would like, and weird tooling issues (I certainly don’t run clang-format via Docker and can’t imagine why I would), rather than much of a criticism of clang-format itself. Even the “self-evident” example is a format that clang-format can and does use! Eliminating the comma “trick” would just mean you have even less control over formatting, which contradicts the point.

                            The main reason people (myself included) like machine-dictated formatting is to reduce/eliminate time wasted on bikeshedding. If the main issue is that developers won’t place a comma to your liking, not using an automatic formatter certainly isn’t going to make anything better: that just means there are many, many more things that other developers can do that aren’t to your liking.

                            This sort of thing easily gets toxic even in corporate environments where developers are expected to deal with their coworker’s nitpicks (because they get paid at least in part to do so), and is practically a non-starter in an open source context where you simply are not going to get submissions in exactly the style you want. You can then either manually reformat it all yourself, or berate people for it, certainly just driving potential contributors away. That is the main point of automated formatting, not “it always makes the code as pretty as possible”. clang-format is popular because it’s more or less Good Enough for everyone to tolerate, not because it’s anyone’s ideal, and having a tool to do that job eliminates a ton of wasted time which can be spent on something actually important.

                            1. 1

                              I think you’re trying to misunderstand. I’m trying to address an industry-wide problem, and I don’t like bikeshedding.

                              The trailing comma is just a detail that happens to work in curly braces (which I’m glad for), but doesn’t apply anywhere else, like function arguments, so it’s not like the “self-evident” format is supported everywhere.

                              1. 2

                                To me, it sounds like you are saying: “I have strong opinions on the right way C++ code should be formatted, and I am unable to force clang-format to use it, so therefore, don’t use it.”

                            1. 7

                              I have the same issue with rustfmt. It does what people imagine gofmt doing, but not what it actually does.

                              gofmt is an error-fixing formatter. It fixes things about formatting it knows are wrong (like braces, indentation), but leaves everything else more or less unchanged (like 1-line vs multi-line choice). There are many ways to format the same construct, and gofmt respects humans’ high-level choice about the “layout” of the code.

                              Most other formatters are destructive canonicalizers, which completely replace formatting of the input with their own heuristics. The difference between these approaches is very significant, because reformatting based on heuristics doesn’t leave room for common sense, and bulldozes over formatting exceptions.

                              Unfortunately, these unforgiving blunt tools are still used due to the fallacy of “we must do something; this is something; therefore we must do this”.

                              1. 3

                                I thought gofmt was a destructive canonicalizer. So you are telling me that gofmt will not change:

                                foo
                                {
                                  bar;
                                  baz;
                                }
                                

                                into

                                foo {
                                        bar;
                                        baz;
                                }
                                

                                The former isn’t an error, just a difference in opinion.

                                1. 5

                                  gofmt has an opinion on braces when they’re on multiple lines, but it will preserve 1-liner foo {bar, baz} version as-is. This is unlike rustfmt that will splat 1-liners as high as it wants, or re-wrap multi-line constructs into long spaghetti, depending on which heuristic they hit.

                                  1. 3

                                    So unless spacing causes a compilation error, then I wouldn’t call gofmt an error-fixing formatter.

                                    1. 4

                                      I mean an error in a general sense, from the perspective of formatting: using 3 spaces to indent, for example, is erroneous formatting.

                                      1. 2

                                        No. If 3 spaces for indenting is an “error” then it should be enforced at the language level. Rob Pike wimped out in this regard.

                                        1. 5

                                          I think you’re just arguing about the meaning of the word “error”, rather than what I’ve said about formatters?

                                          I’ve used the word “error” in its non-technical English meaning of “deviation from what is correct”, and not the other meaning of “what Rob believes must stop compilation”. If that ambiguity bothers you, please read my original post with “error” replaced with “imprudent deviation”:

                                          gofmt is an imprudent-deviation-fixing formatter.

                                          1. 1

                                            If the language allows white space between tokens, why is X spaces correct, while X-n or X+n spaces incorrect? X is an arbitrary value; it’s just an opinion being enforced. If you don’t want people to have opinions on the “correct formatting” for a language, make it impossible to have said opinions at the compiler level, not with some external tool.

                                  2. 1

                                    It will change the former to the latter. And you’re right, that doesn’t represent a fixing of an error. But the whole point of the tool is to remove style choices like this from the set of things that programmers can have different opinions on! :) As the proverb goes, gofmt‘s style is nobody’s favorite, but gofmt is everybody’s favorite.

                                  3. 1

                                    Amen.

                                    I mean that rustfmt has sane heuristics. But yes, it is unfortunately a destructive bulldozer too. I was unaware that gofmt was not. Good tip!

                                  1. 2

                                    For seasoned C++ programmers, Rust is easy to learn. When they first start out, Rust learners usually spend most of their time making sense of ownership and lifetime. Even if they don’t explicitly express these concepts in code, experienced C++ engineers always keep these two concepts in mind when programming in C++. Rust can be challenging for beginners. But our interns proved otherwise. They picked up Rust within one or two weeks — even with no prior Rust/C++ expertise — because there are fewer implicit conversion and overload resolution rules to remember.

                                    Yes to all of that. I would say I’m an ownership/lifetime conscious (embedded) C++ programmer by work, and I learnt Rust in my spare time. It felt like a familiar language, I was productive in days, and my first Rust program is a successful open source project. Where was the learning curve? I’m almost disappointed. I think it depends on where you come from, but also whether what you want to do lends itself well to the borrow checker. I had read about the concepts beforehand, so I didn’t choose a project that would get me in trouble.

                                    As for fighting the compiler and having to learn new things, of course I still do: It’s not like you are an expert in C++ either after having programmed in it for 10 years, which I have.

                                    1. 4

                                      I’m all for encoding indentation as tabs instead of spaces, so the number of spaces per indentation level becomes a matter of personal preference.

                                      But if working with code is a thing you do, wrangling diffs is something that comes with it – imagine how nice it would be if only the change in indentation level was encoded in the text: Instead of encoding an absolute level of indentation, and repeating it for every line, the indentation was relative, encoded with a pair of special indentation increment/decrement characters. Then, you could indent a stretch of code without git telling you that you’ve changed every single line, which conflicts with every other change. This would not only make the diff readable (a solved problem) but actually decouple it from other changes, so you could rebase it (an unsolved problem). That, I would call a quality of life improvement. I think repurposing a couple of unused control characters is about time anyway (there are too many already).
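A minimal sketch of the relative-indentation idea, under several assumptions that are not part of the proposal itself: the two markers are the otherwise rarely used control characters `\x0e` and `\x0f`, each marker sits on a line of its own, source indentation is one tab per level, and whitespace-only lines are treated as empty.

```python
INDENT = "\x0e"  # hypothetical "one level deeper" marker
DEDENT = "\x0f"  # hypothetical "one level shallower" marker


def encode(lines: list[str]) -> list[str]:
    """Replace absolute tab indentation with standalone INDENT/DEDENT lines."""
    out, level = [], 0
    for line in lines:
        body = line.lstrip("\t")
        if body:  # blank lines carry no indentation information
            new_level = len(line) - len(body)
            step = INDENT if new_level > level else DEDENT
            out.extend([step] * abs(new_level - level))
            level = new_level
        out.append(body)
    return out


def decode(lines: list[str]) -> list[str]:
    """Rebuild absolute tab indentation from the marker lines."""
    out, level = [], 0
    for line in lines:
        if line == INDENT:
            level += 1
        elif line == DEDENT:
            level -= 1
        else:
            out.append("\t" * level + line if line else line)
    return out


old = ["a()", "b()", "c()"]
new = ["if x:", "\ta()", "\tb()", "c()"]  # a() and b() wrapped in a block

# Absolute indentation: the wrapped lines no longer match the old text.
untouched_absolute = [line for line in old if line in new]

# Relative indentation: every old line survives; only new lines are added.
untouched_relative = [line for line in encode(old) if line in encode(new)]
```

With the markers on their own lines, wrapping the block becomes a pure insertion of three lines (`if x:`, one `INDENT`, one `DEDENT`), so a line-based rebase would have no reason to conflict with an unrelated edit to `a()` or `b()`.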

                                      1. 3

                                        Could the same thing be achieved by changing the diff program?

                                        1. 3

                                          This leads eventually to AST diffing, and storing of AST rather than plain text, then allowing for per-preference visual layout during view and edit.

                                          1. 1

                                            I’m skeptical. AST / “semantic” diff is only necessary if you actually want it to do anything else than what an (ideal) textual diff could. And until that need arises, my null hypothesis would be that you rather don’t want it – I’m not at all convinced that people know what they want before a better textual diff has even been attempted.

                                            1. 1

                                              … which is, to my way of thinking, the Correct Answer for storage of source code, in any event.

                                            2. 1

                                              Absolutely, I think both approaches have advantages, and both should be done. I can’t believe nobody has tried either.

                                              1. 1

                                                The advantage of changing the diff program is that people who care about this can change it for themselves without affecting the format of files distributed to others. What would be the advantage of the other way?

                                                1. 1

                                                  Yes, obviously, it’s an uphill battle to change the definition of text. But it’s fundamentally redundant information – if you don’t remove it, the alternative is to support ignoring it everywhere. Remark that I’m not merely talking about any freestanding diff tool; the pressing problem for me is the rebase machinery in git.

                                                  And here’s an idea for a compression preprocessor: I would be surprised if it didn’t compress better too.

                                                  1. 1

                                                    But it’s fundamentally redundant information – if you don’t remove it, the alternative is to support ignoring it everywhere.

                                                    I’m not sure what you mean by that.

                                                    If we imagine that every tool affected by this change can be updated, then your proposal has the benefit of slightly smaller file sizes – on the order of one byte per line per level of indentation (perhaps this is the advantage you had in mind?). But adoption takes time and it will never be 100%, so you have to weigh it against the problems caused for tools that don’t support it. In this case, it would either prevent display of the files altogether, or show strange characters and no indentation.

                                                    On the other hand, changing the diff procedure would be easier to adopt because there are fewer tools that would need to be changed (still a lot). Those that don’t adopt the change would simply have the status quo, which is way better than losing the ability to view a file or its indentation.

                                                    Remark that I’m not merely talking about any freestanding diff tool; the pressing problem for me is the rebase machinery in git.

                                                    Still a much smaller change than changing everything that might display the file.

                                          1. 2

                                            Not sure what to think of the cute, ABI-specific impl there. Wouldn’t an optimizer be able to achieve that?

                                            1. 1

                                              I guess that’s part of the plan, and part of what makes it cute. For a language extension, the main thing is that it can be done and is worth extending the language for. Which I guess takes a lot of elaboration to figure out. Maybe it was so cute on Itanium because it didn’t work out on x86_64 or something?

                                              1. 5

                                                The name “Itanium ABI” is misleading. It’s an open source ABI that was originally created for Itanium, but is now used by GCC and Clang by default on all platforms, including x86_64.

                                                https://itanium-cxx-abi.github.io/cxx-abi/abi.html

                                            1. 20

                                              My view on this is that Rust and C++ aren’t that much different in their evolution, it’s just that Rust came later.

                                              A lot of C++ language features came out of necessity or common culture: when C++ came up, object orientation was coming up as the way to modularise software. The knowledge about practical issues of the concept came up later.

                                              The other thing is that static languages, especially with an eye for performance, have the tendency to grow features - one day, it’s an abstraction over SIMD, tomorrow is CHERI. So you end up in a situation where the language is ever extended and some features stop being “en vogue”.

                                              And we’re seeing this in Rust: it’s a perfectly usable programming language, still, there’s work on features to e.g. avoid Boxing return types in async. For most programming languages, that would be no problem and if you want to improve the situation, you try to fix it in the runtime (e.g. through a clever JIT).

                                              Even as someone who is deep into Rust, my view is that in 20 years, we’ll probably read this blog post with Rust vs. some other language.

                                              C++ has been breaking ground and it’s important to see its evolution as something that did involve the thoughts of many contributors.

                                              1. 3

                                                Glad to see a balanced take. I’ll write in Rust vs C++ any day, but I really dislike seeing the Comment Section blasting C++ for complexity when Rust has enough complexity of its own. Rust, at least, actively tries to file that down, but the space these languages aim for pretty much necessitates it.

                                                And there are some things that you cannot express easily in Rust, such as a precise GC that doesn’t require explicit rooting. (It is possible someone could figure out a way to do this, and there are lots of good work here!) However, I freely concede they are a small, small subset of the types of things you’d want to write, generally.

                                                1. 4

                                                  Ha, thanks, such feedback means much to me! I’m very interested why languages are like they are and my conclusion is that the explanation that takes their authors, creators and communities very seriously often leads to the most coherent results.

                                                  I’ve also spent my time around language creators (previously in Ruby, but also at conferences like ACCU) and I’m very annoyed how they usually have an intimate understanding of the problems everyone else in the space is sharing. Curiosity rules there. When I joined the Rust community, one of the first things I read was “If Rust is a criticism of C++, it comes from a position of respect”. And that was certainly true.

                                                  The point that not everything is easy to express in Rust is very true - and sometimes, that’s a choice. Mostly on focus - other things need more work at the moment. My personal thing is that I would not write a (large) UI in Rust at the moment. Maybe a window or 2, but not a large, complex system.

                                                  I hope in general, our community gets more curious, or at least calm.

                                                  One last thing: the late Russel Winder sent me emails for 3 years to speak at ACCU about Rust. When I arrived there, he caught me and we had a very short conversation. He stated that he’s happy to see that systems programming is becoming a hot topic again and he sees the future of ACCU as a conference and meeting spot for systems programmers. And he introduced me to 1-2 C++ hotshots and told me to not feel small. And gosh, did I have great conversations! So even if most talks were about C++, there was Swift, Go, Rust, Zig etc. at this conference. And how is that not a privilege - we’re out here, can compare implementations, look at the source, express our thoughts and ideas and… how can we choose dismissal?

                                                  Okay, enough rambling :).

                                                  1. 1

                                                    blasting C++ for complexity when Rust has enough complexity of its own. Rust, at least, actively tries to file that down, but the space these languages aim for pretty much necessitates it.

                                                    I agree on a basic level, but I think there generally is a difference between complexity and complexity.

                                                    For one aspect, I think zero-cost abstractions in particular is what this space demands. Zero-cost abstractions are forever, whereas other abstractions gradually get replaced by better and better ones until one or typically many zero-cost abstractions are reached.

                                                1. 5

                                                  For those looking for performance, some hefty alternatives have popped up lately:

                                                  Especially the RK3588 SoC looks promising for a server, as it has UEFI and lots of storage.

                                                  1. 2

                                                    That does look cool, esp the cheaper model. And an even cheaper one has popped up for a low low $169: https://www.cnx-software.com/2022/05/12/mekotronics-r58-is-a-cost-optimized-rockchip-rk3588-sbc/

                                                  1. 6

                                                    There are multiple points here I disagree with:

                                                    1. Go’s and Zig’s defer are rather different beasts. Go runs deferred statements at the end of the function, Zig at the end of scope. Want to lock a mutex inside a loop? Can’t use Go defer for that.
                                                    2. destructors can’t take arguments or return values. While most destructions only release acquired resources, passing an argument to a deferred call can be very useful in many cases.
                                                    3. hidden code: all defer code is visible in the scope. Look for all lines starting with defer in the current scope and you have all the calls. Looking for destructors means looking at how drop is implemented for all the types in the scope.
                                                    1. 11

                                                      Go’s and Zig’s defer are rather different beasts. Go runs deferred statements at the end of the function, Zig at the end of scope. Want to lock a mutex inside a loop? Can’t use Go defer for that.

                                                      This distinction doesn’t really matter in a language with first-class lambdas. If you want to unlock a mutex at the end of a loop iteration with Go, create and call a lambda in the loop that uses defer internally.

                                                      destructors can’t take arguments or return values

                                                      But constructors can. If you implement a Defer class to use RAII, it takes a lambda in the constructor and calls it in the destructor.

                                                      hidden code: all defer code is visible in the scope

                                                      I’m not sure I buy that argument, given that the code in defer is almost always calling another function. The code inside the constructor for the object whose cleanup you are deferring is also not visible in the calling function.

                                                      1. 4

                                                        hidden code: all defer code is visible in the scope

                                                        I’m not sure I buy that argument, given that the code in defer is almost always calling another function. The code inside the constructor for the object whose cleanup you are deferring is also not visible in the calling function.

                                                        The point is that as a reader of zig, you can look at the function and see all the code which can be executed. You can see the call and breakpoint that line. As a reader of c++, it’s a bit more convoluted to breakpoint on destructors.

                                                        1. 2

                                                          you can look at the function and see all the code which can be executed.

                                                          As someone who works daily with functions several hundred lines long, that sounds like a con way more than a pro.

                                                        2. 1

                                                          But constructors can.

                                                          This can work sometimes, but other times packing pointers in a struct just so you can drop it later is wasteful. This happens a lot with, for example, the Vulkan API, where a lot of the vkDestroy* functions take multiple arguments. I’m a big fan of RAII but it’s not strictly better.

                                                          1. 1

                                                            At least in C++, most of this all goes away after inlining. First the constructor and destructor are both inlined in the enclosing scope. This turns the capture of the arguments in the constructor into local assignments in a structure in the current stack frame. Then scalar replacement of aggregates runs and splits the structure into individual allocas in the first phase and then into SSA values in the second. At this point, the ‘captured’ values are just propagated directly into the code from the destructor.

                                                          2. 1

                                                            If you want to unlock a mutex at the end of a loop iteration with Go, create and call a lambda in the loop that uses defer internally.

                                                            Note that Go uses function scope for defer. So this will actually acquire locks slowly then release them all at the end of function. This is very likely not what you want and can even risk deadlocks.

                                                            1. 1

                                                              Is a lambda not a function in Go? I wouldn’t expect defer in a lambda to release the lock at the end of the enclosing scope, because what happens if the lambda outlives the function?

                                                              1. 1

                                                                Sorry, I misread what you said. I was thinking defer func() { ... }() not func() { defer ... }().

                                                                1. 2

                                                                  Sorry, I should have put some code - it’s much clearer what I meant from your post.

                                                          3. 5

                                                            The first point is minor, and not really changing the overall picture of leaking by default.

                                                            Destruction with arguments is sometimes useful indeed, but there are workarounds. Sometimes you can take arguments when constructing the object. In the worst case you can require an explicit function call to drop with arguments (just like defer does), but still use the default drop to either catch bugs (log or panic when the right drop has been forgotten) or provide a sensible default, e.g. delete a temporary file if temp_file.keep() hasn’t been called.

                                                            Automatic drop code is indeed implicit and can’t be grepped for, but you have to consider the trade-off: a forgotten defer is also invisible and can’t be grepped for either. This is the change in default: by default there may be drop code you may not be aware of, instead of by default there may be a leak you may not be aware of.

                                                            1. 3

                                                              destructors can’t take arguments or return values. While most destructions only release acquired resources, passing an argument to a deferred call can be very useful in many cases.

                                                              Yes, more than useful:

                                                              • Zero-cost abstraction in terms of state: A deferred call doesn’t artificially require objects to contain all state needed by their destructors. State is generally bad, especially references, and especially long lived objects that secretly know about each other.
                                                              • Dependencies are better when they are explicit: If one function needs to run before another, letting it show (in terms of what arguments they require) is a good thing: It makes wrong code look wrong (yes, destruction order is a common problem in C++) and prevents it from compiling if you have lifetimes like Rust.
                                                              • Expressiveness: In the harsh reality we live in, destructors can fail.

                                                              I think the right solution is explicit destructors: Instead of the compiler inserting invisible destructor calls, the compiler fails if you don’t. This would be a natural extension to an explicit language like C – it would only add safety. Not only that: It fits well with defer too – syntactic sugar doesn’t matter, because it just solves the «wrong default» problem. But more than anything, I think it would shine in a language with lifetimes, like Rust, where long lived references are precisely what you don’t want to mess with.

                                                              1. 2

                                                                You could run an anonymous function within a loop in Go, just to get the per-loop defer. Returning a value in a defer is also possible.

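                                                                func run() (retval int) {
                                                                    // main cannot return a value in Go, so this lives in a helper.
                                                                    defer func() {
                                                                        retval = 42 // a deferred closure can set the named result
                                                                    }()
                                                                    for i := 0; i < 3; i++ {
                                                                        func() {
                                                                            defer func() {
                                                                                // per loop cleanup
                                                                            }()
                                                                            // do stuff per loop
                                                                        }()
                                                                    }
                                                                    return
                                                                }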
                                                                
                                                              1. 12

                                                                Sadly, it’s just the in-kernel bits; the userspace blobs are all still proprietary.

                                                                1. 7

                                                                  It also somehow doesn’t cover actually using your GPU to display graphics on a screen, so don’t get your hopes up.

                                                                  https://blogs.gnome.org/uraeus/2022/05/11/why-is-the-open-source-driver-release-from-nvidia-so-important-for-linux/

                                                                  1. 1

                                                                    For now. If this is related to the previous hack, it wouldn’t be surprising if at least a subset of userspace bits follow down the line.

                                                                    1. 6

                                                                      I would not expect the userland to be released ever. It is Nvidia’s secret sauce, doing all of the heavy lifting of implementing OpenGL/DirectX/Vulkan and fixing various apps’ mistakes.

                                                                      I also don’t think this release is related to the hack in more than timing coincidence. Grapevine says they’ve wanted to release kernel bits for a long time but red tape was there.

                                                                      1. 1

                                                                        It’s very not related. Any leaked materials from them would be legally extremely toxic. AMD would not allow anyone employed to read them, and anyone trying to do independent development based on it would get sued by NVIDIA. Even ReactOS had a “can’t contribute if you’ve read the leaked Windows source” rule.

                                                                        1. 1

                                                                          Never claimed it was leaked. It’s obviously not. I’m suggesting that this may be the result of negotiation with the hackers.

                                                                          1. 2

                                                                            No, I get it. I’m saying NVIDIA has no business negotiating. Leaks wouldn’t really hurt it and complying wouldn’t guarantee anything.

                                                                    2. 1

                                                                      Was that the catch? If there had to be one, this actually makes me very relieved, happy and cautiously positive about Nvidia again. Because in terms of what to fix first, the kernel module was always the problem, as far as I understand.

                                                                      As a desktop user, all I want is to not have driver problems. Most importantly, to not have my desktop replaced with text on a black screen ever again – the single reason I have avoided Nvidia for a decade now. This used to happen every kernel upgrade.

                                                                    1. 2

                                                                      No parallel benchmarks? What good is it if you are fast on a single core when most users have multi-core systems? FWIW, I’ve had very good experiences with plzip (repology).

                                                                      1. 3

                                                                        For an algorithm, it’s essentially irrelevant: Compression is inherently a serial problem, but any algorithm can be parallelized by splitting the problem first. Unless I’ve missed something, that’s what they all do.

                                                                        The nature of compression is that the more you divide the problem, the less compression you can have. I wouldn’t even count that as a parallel algorithm. An actual parallel compression algorithm would require a different route to parallelism. The most viable I can think of:

                                                                        • Parallel search makes sense in SIMD and FPGA, but on CPU threads, probably not: CPU threads aren’t suitable to sync that tightly. Maybe it would make sense at a compression level that’s already slow enough or if they busywait for each other (very energy wasteful).
                                                                        • Pipelining doesn’t scale, because you can only cut the dataflow in so many suitable places, usually zero. Maybe a two-threaded compression algorithm, one for prediction and one for entropy coding, might be viable. Someone needs to try that.

                                                                        Edit: I would agree 100% if I had to use the tool in its current form, but that’s fixable.

                                                                        1. 3

                                                                          Actually the zstd implementors did a lot of work to get good benefit from single-core parallelism through careful design and implementation choices. See for example this blog post on the blog of the author of zstd. (This blog post is more general, but I believe the knowledge gathered from those experiments was integrated into “fse”, which is a building block of zstd.)

                                                                          1. 2

                                                                            I probably did not make my point very clear. Sorry about that. Users don’t use the algorithm, but the application. And they often want to compress large directory trees. Then why should I use bzip3, which does not appear to support parallel processing, when there are tools available, like plzip or pbzip2, that perform the task in a fraction of the time? If your compression tool is only single-threaded, then algorithmic superiority does not really matter.

                                                                            1. 6

                                                                              If you’re compressing a bunch of independent blobs, you just call the compression API on multiple threads.

                                                                              If you’re compressing blobs that are related, where you want the commonalities between blobs to improve compression — say, a bunch of JSON files with the same schema — you don’t have a well-parallelizable task, as @anordal pointed out. Each blob being compressed affects the state used for the next one.

                                                                              1. 1

                                                                                Because someone could write the parallel tool for bzip3 in about 37 seconds, given that the algorithm exists.

                                                                                1. 1

                                                                                  I also assume that a naive parallelisation would be trivial to write. I believe that makes it worse that no performance numbers for a parallel execution are shown.

                                                                          1. 4

                                                                            No critique, but I find it funny that the «Top 10 commands» are mostly deprecated:

                                                                            1. cron: insert comment about the scope creep of systemd
                                                                            2. traceroute: Ok, I don’t know of a better traceroute.
                                                                            3. tar: Why remember tar when atool works on any archive and is safe against tarbombs?
                                                                            4. crontab: You don’t want to edit a distro-provided config file. For most purposes, it suffices to drop a script in a directory like /etc/cron.daily/.
                                                                            5. netstat: Superseded by ss and ip.
                                                                            6. cp: Rsync is in many ways a better cp. Wanna sync two directories? Unlike if you try this with cp, rsync -r work/ backup/ will do the same irrespective of whether backup/ exists yet.
                                                                            7. ls: Its date format is wrong. For locales where the day is supposed to come before the month, it makes you read the filesize as the day number – what you read as «0th of May» (in English) means that the file is empty. Exa gets it right.
                                                                            8. iptables: Superseded by nftables
                                                                            9. curl: Ok, there is wget too, but they aren’t in particular need of replacement.
                                                                            10. chmod: Ok, not exactly a Swiss Army knife, but it does its one syscall right. Access control lists and mandatory access control are not replacing it any time soon.
                                                                            1. 5

                                                                              cron: insert comment about the scope creep of systemd

                                                                              systemd-timers provide way more functionality than crond. It’s a valid improvement, not just a simple replacement.

                                                                              traceroute

                                                                              mtr, tracepath

                                                                              1. 4

                                                                                re: better traceroute, I think that’s mtr

                                                                              1. 26

                                                                                C++ is also a very complex language. Sure, it may be easier to write than Rust because the compiler is more forgiving, but it’s also much harder to guarantee its correctness.

                                                                                Agree. Depending on what you care about, the tables turn completely, and it’s not limited to writing:

                                                                                It occurred to me today that I can’t actually read C++: At least when correctness means not creating accidental copies of a vector that’s wrapped in a generated class that doesn’t implement move operators, and passed up through function calls involving both RVO with multiple returns and assignment to out-arguments. The only way to inquire about whether that vector is accidentally copied is to replace it with your own that is uncopyable. Which can be a lot of work if that involves changing a pervasively used code generator. Suffice to say, the code wasn’t readable in a day.

                                                                                In contrast, Rust doesn’t implicitly .clone() things.

                                                                                1. 5

                                                                                  I forgot to say that in Rust, you wouldn’t even write this (particular) code yourself, because Serde exists. Thanks to more powerful macros, things like Serde are possible without code generation.

                                                                                  (This wasn’t my main point, but perhaps a good one about how laborious it is to get the same job done if your language is not expressive enough.)

                                                                                1. 7

                                                                                  A modern C++ CI system will also catch a lot of these, so using Go as the baseline feels a bit mean:

                                                                                  • Resource leaks: RAII for all resources, warn on explicit mutex lock rather than things like std::unique_lock / std::lock_guard (one is lexically scoped, the other allows the lock ownership to be transferred). clang-tidy and friends can warn about many of these.
                                                                                  • Unreleased mutexes are a special case of RAII
                                                                                  • Missing switch cases are compiler warnings.
                                                                                  • Option types. Rust is definitely better than C++ here. You can use std::optional and, I think, most implementations special case pointers so that they compile down to using nullptr as the not-present representation, but require explicit checks for access (though you’re dependent on something like the clang static analyser to notice things where you’re using the direct accessor without checking for validity first, rather than the value_or accessor). This is very much ‘you won’t see this in C++ code written with a good style guide’ vs ‘you can’t express this at all in Rust’.
                                                                                  • Uninitialised local variables will warn but uninitialised struct fields are particularly painful in C++ and there are lots of ways of writing them that look correct but aren’t.
                                                                                  • Unhandled explicit errors are not too much of a problem ([[nodiscard]] requires you to do something with the return, LLVM has some error templates that abort if you don’t check the error even on non-error results, though catching this at compile time is nicer than failing in the first test that uses the functionality). Unchecked exceptions are a disaster and are a big part of the reason why I prefer -fno-exceptions for C++ code. Win for Rust here.
                                                                                  • Data races are a fun one because Rust’s Sync trait can be implemented in safe Rust only for immutable objects. Any shared mutable state requires unsafe, including inside standard types such as Arc. This, in turn, means that you’re reliant on this unsafe code being correct when composed with all other unsafe code in the same program. Still a nicer place to be than C++ though.
                                                                                  • I’m not sure I understand the Hidden Streams argument so I can’t comment on it.

                                                                                  Rust definitely has some wins relative to modern C++. When we looked at this, our conclusion was:

                                                                                  • Rust and modern C++ with static analysis required for CI prevent very similar levels of bugs.
                                                                                  • C++ is often better in places where you’re doing intrinsically unsafe things (e.g. a memory allocator) because the Rust analysis tooling is much better for the safe subset of the language than the unsafe subset, whereas all C++ analysis tools are targeted at unsafe things.
                                                                                  • Preventing developers from committing code that doesn’t compile to a repo is orders of magnitude easier than preventing them from deciding that they know better than a static analyser and marking real bugs as analyser false positives.

                                                                                  The last one of these is really critical. C++ code can avoid these bugs, Rust code must avoid them unless you use unsafe. We assumed that avoiding unsafe was something code review would easily handle, though since I’ve learned that Facebook has millions of lines of unsafe Rust code I’m now far less confident in that claim.

                                                                                  1. 2

                                                                                    The last one of these is really critical. C++ code can avoid these bugs, Rust code must avoid them unless you use unsafe. We assumed that avoiding unsafe was something code review would easily handle, though since I’ve learned that Facebook has millions of lines of unsafe Rust code I’m now far less confident in that claim.

                                                                                    This is what makes -Wall -Werror an attractive nuisance of sorts. Unfortunately, the projects I’ve worked on that would benefit most from those also had enough spurious warnings triggered by headers for dependencies that it was never practical to leave them enabled.

                                                                                    1. 1

                                                                                      Tip: Use -isystem to include dependencies. Tells the compiler that it’s not your code.

                                                                                    2. 2

                                                                                      You can use std::optional and, I think, most implementations special case pointers so that they compile down to using nullptr as the not-present representation.

                                                                                      You mean references, not pointers, right? If this were done for pointers, then the “present NULL” and “absent” states would be indistinguishable.

                                                                                      1. 2

                                                                                        You mean references, not pointers, right?

                                                                                        I meant pointers, but you’ve made me realise that I’m probably wrong. std::optional is not defined for T&.

                                                                                        If this were done for pointers, then the “present NULL” and “absent” states would be indistinguishable.

                                                                                        I had always assumed that std::optional<T*> x{nullptr} would give a not-present value but a quick test suggests that this is not the case. I’ve learned something today!

                                                                                      2. 1

                                                                                        Is the null-pointer optimization for optional pointers legal in C++? It seems like since there’s no idea of non-nullable pointers, you can’t make the optimization safely, since you should be able to distinguish between “no value” and the nullptr. Also the thing with Rust is that when you’re implementing data structures, which is something that Facebook will do a lot throughout their codebase, you often do need unsafe. The point is that you can wrap these unsafe blocks in safer abstractions and then the calling code needs less review. That’s not to say unsound code never pops up, but having unsafe operations constrained to specific places is still really helpful.

                                                                                      1. 3

                                                                                        In Swift, this would be:

                                                                                        struct Test {
                                                                                            let `in`: String
                                                                                        }
                                                                                        
                                                                                        let a = Test(in: "asdf")
                                                                                        

                                                                                        I think the example doesn’t really show the point of r#. You could just as well just change the name to _in or __in instead and it would probably be more readable than r#in.

                                                                                        1. 3

                                                                                          As always, the RFC gives a lot of motivation. https://rust-lang.github.io/rfcs/2151-raw-identifiers.html

                                                                                          (E.g. the ability to name a function like a keyword, particularly useful for FFI use)

                                                                                          1. 2

                                                                                            But if you always call it using r#, you have essentially renamed the function. It would be acceptable if it was only at declaration or where disambiguation was otherwise needed, but here it seems to surface at every point of use.

                                                                                            1. 4

                                                                                              Imagine function:

                                                                                              extern "C" fn r#match() {
                                                                                              
                                                                                              }
                                                                                              

                                                                                              because a dynamic library needs to export this symbol. I agree that, in general, r# is not to be used in interfaces intended for humans.

                                                                                              1. 6

                                                                                                Hm, I don’t think that’s what is happening here. For FFI purposes, we have a dedicated attribute, link_name

                                                                                                https://doc.rust-lang.org/reference/items/external-blocks.html#the-link_name-attribute

                                                                                                Unlike r#, it’s not restricted to valid rust identifiers (ie, it allows weird symbols in name).

                                                                                                My understanding is that 90% of the motivation for r# was the edition system and the desire to repurpose existing idents as keywords. Hence, unlike Swift or Kotlin, Rust deliberately doesn’t support arbitrary strings as raw identifiers, only things that are lexically an ident ((_|XID_Start)XID_Continue*).

                                                                                          2. 3

                                                                                            The example uses debug serialisation (#[derive(Debug)]), which perhaps isn’t the best example of why it matters, but at least proves the point.

                                                                                            The name matters in serialisation, and this could be generated code. I’ve had this exact problem in two unrelated protocol generators that happened to generate C++, and got funny build errors when I tried to define messages with fields like delete and static.

                                                                                            1. 1

                                                                                              OK, but that option hasn’t gone anywhere. You can still name it _in if you want. There’s plenty of niche cases where it would be nice to keep the identifier, mostly when interfacing with code you don’t control.

                                                                                              1. 1

                                                                                                Yes, exactly. I found out about raw identifiers while checking a PR to the sqlparser crate, where the author used in for parsing one of the statements.

                                                                                            1. 3

                                                                                              This suffers from the same issue as most “new shell” languages: it doesn’t have a good interactive story. The progress from typing on CLI to pasting together a script is core to why shell scripting is interesting.

                                                                                              So far Xonsh has been something that tries to address both sides. I’m no python fan, but it feels like the right direction.

                                                                                              1. 1

                                                                                                Obviously different people have different habits and needs, but I don’t need a shell language to have an interactive story at all. When I write shell scripts, I don’t start by typing into an interactive terminal and then move to a script. For me, the start is almost always “I need to run these (5, 10, 50) commands on a whole bunch of files (or directories).” Then I think, “Go is definitely too much, and I don’t want to fiddle with the plumbing required in Python to execute commands, capture stdout and stderr, etc.” For me, a shell scripting language with reasonable defaults and straightforward, simple syntax that makes it easy to run commands is a dream, regardless of whether it has an interactive story or not.

                                                                                                1. 1

                                                                                                  I completely agree as an end goal (I’ve written countless 10+-liners at the prompt just because fish made it easy), but what’s fixable doesn’t doom the language. So I think it’s fine to focus on the language before adding interactivity.

                                                                                                  Related: What really needs replacing is not so much bash, but those “minimal POSIX shells” like dash and busybox ash that don’t even support arrays, which makes anything of substance impossible to write correctly. So I would like to see a really minimal but still correctness-conducive language that could actually succeed on embedded. Here, interactive features are less important. I’m glad someone else did it, so I didn’t have to.

                                                                                                1. 11

                                                                                                  As someone who is rather new to languages like C (I only recently got into it by making a game with it), I have a few newbie questions:

                                                                                                  • Why do people want to replace C? Security reasons, or just old and outdated?

                                                                                                  • What does Hare offer over C? They say that Hare is simpler than C, but I don’t understand exactly how. Same with Zig. Do they compile to C in the end, and these languages just make it easier for user to write code?

                                                                                                  That being said, I find it cool to see these languages popping up.

                                                                                                  1. 33

                                                                                                    Why do people want to replace C? Security reasons, or just old and outdated?

                                                                                                    • #include <foo.h> includes all functions/constants into the current namespace, so you have no idea what module a function came from
                                                                                                    • C’s macro system is very, very error prone and very easily abused, since it’s basically a glorified search-and-replace system that has no way to warn you of mistakes.
                                                                                                    • There are no methods for structs, you basically create struct Foo and then have to name all the methods of that struct foo_do_stuff (instead of doing foo_var.do_stuff() like in other languages)
                                                                                                    • C has no generics, you have to do ugly hacks with either void* (which means no type checking) or with the macro system (which is a pain in the ass).
                                                                                                    • C’s standard library is really tiny, so you end up creating your own in the process, which you end up carrying around from project to project.
                                                                                                    • C’s standard library isn’t really standard; a lot of stuff isn’t consistent across OSes. (I have agreeable memories of that time I tried to get a simple 3kloc project from Linux running on Windows. The number of hoops you have to jump through, tearing out functions that are Linux-only and replacing them with an ifdef mess to call Windows-only functions if you’re compiling on Windows and the Linux versions otherwise…)
                                                                                                    • C’s error handling is completely nonexistent. “Errors” are returned as integer codes, so you need to define an enum/constants for each function (for each possible returned error), but if you do that, you need to have the actual return value as a pointer argument.
                                                                                                    • C has no anonymous functions. (Whether this matters really depends on your coding style.)
                                                                                                    • Manual memory management without defer is a PITA and error-prone.
                                                                                                    • Weird integer type system. long long, int, short, etc which have different bit widths on different arches/platforms. (Most C projects I know import stdint.h to get uint32_t and friends, or just have a typedef mess to use usize, u32, u16, etc.)

                                                                                                    EDIT: As Forty-Bot noted, one of the biggest issues is null-terminated strings.

                                                                                                    I could go on and on forever.

                                                                                                    What does Hare offer over C?

                                                                                                    It fixes a lot of the issues I mentioned earlier, as well as reducing footguns and implementation-defined behavior in general. See my blog post for a list.

                                                                                                    They say that Hare is simpler than C, but I don’t understand exactly how.

                                                                                                    It’s simpler than C because it comes without all the cruft and compromises that C has built up over the past 50 years. Additionally, it’s easier to code in Hare because, well, the language isn’t trying to screw you up every 10 lines. :^)

                                                                                                    Same with Zig. Do they compile to C in the end, and these languages just make it easier for user to write code?

                                                                                                    Zig and Hare both occupy the same niche as C (i.e., low-level manual memory managed systems language); they both compile to machine code. And yes, they make it a lot easier to write code.

                                                                                                    1. 15

                                                                                                      Thanks for the great reply, learned a lot! Gotta say I am way more interested in Hare and Zig now than I was before.

                                                                                                      Hopefully they gain traction. :)

                                                                                                      1. 15

                                                                                                        #include <foo.h> includes all functions/constants into the current namespace, so you have no idea what module a function came from

                                                                                                        This and your later point about not being able to associate methods with struct definitions are variations on the same point but it’s worth repeating: C has no mechanism for isolating namespaces. A C function is either static (confined to a single compilation unit) or completely global. Most shared library systems also give you a package-local form but anything that you’re exporting goes in a single flat namespace. This is also true of type and macro definitions. This is terrible for software engineering. Two libraries can easily define different macros with the same name and break compilation units that want to use both.

                                                                                                        C++, at least, gives you namespaces for everything except macros.
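
                                                                                                        As a small illustration in C++, with two hypothetical libraries that both want a function called init, namespaces let the two definitions coexist where a flat C namespace would report a duplicate symbol:

                                                                                                        ```cpp
                                                                                                        #include <string>

                                                                                                        // Hypothetical library A.
                                                                                                        namespace audio {
                                                                                                        inline std::string init() { return "audio ready"; }
                                                                                                        }  // namespace audio

                                                                                                        // Hypothetical library B, reusing the same function name.
                                                                                                        namespace video {
                                                                                                        inline std::string init() { return "video ready"; }
                                                                                                        }  // namespace video
                                                                                                        ```

                                                                                                        Callers pick one explicitly with audio::init() or video::init(). A macro named init in either header would still leak into both.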

                                                                                                        C has no generics, you have to do ugly hacks with either void* (which means no type checking) or with the macro system (which is a pain in the ass).

                                                                                                        The lack of type checking is really important here. A systems programming language is used to implement the most critical bits of the system. Type checks are incredibly important here, casting everything via void* has been the source of vast numbers of security vulnerabilities in C codebases. C++ templates avoid this.
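
                                                                                                        A sketch of the template alternative, written as C++. largest is a hypothetical helper, not something from a real library. Mixing element types is rejected at compile time, which a void*-based version could not do:

                                                                                                        ```cpp
                                                                                                        #include <initializer_list>

                                                                                                        // Returns the largest element; the list must not be empty.
                                                                                                        // T is deduced from the arguments, so largest({1, 2.5}) fails to compile
                                                                                                        // instead of silently reinterpreting memory.
                                                                                                        template <typename T>
                                                                                                        T largest(std::initializer_list<T> items) {
                                                                                                            const T* it = items.begin();
                                                                                                            T best = *it;
                                                                                                            for (++it; it != items.end(); ++it) {
                                                                                                                if (best < *it) best = *it;
                                                                                                            }
                                                                                                            return best;
                                                                                                        }
                                                                                                        ```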

                                                                                                        C’s standard library is really tiny, so you end up creating your own in the process, which you end up carrying around from project to project.

                                                                                                        This is less of an issue for systems programming, where a large standard library is also a problem because it implies dependencies on large features in the environment. In an embedded system or a kernel, I don’t want a standard library with file I/O. Actually, for most cloud programming I’d like a standard library that doesn’t assume the existence of a local filesystem as well. A bigger problem is that the library is not modular and layered. Rust’s no_std is a good step in the right direction here.

                                                                                                        C’s error handling is completely nonexistent. “Errors” are returned as integer codes, so you need to define an enum/constants for each function (for each possible returned error), but if you do that, you need to have the actual return value as a pointer argument.

                                                                                                        From libc, most errors are not returned, they’re signalled via the return and then stored in a global (now a thread-local) variable called errno. Yay. Option types for returns are really important for maintainable systems programming. C++ now has std::optional and std::variant in the standard library, other languages have union types as first-class citizens.
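
                                                                                                        For illustration, a hedged C++17 sketch of an option-typed return. parse_int is a made-up name, and std::from_chars is used because it reports failure in its return value rather than through errno:

                                                                                                        ```cpp
                                                                                                        #include <charconv>
                                                                                                        #include <optional>
                                                                                                        #include <string_view>
                                                                                                        #include <system_error>

                                                                                                        // Parses the whole of text as a decimal int.
                                                                                                        // Returns std::nullopt on bad input, trailing junk, or out-of-range values.
                                                                                                        std::optional<int> parse_int(std::string_view text) {
                                                                                                            int value = 0;
                                                                                                            const char* first = text.data();
                                                                                                            const char* last = first + text.size();
                                                                                                            auto [ptr, ec] = std::from_chars(first, last, value);
                                                                                                            if (ec != std::errc{} || ptr != last) {
                                                                                                                return std::nullopt;
                                                                                                            }
                                                                                                            return value;
                                                                                                        }
                                                                                                        ```

                                                                                                        The caller cannot read the value without first going through the optional, so the failure case is visible in the type rather than in a side channel.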

                                                                                                        Manual memory management without defer is a PITA and error-prone.

                                                                                                        defer isn’t great either because it doesn’t allow ownership transfer. You really need smart pointer types and then you hit the limitations of the C type system again (see: no generics, above). C++ and Rust both have a type system that can express smart pointers.
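
                                                                                                        A minimal C++ sketch of the ownership transfer that a scope-bound defer cannot express. Buffer, make_buffer and consume are hypothetical names:

                                                                                                        ```cpp
                                                                                                        #include <cstddef>
                                                                                                        #include <memory>
                                                                                                        #include <vector>

                                                                                                        using Buffer = std::vector<int>;

                                                                                                        // Allocates a buffer and hands ownership to the caller.
                                                                                                        // Nothing is freed when this function's scope ends.
                                                                                                        std::unique_ptr<Buffer> make_buffer(int n) {
                                                                                                            auto buf = std::make_unique<Buffer>();
                                                                                                            for (int i = 0; i < n; ++i) buf->push_back(i);
                                                                                                            return buf;
                                                                                                        }

                                                                                                        // Takes ownership; the buffer is freed when this function returns.
                                                                                                        std::size_t consume(std::unique_ptr<Buffer> buf) {
                                                                                                            return buf->size();
                                                                                                        }
                                                                                                        ```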

                                                                                                        C has no anonymous functions. (Whether this matters really depends on your coding style.)

                                                                                                        Anonymous functions are only really useful if they can capture things from the surrounding environment. That is only really useful in a language without GC if you have a notion of owning pointers that can manage the capture. A language with smart pointers allows you to implement this, C does not.
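
                                                                                                        A small C++14 sketch of a closure that owns what it captures. make_counter and count_twice are hypothetical names:

                                                                                                        ```cpp
                                                                                                        #include <memory>
                                                                                                        #include <utility>

                                                                                                        // Returns a closure that owns its counter through a unique_ptr.
                                                                                                        // The counter is freed when the closure itself is destroyed.
                                                                                                        auto make_counter() {
                                                                                                            auto count = std::make_unique<int>(0);
                                                                                                            return [count = std::move(count)]() { return ++*count; };
                                                                                                        }

                                                                                                        // Usage: the state survives between calls to the same closure.
                                                                                                        int count_twice() {
                                                                                                            auto next = make_counter();
                                                                                                            next();
                                                                                                            return next();
                                                                                                        }
                                                                                                        ```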

                                                                                                        1. 6

                                                                                                          defer isn’t great either because it doesn’t allow ownership transfer. You really need smart pointer types and then you hit the limitations of the C type system again (see: no generics, above). C++ and Rust both have a type system that can express smart pointers.

                                                                                                          True. I’m more saying that defer is the baseline here; without it you need cleanup: labels, gotos, and synchronized function returns. It can get ugly fast.

                                                                                                          Anonymous functions are only really useful if they can capture things from the surrounding environment. That is only really useful in a language without GC if you have a notion of owning pointers that can manage the capture. A language with smart pointers allows you to implement this, C does not.

                                                                                                          I disagree, depends on what you’re doing. I’m doing a roguelike in Zig right now, and I use anonymous functions quite extensively for item/weapon/armor/etc triggers, i.e., where each game object has some unique anonymous functions tied to the object’s fields and can be called on certain events. Having closures would be nice, but honestly in this use-case I didn’t really feel much of a need for it.

                                                                                                        2. 3

                                                                                                          Note that C does have “standard” answers to a lot of these.

                                                                                                          C’s macro system is very, very error prone and very easily abused, since it’s basically a glorified search-and-replace system that has no way to warn you of mistakes.

                                                                                                          The macro system is the #1 thing keeping C alive :)

                                                                                                          There are no methods for structs, you basically create struct Foo and then have to name all the methods of that struct foo_do_stuff (instead of doing foo_var.do_stuff() like in other languages)

                                                                                                          Aside from macro stuff, the typical way to address this is to use a struct of function pointers. So you’d create a wrapper like

                                                                                                          void do_stuff(struct foo *foo)
                                                                                                          {
                                                                                                              foo->do_stuff(foo);
                                                                                                          }
                                                                                                          

                                                                                                          C has no generics, you have to do ugly hacks with either void* (which means no type checking) or with the macro system (which is a pain in the ass).

                                                                                                          Note that typically there is a “base class” which either all “subclasses” include as a member (and use offsetof to recover the subclass) or have a void * private data pointer. This doesn’t really escape the problem, however in practice I’ve never run into a bug where the wrong struct/method gets combined. This is because the above pattern ensures that the correct method gets called.
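
                                                                                                          A rough sketch of that pattern, written C-style but in C++ syntax. All names (shape, rect, shape_area and so on) are invented for illustration:

                                                                                                          ```cpp
                                                                                                          #include <cstddef>

                                                                                                          // "Base class": behaviour that every shape embeds as a member.
                                                                                                          struct shape {
                                                                                                              int (*area)(const shape* self);
                                                                                                          };

                                                                                                          // "Subclass": carries its own data plus the embedded base.
                                                                                                          struct rect {
                                                                                                              int width;
                                                                                                              int height;
                                                                                                              shape base;
                                                                                                          };

                                                                                                          // Recover the enclosing rect from a pointer to its embedded base.
                                                                                                          static const rect* rect_from_base(const shape* s) {
                                                                                                              const char* raw = reinterpret_cast<const char*>(s);
                                                                                                              return reinterpret_cast<const rect*>(raw - offsetof(rect, base));
                                                                                                          }

                                                                                                          static int rect_area(const shape* self) {
                                                                                                              const rect* r = rect_from_base(self);
                                                                                                              return r->width * r->height;
                                                                                                          }

                                                                                                          // The wrapper callers use; dispatch goes through the function pointer.
                                                                                                          int shape_area(const shape* s) { return s->area(s); }

                                                                                                          rect make_rect(int w, int h) { return rect{w, h, shape{rect_area}}; }

                                                                                                          // Usage: callers only ever see the base pointer.
                                                                                                          int demo_area() {
                                                                                                              rect r = make_rect(3, 4);
                                                                                                              return shape_area(&r.base);
                                                                                                          }
                                                                                                          ```

                                                                                                          Nothing stops a rect_area pointer from being stored in some other struct's base, which is the unchecked part; the constructor function is what keeps the pairing right in practice.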

                                                                                                          C’s error handling is completely nonexistent. “Errors” are returned as integer codes, so you need to define an enum/constants for each function (for each possible returned error), but if you do that, you need to have the actual return value as a pointer argument.

                                                                                                          Well, there’s always errno… And if you control the address space you can always use the upper few addresses for error codes. That said, better syntax for multiple return values would probably go a long way.
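
                                                                                                          The “upper few addresses” trick might look like the sketch below, loosely modelled on the Linux kernel's ERR_PTR/IS_ERR/PTR_ERR helpers. The names and the 4095 limit are assumptions for this example, and it only works if nothing valid is ever mapped in that top range:

                                                                                                          ```cpp
                                                                                                          #include <cstdint>

                                                                                                          // Assumed size of the reserved error range at the top of the address space.
                                                                                                          constexpr long kMaxErrno = 4095;

                                                                                                          // Encode a negative error code as a pointer value.
                                                                                                          inline void* err_ptr(long err) {
                                                                                                              return reinterpret_cast<void*>(static_cast<std::intptr_t>(err));
                                                                                                          }

                                                                                                          // True if p falls in the reserved top range, i.e. encodes an error.
                                                                                                          inline bool is_err(const void* p) {
                                                                                                              return reinterpret_cast<std::uintptr_t>(p) >=
                                                                                                                     static_cast<std::uintptr_t>(-kMaxErrno);
                                                                                                          }

                                                                                                          // Decode the error code again.
                                                                                                          inline long ptr_err(const void* p) {
                                                                                                              return static_cast<long>(reinterpret_cast<std::intptr_t>(p));
                                                                                                          }
                                                                                                          ```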

                                                                                                          C has no anonymous functions. (Whether this matters really depends on your coding style.)

                                                                                                          IIRC gcc has them, but they require executable stacks :)

                                                                                                          Manual memory management without defer is a PITA and error-prone.

                                                                                                          Agree. I think you can do this with GCC extensions, but some sugar here would be nice.

                                                                                                          Weird integer type system. long long, int, short, etc which have different bit widths on different arches/platforms. (Most C projects I know import stdint.h to get uint32_t and friends, or just have a typedef mess to use usize, u32, u16, etc.)

                                                                                                          Arguably there should be fixed width types, size_t, intptr_t, and regsize_t. Unfortunately, C lacks the last one, which is typically assumed to be long. Rust, for example, gets this even more wrong and lacks the last two (c.f. the recent post on 129-bit pointers).


                                                                                                          IMO you missed the most important part, which is that C strings are (by-and-large) nul-terminated. Having better syntax for carrying a length around with a pointer would go a long way to making string support better.
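
                                                                                                          In C++ terms, std::string_view is one such pointer-plus-length pair. A minimal C++17 sketch (first_word is a hypothetical helper):

                                                                                                          ```cpp
                                                                                                          #include <string_view>

                                                                                                          // Returns the text before the first space as a (pointer, length) view.
                                                                                                          // No copy is made and no terminator is written; the result simply
                                                                                                          // carries a shorter length over the same bytes.
                                                                                                          std::string_view first_word(std::string_view text) {
                                                                                                              return text.substr(0, text.find(' '));
                                                                                                          }
                                                                                                          ```

                                                                                                          With nul-terminated strings the same operation needs either a copy or a write into the caller's buffer.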

                                                                                                        3. 9

                                                                                                          Even in C’s domain, where C lacks nothing and is fine for what it is, I would criticize C for maybe 5 things, which I would consider the real criticism:

                                                                                                          1. It has undefined behaviour, of the kind that has come to mean that the compiler may disobey the source code. It turns working code into broken code just by switching compiler or inlining some code that wasn’t inlined before. You can’t necessarily point at a piece of code and say it was always broken, because UB is a runtime phenomenon. Not reassuring for a supposedly low-level language.
                                                                                                          2. Its operator precedence is wrong.
                                                                                                          3. Integer promotion. Just why.
                                                                                                          4. Signedness propagates the wrong way: Instead of the default type being signed (int) and comparison between signed and unsigned yielding unsigned, it should be opposite: There should be a nat type (for natural number, effectively size_t), and comparison between signed and unsigned should yield signed.
                                                                                                          5. char is signed. Nobody likes negative code points.
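
                                                                                                          Points 3 and 4 can be seen in a few lines. This is a sketch in C++17, which inherits both rules from C; the function names are made up:

                                                                                                          ```cpp
                                                                                                          #include <cstdint>
                                                                                                          #include <type_traits>

                                                                                                          // Integer promotion: both 8-bit operands are widened to int before the
                                                                                                          // addition, so the sum does not wrap at 256.
                                                                                                          inline int sum_of_bytes(std::uint8_t a, std::uint8_t b) {
                                                                                                              static_assert(std::is_same_v<decltype(a + b), int>,
                                                                                                                            "uint8_t + uint8_t has type int");
                                                                                                              return a + b;
                                                                                                          }

                                                                                                          // Mixed comparison: lhs is converted to unsigned first, so a negative
                                                                                                          // lhs becomes a very large value. Compilers usually warn about this.
                                                                                                          inline bool less_mixed(int lhs, unsigned rhs) {
                                                                                                              return lhs < rhs;
                                                                                                          }
                                                                                                          ```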
                                                                                                          1. 6

                                                                                                            the kind that has come to mean that the compiler may disobey the source code. It turns working code into broken code

                                                                                                            I’m wary of this same tired argument cropping up again, so I’ll just state it this way: I disagree. Code that invokes undefined behavior is already broken; changing compiler can’t (except perhaps in very particular circumstances, which I don’t think you were referring to) introduce undefined behaviour; it can change the observable behaviour when UB is invoked.

                                                                                                            A compiler can’t “disobey the source code” whilst conforming to the language standard. If the source code does something that doesn’t have defined semantics, that’s on the source code, not the compiler.

                                                                                                            “It’s easy to accidentally invoke undefined behaviour in C” is a valid criticism, but “C compilers break code” is not.

                                                                                                            You can’t necessarily point at a piece of code and say it was always broken

                                                                                                            You certainly can in some instances. But sure, for example, if some piece of code dereferences a pointer and the value is set somewhere else, it could be undefined or not depending on whether the pointer is valid at the point it is dereferenced. So code might be “not broken” given certain constraints (eg that the pointer is valid), but not work properly if those constraints are violated, just like code in any language (although in C there’s a good chance the end result is UB, which is potentially more catastrophic).

                                                                                                            I’m not saying C is a good language, just that I think this particular criticism is unfair. (Also I think your point 5 is wrong, char can be unsigned, it’s up to the implementation).

                                                                                                            1. 7

                                                                                                              Thing is, it certainly feels like the compiler is disobeying the source code. Signed integer overflow? No problem pal, this is x86, that platform will wrap around just fine! Right? Riiight? Oops, nope, and since the compiler pretends UB does not exist, it just deleted a security check that it deemed “dead code”, and now my hard drive has been encrypted by a ransomware that just exploited my vulnerability.
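
                                                                                                              For reference, the kind of check that gets deleted is typically written as a + b < a, which only makes sense if the overflow has already happened. A hedged sketch of a check that stays within defined behaviour (add_would_overflow is a hypothetical name):

                                                                                                              ```cpp
                                                                                                              #include <limits>

                                                                                                              // Reports whether a + b would overflow int, without ever performing
                                                                                                              // the overflowing addition itself.
                                                                                                              inline bool add_would_overflow(int a, int b) {
                                                                                                                  if (b > 0) return a > std::numeric_limits<int>::max() - b;
                                                                                                                  return a < std::numeric_limits<int>::min() - b;
                                                                                                              }
                                                                                                              ```

                                                                                                              Because every intermediate value is in range, the compiler has no undefined path to reason away.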

                                                                                                              Though I agree with all the facts you laid out, and with the interpretation that UB means the program is already broken even if the generated binary didn’t propagate the error. But Chandler Carruth pretending that UB does not invoke the nasal demons is not far. Let’s not forget that UB means the compiler is allowed to cause your entire hard drive to be formatted, as ridiculous as it may sound. And sometimes it actually happens (as it did so many times with buffer overflow exploits).

                                                                                                              Sure, it’s not like the compiler is actually disobeying your source code. But since UB means “all bets are off”, and UB is not always easy to catch, the result is pretty close.

                                                                                                              1. 3

                                                                                                                Sure, it’s not like the compiler is actually disobeying your source code. But since UB means “all bets are off”, and UB is not always easy to catch, the result is pretty close.

                                                                                                                I feel like “disobeying the code” and “not doing what I intended it to do due to the code being wrong” are still two sufficiently different things that it’s worth distinguishing.

                                                                                                                1. 4

                                                                                                                  Okay, it is worth distinguishing.

                                                                                                                  But it is also worth noting that C is quite special. This UB business repeatedly violates the principle of least astonishment. Especially the modern interpretation, where compilers systematically assume UB does not exist and any code path that hits UB is considered “dead code”.

                                                                                                                  The original intent of UB was much closer to implementation defined behaviour. Signed integer overflow was originally UB because some platforms crashed or otherwise went bananas when it occurred. But the expectation was that on platforms that behave reasonably (like x86, that wraps around), we’d get the reasonable behaviour. But then compiler writers (or should I say their lawyers) noticed that strictly speaking, the standard didn’t make that expectation explicit, and in the name of optimisation started to invoke nasal demons even on platforms that could have done the right thing.
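
                                                                                                                  If you want the wraparound today, you have to ask for it explicitly. GCC and Clang both accept -fwrapv, or you can route the addition through unsigned arithmetic, which wraps by definition. A minimal sketch (wrapping_add is a hypothetical helper name, and the conversion back to int is implementation-defined, though GCC and Clang document it as wrapping):

```c
#include <limits.h>

/* Hypothetical helper: add two ints with two's-complement wraparound.
 * Unsigned arithmetic wraps modulo 2^N, so the addition is never UB.
 * Converting an out-of-range result back to int is implementation-defined
 * rather than undefined; GCC and Clang document it as modulo reduction. */
int wrapping_add(int a, int b)
{
    return (int)((unsigned int)a + (unsigned int)b);
}
```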

                                                                                                                  Sure the code is wrong. In many cases though, the standard is also wrong.

                                                                                                                  1. 4

                                                                                                                    I agree with some things but not others that you say, but these arguments have been hashed out many times before.

                                                                                                                    Sure the code is wrong

                                                                                                                    That’s the point I was making. Since we agree on that, and we agree that there are valid criticisms of C as a language (though we may differ on the specifics of those), let’s leave the rest. Peace.

                                                                                                              2. 4

                                                                                                                But why not have the compiler reject the code instead of silently compiling it wrong?

                                                                                                                1. 2

                                                                                                                  It doesn’t compile it wrong. Code with no semantics can’t be compiled incorrectly. You’re making the exact same misrepresentation as in the post above that I responded to originally.

                                                                                                                  1. 3

                                                                                                                    Code with no semantics shouldn’t be able to be compiled at all.

                                                                                                                    1. 1

                                                                                                                      I’d almost agree, though I can think of some cases where such code could exist for a reason (and I’ll bet that such code exists in real code bases). In particular, hairy macro expansions etc which produce code that isn’t even executed (or won’t be executed in the case where it would be UB, at least) in order to make compile-time type-safety checks. IIRC there are a few such things used in the Linux kernel. There are probably plenty of other cases; there’s a lot of C code out there.

                                                                                                                      In practice though, a lot of code that potentially exhibits UB only does so if certain constraints are violated (eg if a pointer is invalid, or if an integer is too large and will result in overflow at some operation), and the compiler can’t always tell that the constraints necessarily will be violated, so it generates code with the assumption that if the code is executed, then the constraints do hold. So if the larger body of code is wrong - the constraints are violated, that is - the behaviour is undefined.
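
                                                                                                                      A small sketch of what I mean, with a made-up struct and function names. In the first function the dereference tells the compiler that p must be valid, so the later NULL test can be treated as always false; in the second the constraint is checked before anything depends on it.

```c
#include <stddef.h>

struct packet {            /* hypothetical type for illustration */
    size_t len;
};

/* BROKEN: p is dereferenced first. If p is NULL the behaviour is already
 * undefined, so the compiler may assume p != NULL and drop the check. */
size_t packet_len_broken(const struct packet *p)
{
    size_t len = p->len;
    if (p == NULL)
        return 0;
    return len;
}

/* OK: the constraint is tested before the dereference. */
size_t packet_len_checked(const struct packet *p)
{
    if (p == NULL)
        return 0;
    return p->len;
}
```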

                                                                                                                      1. 1

                                                                                                                        In particular, hairy macro expansions etc which produce code that isn’t even executed (or won’t be executed in the case where it would be UB

                                                                                                                        That’s why it’s good to have a proper macro system that isn’t literally just find and replace.

                                                                                                                        In practice though, a lot of code that potentially exhibits UB only does so if certain constraints are violated

                                                                                                                        True, and I’m mostly talking about UB that can be detected at compile time, such as f(++x, ++x).
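
                                                                                                                        For reference, this is the kind of thing I’d like rejected outright. GCC (-Wsequence-point) and Clang (-Wunsequenced) can warn about it, but it still compiles. A sketch of the sequenced alternative, where f and call_sequenced are stand-in names:

```c
/* Hypothetical function standing in for f. */
int f(int a, int b)
{
    return a * 10 + b;
}

int call_sequenced(int x)
{
    /* f(++x, ++x) would modify x twice with no sequencing between the
     * two increments, which is UB. Separate statements make the order
     * explicit: each full statement finishes before the next begins. */
    int first = ++x;
    int second = ++x;
    return f(first, second);
}
```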

                                                                                                            2. 6

                                                                                                              Contrary to what people are saying, C is just fine for what it is.

                                                                                                              People complain about the std library being tiny, but you basically have the operating system at your fingers, where C is a first class citizen.

                                                                                                              Then people complain C is not safe, yes that’s true, but with a set of best practices you can keep things under control.

                                                                                                              People complain you don’t have generics, you don’t need them most of the time.

                                                                                                              Projects like nginx, SQLite and redis, not to speak about the Nix world prove that C is perfectly fine of a language. Also most of the popular python libraries nowadays are written in C.

                                                                                                              1. 25

                                                                                                                Hi! I’d like to introduce you to Fish in a Barrel, a bot which publishes information about security vulnerabilities to Twitter, including statistics on how many of those vulnerabilities are due to memory unsafety. In general, memory unsafety is easy to avoid in languages which do not permit memory-unsafe operations, and nearly impossible to avoid in other languages. Because C is in the latter set, C is a regular and reliable source of security vulnerabilities.

                                                                                                                I understand your position; you believe that people are morally obligated to choose “a set of best practices” which limits usage of languages like C to supposedly-safe subsets. However, there are not many interesting subsets of C; at best, avoiding pointer arithmetic and casts is good, but little can be done about the inherent dangers of malloc() and free() (and free() and free() and …) Moreover, why not consider the act of choosing a language to be a practice? Then the choice of C can itself be critiqued as contrary to best practices.
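
                                                                                                                To illustrate how little the usual “best practice” buys you, here is a sketch of the common free-and-clear convention. The macro and function names are hypothetical. It makes a repeated free through the same variable harmless, but it can do nothing about other pointers that still alias the freed block.

```c
#include <stdlib.h>

/* Hypothetical convention: always release memory through this macro so
 * the pointer is cleared afterwards. free(NULL) is defined as a no-op,
 * so freeing the same variable twice becomes harmless. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Returns 1 when the pointer ends up cleared after two releases. */
int demo_double_free_guard(void)
{
    int *data = malloc(4 * sizeof *data);

    FREE_AND_NULL(data);
    FREE_AND_NULL(data);   /* second call is free(NULL): no double free */

    /* Any copy of the original pointer made before the first release
     * would still dangle; the macro only protects this one variable. */
    return data == NULL;
}
```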

                                                                                                                nginx is well-written, but Redis is not. SQLite is not written just in C, but also in several other languages combined, including SQL and TH1 (“test harness one”); this latter language is specifically for testing that SQLite behaves properly. All three have had memory-unsafety bugs. This suggests that even well-written C, or C in combination with other languages, is unsafe.

                                                                                                                Additionally, Nix is written in C++ and package definitions are written in shell. I prefer PyPy to CPython; both are written in a combination of C and Python, with CPython using more C and PyPy using more Python. I’m not sure where you were headed here; this sounds like a popularity-contest argument, but those are not meaningful in discussions about technical issues. Nonetheless, if it’s the only thing that motivates you, then consider this quote from the Google Chrome security team:

                                                                                                                Since “memory safety” bugs account for 70% of the exploitable security bugs, we aim to write new parts of Chrome in memory-safe languages.

                                                                                                                1. 3

                                                                                                                  I am curious about your claim that Redis is not well-written? I’ve seen other folks online hold it up as an example of a well-written C codebase, at least in terms of readability.

                                                                                                                  I understand that readable is not the same as secure, but would like to understand where you are coming from on this.

                                                                                                                  1. 1

                                                                                                                    It’s 100% personal opinion.

                                                                                                                2. 9

                                                                                                                  Projects like nginx, SQLite and redis, not to speak about the Nix world prove that C is perfectly fine of a language.

                                                                                                                  Ah yes, you can see the safety of high-quality C in practice:

                                                                                                                  https://nginx.org/en/security_advisories.html https://www.cvedetails.com/vulnerability-list/vendor_id-18560/product_id-47087/Redislabs-Redis.html

                                                                                                                  Including some fun RCEs, like CVE-2014-0133 or CVE-2016-8339.

                                                                                                                  1. 2

                                                                                                                    I also believe C will still have a place for long time. I know I’m a newbie with it, but making a game with C (using Raylib) has been pretty fun. It’s simple and to the point… And I don’t mind making mistakes really, that’s how I learn the best.

                                                                                                                    But again it’s cool to see people creating new languages as alternatives.

                                                                                                                  2. 4

                                                                                                                    What does Hare offer over C?

                                                                                                                    Here’s a list of ways that Drew says Hare improves over C:

                                                                                                                    Hare makes a number of conservative improvements on C’s ideas, the biggest bet of which is the use of tagged unions. Here are a few other improvements:

                                                                                                                    • A context-free grammar
                                                                                                                    • Less weird type syntax
                                                                                                                    • Language tooling in the stdlib
                                                                                                                    • Built-in and semantically meaningful static and runtime assertions
                                                                                                                    • A lightweight system for dependency resolution
                                                                                                                    • defer for cleanup and error handling
                                                                                                                    • An optional build system which you can replace with make and standard tools

                                                                                                                    Even with these improvements, Hare manages to be a smaller, more conservative language than C, with our specification clocking in at less than 1/10th the size of C11, without sacrificing anything that you need to get things done in the systems programming world.

                                                                                                                    It’s worth reading the whole piece. I only pasted his summary.
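
                                                                                                                    For anyone who hasn’t met tagged unions: in C you hand-roll them as an enum plus a union, and nothing stops you from reading the wrong member. A rough sketch of that manual pattern, with made-up names (this is plain C, not Hare syntax):

```c
/* Hypothetical result type: either a parsed value or an error message. */
enum result_tag { RESULT_OK, RESULT_ERR };

struct result {
    enum result_tag tag;        /* says which union member is valid */
    union {
        int value;              /* valid when tag == RESULT_OK */
        const char *error;      /* valid when tag == RESULT_ERR */
    } as;
};

/* Parse a single decimal digit, reporting failure through the tag. */
struct result parse_digit(char c)
{
    struct result r;
    if (c >= '0' && c <= '9') {
        r.tag = RESULT_OK;
        r.as.value = c - '0';
    } else {
        r.tag = RESULT_ERR;
        r.as.error = "not a digit";
    }
    return r;
}
```

                                                                                                                    The caller has to remember to check tag before touching as.value; the compiler will not enforce it.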